SequentialCB

class coba.evaluators.SequentialCB

Sequential evaluation for contextual bandit (CB) learners.

Constructors

__init__(record: Sequence[Literal['reward', 'time', 'probability', 'action', 'context', 'actions', 'rewards']] = ['reward', 'action', 'probability'], learn: Literal['on', 'off', 'ips', 'dr', 'dm'] | None = 'on', eval: Literal['on', 'ips', 'dr', 'dm'] | None = 'on', seed: float | None = None) → None

Instantiate a SequentialCB evaluator.

Parameters:

record – Variables to record for each learner interaction.
learn – The action and reward revealed to the learner on each learn call:
    on: on-policy action and reward (requires ‘actions’ and ‘rewards’)
    off: off-policy action and reward (requires ‘action’ and ‘reward’)
    ips: on-policy action with IPS reward (requires ‘actions’, ‘action’, ‘reward’, and ‘probability’)
    dr: on-policy action with DR reward (requires ‘actions’, ‘action’, and ‘reward’)
    dm: on-policy action with DM reward (requires ‘actions’, ‘action’, and ‘reward’)
    None: learn is not called
eval – The reward recorded for the learner's prediction:
    on: actual reward (requires ‘actions’ and ‘rewards’)
    ips: IPS reward (requires ‘actions’, ‘action’, ‘reward’, and ‘probability’)
    dr: DR reward (requires ‘actions’, ‘action’, and ‘reward’)
    dm: DM reward (requires ‘actions’, ‘action’, and ‘reward’)
    None: no reward is recorded
seed – Random seed that determines which action is played when a learner returns an action PMF (a probability for each action) rather than a single action.
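
The ips, dm, and dr options name standard off-policy reward estimators: inverse propensity scoring, the direct method, and doubly robust estimation. The sketch below uses their textbook definitions to score a chosen action against one logged interaction. It is not taken from coba's source, and coba's own estimators may differ (for instance, the requirements listed above do not include ‘probability’ for dr). The function name `estimated_reward` and the `model` callable (a hypothetical reward predictor) are assumptions made for illustration.

```python
from typing import Callable, Hashable


def estimated_reward(
    mode: str,
    chosen: Hashable,
    logged_action: Hashable,
    logged_reward: float,
    logged_probability: float,
    model: Callable[[Hashable], float] = lambda action: 0.0,
) -> float:
    """Textbook IPS/DM/DR reward estimate for `chosen` (illustrative only)."""
    # Importance weight: nonzero only when the chosen action matches the log.
    weight = (1.0 / logged_probability) if chosen == logged_action else 0.0

    if mode == "ips":
        # Inverse propensity scoring: reweight the logged reward.
        return weight * logged_reward
    if mode == "dm":
        # Direct method: rely entirely on the reward model's prediction.
        return model(chosen)
    if mode == "dr":
        # Doubly robust: model prediction plus an IPS-weighted correction.
        return model(chosen) + weight * (logged_reward - model(logged_action))
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical reward model and one logged interaction:
# action "a" was played with probability 0.5 and earned reward 1.0.
model = {"a": 0.25, "b": 0.5}.__getitem__
print(estimated_reward("ips", "a", "a", 1.0, 0.5))        # 2.0
print(estimated_reward("dr", "a", "a", 1.0, 0.5, model))  # 1.75
```

When the chosen action differs from the logged one, the IPS estimate is zero and the DR estimate falls back to the model's prediction, which is why both need the logged ‘action’ alongside the ‘reward’.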

Methods

evaluate(environment: Environment | None, learner: Learner | None) → Iterable[Mapping[Any, Any]]

Evaluate the learner on the interactions of the given environment.

Parameters:

environment – The Environment we want to evaluate against.
learner – The Learner that we wish to evaluate.

Returns:

An iterable of mappings containing the evaluation results.
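
To make the idea of sequential evaluation concrete, here is a minimal sketch of a predict-then-learn loop that yields one mapping per interaction. Everything in it is a hypothetical stand-in: `UniformLearner`, `sequential_eval`, the dict-based interactions, and the one-row-per-interaction output are assumptions for illustration, not coba's classes or implementation. It shows only how a seed can fix the action drawn from a PMF and how ‘reward’, ‘action’, and ‘probability’ might be recorded.

```python
import random
from typing import Any, Dict, Iterable, Iterator, List, Mapping, Optional, Sequence


class UniformLearner:
    """Hypothetical learner that returns a uniform PMF over the actions."""

    def __init__(self) -> None:
        self.learn_calls = 0

    def predict(self, context: Any, actions: Sequence[Any]) -> List[float]:
        return [1.0 / len(actions)] * len(actions)

    def learn(self, context: Any, action: Any, reward: float, probability: float) -> None:
        self.learn_calls += 1  # A real learner would update its policy here.


def sequential_eval(
    interactions: Iterable[Mapping[str, Any]],
    learner: UniformLearner,
    seed: Optional[int] = None,
) -> Iterator[Dict[str, Any]]:
    """Predict, play a sampled action, record the outcome, then learn."""
    rng = random.Random(seed)  # Seeded so the sampled actions are repeatable.
    for interaction in interactions:
        context = interaction["context"]
        actions = interaction["actions"]
        rewards = interaction["rewards"]  # One reward per available action.

        pmf = learner.predict(context, actions)
        # Draw the index of the action to play according to the PMF.
        index = rng.choices(range(len(actions)), weights=pmf, k=1)[0]
        action, probability, reward = actions[index], pmf[index], rewards[index]

        # On-policy learning: reveal the played action and its reward.
        learner.learn(context, action, reward, probability)
        yield {"reward": reward, "action": action, "probability": probability}


interactions = [{"context": None, "actions": ["a", "b"], "rewards": [0.0, 1.0]}] * 3
rows = list(sequential_eval(interactions, UniformLearner(), seed=1))
print(len(rows))  # 3
```

Running the loop twice with the same seed produces the same sequence of played actions, which is the role the seed plays when a learner returns a PMF.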

Attributes

params