SequentialCB
- class coba.evaluators.SequentialCB
Sequential evaluation for contextual bandit (CB) learners.
Constructors
- __init__(record: Sequence[Literal['reward', 'time', 'probability', 'action', 'context', 'actions', 'rewards']] = ['reward', 'action', 'probability'], learn: Literal['on', 'off', 'ips', 'dr', 'dm'] | None = 'on', eval: Literal['on', 'ips', 'dr', 'dm'] | None = 'on', seed: float | None = None) -> None
Instantiate a SequentialCB evaluator.
- Parameters:
record – Variables to record for each learner interaction.
learn – The action and reward revealed to the learner on each learn call. The parenthetical lists the interaction fields each mode requires.
  - on: on-policy action and reward (requires ‘actions’ and ‘rewards’).
  - off: off-policy (logged) action and reward (requires ‘action’ and ‘reward’).
  - ips: on-policy action with an inverse propensity scoring (IPS) reward estimate (requires ‘actions’, ‘action’, ‘reward’, and ‘probability’).
  - dr: on-policy action with a doubly robust (DR) reward estimate (requires ‘actions’, ‘action’, and ‘reward’).
  - dm: on-policy action with a direct method (DM) reward estimate (requires ‘actions’, ‘action’, and ‘reward’).
  - None: learn is not called.
eval – The reward recorded for the learner's prediction. The parenthetical lists the interaction fields each mode requires.
  - on: the actual reward (requires ‘actions’ and ‘rewards’).
  - ips: the IPS reward estimate (requires ‘actions’, ‘action’, ‘reward’, and ‘probability’).
  - dr: the DR reward estimate (requires ‘actions’, ‘action’, and ‘reward’).
  - dm: the DM reward estimate (requires ‘actions’, ‘action’, and ‘reward’).
  - None: no reward is recorded.
seed – Random seed that determines which action is played when a learner returns a probability mass function (PMF) over actions.
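The ips, dm, and dr modes refer to standard off-policy reward estimators. The sketch below shows the textbook form of each for a single logged interaction; it is an illustration, not coba's implementation. It assumes the logging probability is known and that a reward model has already been fit (passed in as a plain callable), whereas coba documents its dm and dr modes as not requiring ‘probability’, so it may obtain these quantities differently. The function names are hypothetical.

```python
from typing import Callable, Hashable

# Hypothetical reward-model type: maps an action to its predicted reward.
RewardModel = Callable[[Hashable], float]


def ips_reward(action: Hashable, logged_action: Hashable,
               logged_reward: float, logged_prob: float) -> float:
    """Inverse propensity scoring estimate of the reward for `action`."""
    # Only the logged action has an observed reward; reweight it by 1 / p.
    return logged_reward / logged_prob if action == logged_action else 0.0


def dm_reward(action: Hashable, reward_model: RewardModel) -> float:
    """Direct method: trust the reward model's prediction for `action`."""
    return reward_model(action)


def dr_reward(action: Hashable, logged_action: Hashable, logged_reward: float,
              logged_prob: float, reward_model: RewardModel) -> float:
    """Doubly robust: model prediction plus an IPS-weighted correction of its error."""
    correction = 0.0
    if action == logged_action:
        # Residual of the model on the logged action, reweighted by 1 / p.
        correction = (logged_reward - reward_model(logged_action)) / logged_prob
    return reward_model(action) + correction
```

For example, with logged action 'b' played with probability 0.25 and reward 1.0, `ips_reward('b', 'b', 1.0, 0.25)` returns 4.0 and `ips_reward('a', 'b', 1.0, 0.25)` returns 0.0. The DR estimate falls back to the model prediction when the learner's action differs from the logged one, and adds the reweighted residual when they match.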
Methods
- evaluate(environment: Environment | None, learner: Learner | None) -> Iterable[Mapping[Any, Any]]
Evaluate the learner on the interactions from the given environment.
- Parameters:
environment – The Environment we want to evaluate against.
learner – The Learner that we wish to evaluate.
- Returns:
Evaluation results: one mapping per interaction, containing the variables selected by record.
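To make the sequential, on-policy case (learn='on', eval='on') concrete, here is a minimal plain-Python sketch of what such an evaluation loop can look like: for each interaction the learner returns a PMF over the actions, a seeded random number generator samples the played action, the true reward of that action is recorded, and the learner is then trained on it. The interaction format, the toy `UniformLearner`, and `sequential_on_policy` are hypothetical and are not coba's API; the yielded dictionaries simply mirror the default record=['reward', 'action', 'probability'].

```python
import random
from typing import Any, Dict, Iterable, Iterator, List, Mapping, Optional, Sequence, Tuple


class UniformLearner:
    """Hypothetical toy learner: always predicts a uniform PMF over the actions."""

    def __init__(self) -> None:
        # Every (context, action, reward, probability) passed to learn.
        self.history: List[Tuple[Any, Any, float, float]] = []

    def predict(self, context: Any, actions: Sequence[Any]) -> List[float]:
        return [1 / len(actions)] * len(actions)

    def learn(self, context: Any, action: Any, reward: float, probability: float) -> None:
        self.history.append((context, action, reward, probability))


def sequential_on_policy(
    interactions: Iterable[Mapping[str, Any]],
    learner: UniformLearner,
    seed: Optional[float] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield one record per interaction, evaluating then training the learner in order."""
    rng = random.Random(seed)
    for interaction in interactions:
        context = interaction["context"]
        actions = interaction["actions"]  # all available actions
        rewards = interaction["rewards"]  # true reward of each action, aligned with actions
        pmf = learner.predict(context, actions)
        # Sample the index of the played action from the learner's PMF.
        index = rng.choices(range(len(actions)), weights=pmf, k=1)[0]
        action, probability, reward = actions[index], pmf[index], rewards[index]
        # On-policy learning: reveal only the played action and its reward.
        learner.learn(context, action, reward, probability)
        yield {"reward": reward, "action": action, "probability": probability}
```

Because the loop is a generator, records are produced lazily, and each prediction is made by a learner that has already been trained on all earlier interactions; this ordering is what makes the evaluation sequential rather than an evaluation of a fixed policy.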
Attributes
- params