VowpalCoverLearner

class coba.learners.VowpalCoverLearner

Online Cover exploration with a VW contextual bandit learner.

For more information on this algorithm, see Agarwal et al. (2014).
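As a rough illustration of the idea, the sketch below shows how a cover of policies can induce an exploration distribution: each policy votes for an action, and the vote shares are smoothed so that every action keeps a minimum probability. It assumes equal weight for every policy and a fixed `min_prob`; the name `cover_distribution` is hypothetical, and VW's actual implementation differs in its details.

```python
from collections import Counter

def cover_distribution(votes, actions, min_prob):
    """Turn one vote per policy into a smoothed distribution over actions.

    Simplifying assumption: every policy in the cover has equal weight.
    """
    k = len(actions)
    if not 0 <= k * min_prob <= 1:
        raise ValueError("min_prob must satisfy 0 <= len(actions) * min_prob <= 1")
    counts = Counter(votes)
    shares = [counts[a] / len(votes) for a in actions]
    # Shrink the vote shares and add a uniform floor so each action has
    # probability >= min_prob while the total still sums to 1.
    return [(1 - k * min_prob) * s + min_prob for s in shares]

actions = ["a", "b", "c"]
votes = ["a", "a", "b", "a", "c"]  # five policies, as with cover=5
pmf = cover_distribution(votes, actions, min_prob=0.05)
print(dict(zip(actions, pmf)))
```

In this toy setting a larger cover gives a finer-grained distribution, since each policy contributes a vote share of 1/cover.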

References

Agarwal, Alekh, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, and Robert Schapire. “Taming the monster: A fast and simple algorithm for contextual bandits.” In International Conference on Machine Learning, pp. 1638-1646. 2014.

Constructors

__init__(cover: int = 5, features: Sequence[str] = (1, 'a', 'ax', 'axx'), seed: int | None = 1, **kwargs) → None

Instantiate a VowpalCoverLearner.

Parameters:

cover – The number of policies which will be learned (must be greater than 0).
features – A list of namespaces and interactions to use when learning reward functions.
seed – The seed used by VW to generate any necessary random numbers.
kwargs – Additional keyword arguments are passed on to VW as CLI arguments (unless the constructor removes them).
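How these parameters become VW arguments is not spelled out here. The sketch below shows one plausible mapping, under the assumptions that 'a' names the action namespace, 'x' names the context namespace, and 1 requests a constant term. The helper `features_to_vw_args` is hypothetical and not part of coba; only the VW flags themselves are real.

```python
def features_to_vw_args(cover=5, features=(1, "a", "ax", "axx"), seed=1, **kwargs):
    """Hypothetical helper: build a VW argument string from constructor-style inputs."""
    if cover <= 0:
        raise ValueError("cover must be greater than 0")
    args = ["--cb_explore_adf", f"--cover {cover}"]
    if 1 not in features:
        args.append("--noconstant")  # no constant (bias) term requested
    for namespace in ("a", "x"):
        if namespace not in features:
            # Assumed behavior: drop linear terms of an unlisted namespace.
            args.append(f"--ignore_linear {namespace}")
    for feature in features:
        if isinstance(feature, str) and len(feature) > 1:
            # Multi-letter entries such as 'ax' are treated as interactions.
            args.append(f"--interactions {feature}")
    if seed is not None:
        args.append(f"--random_seed {seed}")
    for key, value in kwargs.items():
        args.append(f"--{key} {value}")  # extra keyword arguments become flags
    return " ".join(args)

print(features_to_vw_args())
```

Under these assumptions the default features would learn from action features and from action-context interactions, but not from context features on their own.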

Methods

finish(): Finish all pending work (e.g., write buffers to disk).

learn(context: Context, action: Action, reward: float, probability: float, actions: Actions | None = None) → None

Learn about the action taken in the context.

Parameters:

context – The context in which the action was taken.
action – The action that was taken.
reward – The reward for the given context and action (feedback for IGL problems).
probability – The probability the given action was taken.
**kwargs – Optional information returned during prediction.

predict(context: Context, actions: Actions) → Tuple[Action, Prob, Kwargs]

Predict which action to take in the context.

Parameters:

context – The current context. It will either be None (multi-armed bandit), a value (a single feature), a sequence of values (dense features), or a dictionary (sparse features).
actions – The current set of actions to choose from in the given context. Each action will either be a value (a single feature), a sequence of values (dense features), or a dictionary (sparse features).

Returns:

A prediction. Several prediction formats are supported; see the return type hint for the accepted forms.
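The predict and learn methods are meant to be used together: predict returns an action, its probability, and keyword arguments that are handed back to learn. The sketch below runs that loop over the four context forms listed above. `UniformStub` and `interact` are hypothetical stand-ins so the example runs without coba or VW, and the `"actions"` key in the returned kwargs is an assumption based on the learn signature. With both packages installed, a `VowpalCoverLearner` instance could presumably be passed in place of the stub.

```python
import random

class UniformStub:
    """Hypothetical stand-in with the predict/learn shape documented here."""

    def __init__(self, seed=1):
        self._rng = random.Random(seed)
        self.history = []

    def predict(self, context, actions):
        # Choose uniformly; return (action, probability, kwargs for learn).
        action = self._rng.choice(actions)
        return action, 1 / len(actions), {"actions": actions}

    def learn(self, context, action, reward, probability, actions=None):
        # A real learner would update its policies here; the stub only records.
        self.history.append((context, action, reward, probability))

def interact(learner, context, actions, reward_of):
    """Run one predict -> observe reward -> learn round."""
    action, prob, kwargs = learner.predict(context, actions)
    reward = reward_of(action)
    learner.learn(context, action, reward, prob, **kwargs)
    return action, prob, reward

learner = UniformStub(seed=1)
actions = ["a", "b", "c"]
# None (multi-armed bandit), a single value, dense features, sparse features.
contexts = [None, 3.2, [1.0, 0.5], {"age": 0.3}]
for context in contexts:
    action, prob, reward = interact(learner, context, actions, lambda a: float(a == "b"))
    print(context, action, prob, reward)
```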

score(context: Context, actions: Actions, action: Action) → Prob

Compute the propensity score of an action.

Parameters:

context – The current context.
actions – The current set of actions that can be chosen.
action – The action to propensity score.

Returns:

The propensity score of the given action. That is, P(action|context,actions).
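To make the returned quantity concrete, the sketch below reads a propensity score from an explicit distribution over actions. The probabilities are illustrative, and `score_from_pmf` is a hypothetical helper rather than the learner's own method. The last lines show one common use of a propensity score, importance-weighting a logged reward.

```python
def score_from_pmf(pmf, actions, action):
    """Return P(action | context, actions) given a distribution over actions."""
    return pmf[actions.index(action)]

actions = ["a", "b", "c"]
pmf = [0.56, 0.22, 0.22]  # illustrative probabilities for one context

scores = [score_from_pmf(pmf, actions, a) for a in actions]
total = sum(scores)  # propensity scores over all actions sum to 1

# A logged reward divided by its propensity gives an importance-weighted reward.
logged_action, logged_reward = "b", 1.0
weighted = logged_reward / score_from_pmf(pmf, actions, logged_action)
print(scores, total, weighted)
```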

Attributes

params