VowpalBagLearner

class coba.learners.VowpalBagLearner

Bootstrap aggregated policy exploration with a VW contextual bandit learner.

More information on VW exploration algorithms can be found in the Vowpal Wabbit documentation.

Constructors

__init__(bag: int = 5, features: Sequence[str] = (1, 'a', 'ax', 'axx'), seed: int | None = 1, **kwargs) → None

Instantiate a VowpalBagLearner.

Parameters:

bag – The number of policies to learn; must be greater than 0. Each policy is trained on its own bootstrap resample of the data, so the policies differ from one another. During prediction, one policy is selected uniformly at random and followed.
features – A list of namespaces and interactions to use when learning reward functions.
seed – The seed used by VW to generate any necessary random numbers.
kwargs – Additional keyword arguments are passed on as VW CLI arguments (unless removed in the function).
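
The behaviour described for bag can be sketched without VW. What follows is a minimal, context-free illustration under stated assumptions, not coba's or VW's implementation: BagSketch and poisson1 are hypothetical names, each policy is just a table of running-mean rewards, and the bootstrap is approximated online by replaying each example a Poisson(1) number of times per policy.

```python
import math
import random
from collections import defaultdict


def poisson1(rng):
    """Draw from Poisson(1) using Knuth's multiplication method."""
    limit = math.exp(-1.0)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


class BagSketch:
    """Hypothetical stand-in for bagged exploration (assumption, not VW code)."""

    def __init__(self, bag=5, seed=1):
        if bag < 1:
            raise ValueError("bag must be greater than 0")
        self._rng = random.Random(seed)
        # One table of running-mean reward estimates per policy.
        self._means = [defaultdict(float) for _ in range(bag)]
        self._counts = [defaultdict(int) for _ in range(bag)]

    def _greedy(self, policy, actions):
        # The action this policy currently believes is best (ties: first wins).
        return max(actions, key=lambda a: self._means[policy][a])

    def predict(self, actions):
        # Pick one policy uniformly at random and follow it.
        policy = self._rng.randrange(len(self._means))
        action = self._greedy(policy, actions)
        # Probability of the action = share of policies that would choose it.
        votes = sum(self._greedy(p, actions) == action
                    for p in range(len(self._means)))
        return action, votes / len(self._means)

    def learn(self, action, reward):
        # Online bootstrap: each policy sees the example Poisson(1) times.
        for p in range(len(self._means)):
            for _ in range(poisson1(self._rng)):
                self._counts[p][action] += 1
                n = self._counts[p][action]
                self._means[p][action] += (reward - self._means[p][action]) / n


# Train on logged interactions in which action "b" always pays off.
learner = BagSketch(bag=5, seed=1)
for logged_action, logged_reward in [("a", 0.0), ("b", 1.0)] * 20:
    learner.learn(logged_action, logged_reward)

action, prob = learner.predict(["a", "b"])
```

Because each policy sees a different resample, the policies disagree early on and that disagreement is the source of exploration; as evidence accumulates they tend to agree and the chosen action's probability approaches 1.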

Methods

learn(context: Context, action: Action, reward: float, probability: float, actions: Actions | None = None) → None

Learn about the action taken in the context.

Parameters:

context – The context in which the action was taken.
action – The action that was taken.
reward – The reward for the given context and action (feedback for IGL problems).
probability – The probability the given action was taken.
**kwargs – Optional information returned during prediction.

predict(context: Context, actions: Actions) → Tuple[Action, Prob, Kwargs]

Predict which action to take in the context.

Parameters:

context – The current context. It will either be None (multi-armed bandit), a value (a single feature), a sequence of values (dense features), or a dictionary (sparse features).
actions – The current set of actions to choose from in the given context. Each action will either be a value (a single feature), a sequence of values (dense features), or a dictionary (sparse features).

Returns:

A Prediction. Several prediction formats are supported; see the return type hint for these.
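
The context and action formats listed above map onto ordinary Python values. The snippet below is only an illustration of those four shapes; describe is a hypothetical helper written for this example, the feature names are made up, and the classification rule is an assumption rather than how coba inspects its inputs.

```python
from collections.abc import Mapping, Sequence


def describe(value):
    """Hypothetical helper: name the feature format of a context or action."""
    if value is None:
        return "none (multi-armed bandit)"
    if isinstance(value, Mapping):
        return "sparse features"
    if isinstance(value, Sequence) and not isinstance(value, str):
        return "dense features"
    return "single feature"


contexts = [
    None,                            # no context at all
    3.5,                             # a single feature
    (1.0, 0.0, 2.5),                 # dense features
    {"age": 31, "country": "US"},    # sparse features (made-up names)
]
labels = [describe(c) for c in contexts]
```

Actions take the same shapes except None, since every action needs at least one value to identify it.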

score(context: Context, actions: Actions, action: Action) → Prob

Compute the propensity score of an action.

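Parameters:

context – The current context.
actions – The current set of actions to choose from in the given context.
action – The action to score.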

Returns:

The propensity score of the given action. That is, P(action|context,actions).
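
If one policy is followed uniformly at random, as described for bag, then P(action|context,actions) would be the fraction of policies whose preferred action is the one being scored. The sketch below works through that arithmetic; vote_propensity and the example choices are hypothetical and are not taken from coba or VW.

```python
from collections import Counter


def vote_propensity(greedy_choices, action):
    """Fraction of policies whose preferred action equals `action`."""
    counts = Counter(greedy_choices)
    return counts[action] / len(greedy_choices)


# Preferred action of each of five hypothetical policies in one context.
choices = ["left", "right", "right", "left", "right"]

p_right = vote_propensity(choices, "right")  # 3 of 5 policies
p_left = vote_propensity(choices, "left")    # 2 of 5 policies
p_stay = vote_propensity(choices, "stay")    # no policy prefers it
```

An action that no policy prefers gets a propensity of zero under this scheme, which matters for off-policy estimators that divide by the logged probability.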

Attributes