VowpalLearner

class coba.learners.VowpalLearner

A Vowpal Wabbit wrapper.

Remarks: This learner requires the Vowpal Wabbit package, which can be installed via pip install vowpalwabbit. To learn more about solving contextual bandit problems with Vowpal Wabbit see here and here.

Constructors

__init__(args: str = '--cb_explore_adf --epsilon 0.05 --interactions ax --interactions axx --ignore_linear x --random_seed 1 --quiet', vw: VowpalMediator | None = None, seed: int = 1) → None

Instantiate a VowpalLearner.

Parameters:

args – Command line arguments used to instantiate a Vowpal Wabbit contextual bandit learner. For examples and documentation on how to instantiate VW learners from command line arguments see here. One of cb, cb_adf, cb_explore, or cb_explore_adf must be used. When examples are formatted for VW, context features are placed in the ‘x’ namespace and action features, when relevant, are placed in the ‘a’ namespace.
vw – A mediator able to communicate with VW. This should never need to be changed.
seed – A seed for random number generation.
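
As a rough illustration of the requirement that args name one of the four contextual bandit reductions, the sketch below checks an argument string before it is handed to the constructor. The helper uses_cb_reduction and the CUSTOM_ARGS string are hypothetical names introduced here; this is not coba's own validation code, only one way such a check might look.

```python
import shlex

# The four reductions the parameter description says are accepted.
REQUIRED_CB_FLAGS = {"--cb", "--cb_adf", "--cb_explore", "--cb_explore_adf"}

def uses_cb_reduction(args: str) -> bool:
    """Return True if the argument string names one of the required reductions.

    Hypothetical helper for illustration; handles both "--cb 4" and "--cb=4".
    """
    tokens = shlex.split(args)
    return any(token.split("=")[0] in REQUIRED_CB_FLAGS for token in tokens)

# The default value of `args` from the signature above.
DEFAULT_ARGS = (
    "--cb_explore_adf --epsilon 0.05 --interactions ax --interactions axx "
    "--ignore_linear x --random_seed 1 --quiet"
)

# A made-up variation: more exploration and only the 'ax' interaction.
CUSTOM_ARGS = "--cb_explore_adf --epsilon 0.1 --interactions ax --random_seed 1 --quiet"

default_ok = uses_cb_reduction(DEFAULT_ARGS)
custom_ok = uses_cb_reduction(CUSTOM_ARGS)
missing_ok = uses_cb_reduction("--epsilon 0.1 --quiet")  # no cb reduction named
```

With coba and vowpalwabbit installed, a string that passes this check would then be supplied as VowpalLearner(args=CUSTOM_ARGS), following the signature above.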

Methods

finish(): Finish all pending work (e.g., write buffers to disk).

learn(context: Context, action: Action, reward: float, probability: float, actions: Actions | None = None) → None

Learn about the action taken in the context.

Parameters:

context – The context in which the action was taken.
action – The action that was taken.
reward – The reward for the given context and action (feedback for IGL problems).
probability – The probability the given action was taken.
**kwargs – Optional information returned during prediction.
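
To show how the optional information returned during prediction might flow back into learn, here is a minimal sketch of one interaction. UniformStandIn is a hypothetical stand-in class, not part of coba; it only mirrors the learn signature above and the predict signature documented for this class, and it assumes the returned kwargs carry the action set under the key "actions". The reward rule is made up for the example.

```python
import random

class UniformStandIn:
    """Hypothetical stand-in with the same method shapes as the learner above."""

    def __init__(self, seed: int = 1) -> None:
        self._rng = random.Random(seed)
        self.learned = []  # record of everything passed to learn()

    def predict(self, context, actions):
        # Choose uniformly at random and report the matching probability.
        action = self._rng.choice(list(actions))
        # Assumption: the extra information is the action set itself.
        return action, 1 / len(actions), {"actions": actions}

    def learn(self, context, action, reward, probability, actions=None):
        self.learned.append((context, action, reward, probability, actions))

learner = UniformStandIn(seed=1)

context = {"user": "a", "hour": 9}  # sparse context features
actions = [(1, 0), (0, 1)]          # dense action features

# 1. Ask for an action, its probability, and any extra information.
action, probability, kwargs = learner.predict(context, actions)

# 2. Observe a reward (made-up rule: the first action pays 1.0).
reward = 1.0 if action == (1, 0) else 0.0

# 3. Feed the outcome back, forwarding the extra information unchanged.
learner.learn(context, action, reward, probability, **kwargs)
```

Forwarding kwargs unchanged keeps the calling code independent of what a particular learner chooses to return alongside its prediction.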

predict(context: Context, actions: Actions) → Tuple[Action, Prob, Kwargs]

Predict which action to take in the context.

Parameters:

context – The current context. It will either be None (multi-armed bandit), a value (a single feature), a sequence of values (dense features), or a dictionary (sparse features).
actions – The current set of actions to choose from in the given context. Each action will either be a value (a single feature), a sequence of values (dense features), or a dictionary (sparse features).

Returns:

A Prediction. Several prediction formats are supported. See the type-hint for these.
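
The four shapes a context or action can take (None, a single value, a sequence, or a dictionary) can be brought into one sparse form before being grouped by namespace. The sketch below is an assumption about how that might be done, not a description of coba's internal formatting; to_sparse and to_namespaces are hypothetical helpers, and the positional keys "0", "1", ... are an arbitrary choice made for the example.

```python
from collections.abc import Mapping, Sequence

def to_sparse(features) -> dict:
    """Normalize None, a single value, a sequence, or a dict to a dict.

    Hypothetical helper; key names for non-dict inputs are arbitrary.
    """
    if features is None:
        return {}  # multi-armed bandit: no context features
    if isinstance(features, Mapping):
        return dict(features)  # already sparse
    # Strings are sequences too, but here they count as a single value.
    if isinstance(features, Sequence) and not isinstance(features, (str, bytes)):
        return {str(i): value for i, value in enumerate(features)}  # dense
    return {"0": features}  # a single feature

def to_namespaces(context, action) -> dict:
    """Place context features under 'x' and action features under 'a'."""
    return {"x": to_sparse(context), "a": to_sparse(action)}

example = to_namespaces(context=[0.2, 0.7], action={"color": "red"})
bandit_example = to_namespaces(context=None, action="left")
```

Treating strings as single values avoids splitting a categorical feature such as "left" into one feature per character.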

score(context: Context, actions: Actions, action: Action) → Prob

Compute the propensity score of an action.

Parameters:

context – The current context.
actions – The current set of actions that can be chosen.
action – The action to propensity score.

Returns:

The propensity score of the given action. That is, P(action|context,actions).
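
For intuition about what P(action|context,actions) looks like, the sketch below computes the textbook epsilon-greedy propensity: every action receives epsilon / len(actions), and the greedy action additionally receives 1 - epsilon. This is an illustrative formula only; it is not claimed to be what score computes internally, and epsilon_greedy_propensity and greedy_action are hypothetical names. The value 0.05 matches the --epsilon value in the default args.

```python
def epsilon_greedy_propensity(action, actions, greedy_action, epsilon=0.05) -> float:
    """Probability that epsilon-greedy exploration selects `action`.

    Hypothetical helper; `greedy_action` is whichever action the policy
    currently believes is best.
    """
    if action not in actions:
        raise ValueError("action must be one of the available actions")
    explore = epsilon / len(actions)  # uniform share of the exploration mass
    exploit = (1.0 - epsilon) if action == greedy_action else 0.0
    return explore + exploit

actions = ["a", "b", "c", "d"]
scores = {
    a: epsilon_greedy_propensity(a, actions, greedy_action="b")
    for a in actions
}
total = sum(scores.values())  # propensities over all actions should sum to 1
```

Because the propensities form a probability distribution over the available actions, checking that they sum to one is a cheap sanity test for any scoring rule of this kind.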

Attributes

params