coba.learners

Learner implementations.

Dummy Learners

RandomLearner

Select actions from a uniform distribution and learn nothing.

FixedLearner

Select actions from a fixed distribution and learn nothing.
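
The two dummy strategies above differ only in the distribution they sample from. The following is a minimal standard-library sketch of that idea, not coba's implementation; the helper name `sample_action` and the example probabilities are assumptions made for illustration.

```python
import random


def sample_action(actions, probabilities, rng):
    """Draw one action according to probabilities that are never updated."""
    return rng.choices(actions, weights=probabilities, k=1)[0]


rng = random.Random(0)
actions = ["a", "b", "c"]

# Uniform distribution: every action is equally likely.
uniform = [1 / len(actions)] * len(actions)

# Fixed distribution: chosen up front and left unchanged by any feedback.
fixed = [0.7, 0.2, 0.1]

uniform_draws = [sample_action(actions, uniform, rng) for _ in range(1000)]
fixed_draws = [sample_action(actions, fixed, rng) for _ in range(1000)]
```

Because neither distribution depends on observed rewards, such learners are mainly useful as baselines.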

Bandit Learners

BanditEpsilonLearner

Select the greedy action with probability 1 - epsilon and a uniformly random action otherwise.
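
As a rough illustration of epsilon-greedy selection, the sketch below keeps a running mean reward per action. The class `EpsilonGreedy` and its `predict`/`learn` methods are hypothetical names for this example and are not claimed to match the interface of BanditEpsilonLearner.

```python
import random


class EpsilonGreedy:
    """Context-free epsilon-greedy over a fixed number of actions."""

    def __init__(self, n_actions, epsilon, seed=0):
        self.epsilon = epsilon
        self.counts = [0] * n_actions
        self.means = [0.0] * n_actions
        self.rng = random.Random(seed)

    def predict(self):
        # Explore: with probability epsilon pick any action uniformly.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.means))
        # Exploit: otherwise pick the action with the best mean so far.
        return max(range(len(self.means)), key=self.means.__getitem__)

    def learn(self, action, reward):
        # Incremental update of the running mean for the chosen action.
        self.counts[action] += 1
        self.means[action] += (reward - self.means[action]) / self.counts[action]


learner = EpsilonGreedy(n_actions=3, epsilon=0.1)
learner.learn(action=2, reward=1.0)
chosen = learner.predict()
```

Note that the random branch can also land on the greedy action, so in this sketch the greedy action is chosen slightly more often than 1 - epsilon.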

BanditUCBLearner

Select the action with the highest upper confidence bound estimate.
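
One common way to compute an upper confidence bound is the UCB1 rule: the mean reward plus a bonus that shrinks as an action is tried more often. The sketch below assumes that rule; BanditUCBLearner may use a different bound, and the class and method names here are hypothetical.

```python
import math


class UCB1:
    """Context-free UCB1 over a fixed number of actions."""

    def __init__(self, n_actions):
        self.counts = [0] * n_actions
        self.means = [0.0] * n_actions

    def predict(self):
        # Try every action once before any bound is computed.
        for action, count in enumerate(self.counts):
            if count == 0:
                return action
        total = sum(self.counts)
        bounds = [
            mean + math.sqrt(2 * math.log(total) / count)
            for mean, count in zip(self.means, self.counts)
        ]
        return max(range(len(bounds)), key=bounds.__getitem__)

    def learn(self, action, reward):
        # Incremental update of the running mean for the chosen action.
        self.counts[action] += 1
        self.means[action] += (reward - self.means[action]) / self.counts[action]
```

The bonus term keeps rarely tried actions competitive, so exploration comes from the bound itself rather than from random draws.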

Contextual Bandit Learners

LinUCBLearner

A contextual bandit learner using upper confidence bounds to explore.
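
To make the idea concrete, here is a small standard-library sketch of a LinUCB-style learner with a single shared weight vector. It assumes each action is described by a feature vector, uses a hypothetical exploration weight `alpha`, and maintains the inverse design matrix with the Sherman-Morrison formula. It is an illustration of the technique, not the code behind LinUCBLearner.

```python
import math


class LinUCB:
    """Shared-parameter LinUCB over per-action feature vectors."""

    def __init__(self, n_features, alpha=1.0):
        self.alpha = alpha
        # A starts as the identity matrix, so its inverse is the identity too.
        self.A_inv = [[float(i == j) for j in range(n_features)]
                      for i in range(n_features)]
        self.b = [0.0] * n_features

    def _matvec(self, x):
        return [sum(r * v for r, v in zip(row, x)) for row in self.A_inv]

    def score(self, x):
        theta = self._matvec(self.b)            # ridge-regression weights
        a_inv_x = self._matvec(x)
        mean = sum(t * v for t, v in zip(theta, x))
        width = math.sqrt(sum(a * v for a, v in zip(a_inv_x, x)))
        return mean + self.alpha * width        # upper confidence bound

    def predict(self, action_features):
        scores = [self.score(x) for x in action_features]
        return max(range(len(scores)), key=scores.__getitem__)

    def learn(self, x, reward):
        # Sherman-Morrison update for the inverse of (A + x x^T).
        a_inv_x = self._matvec(x)
        denom = 1.0 + sum(a * v for a, v in zip(a_inv_x, x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]
```

Updating the inverse directly avoids a full matrix inversion on every step, at the cost of some accumulated floating-point error over long runs.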

LinTSLearner

A contextual bandit learner using Thompson Sampling for exploration.

VowpalEpsilonLearner

Epsilon-greedy exploration with a Vowpal Wabbit (VW) contextual bandit learner.

VowpalSoftmaxLearner

Softmax exploration with a VW contextual bandit learner.
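
Softmax exploration turns per-action scores into selection probabilities, so better-scoring actions are chosen more often without ever excluding the rest. The sketch below shows the general technique with a hypothetical `temperature` parameter; VW parameterizes its softmax differently (and works with costs rather than rewards), so this is not a description of VowpalSoftmaxLearner's internals.

```python
import math
import random


def softmax_probabilities(scores, temperature=1.0):
    """Map scores to probabilities; lower temperature means greedier choices."""
    highest = max(scores)
    # Subtracting the maximum keeps exp() from overflowing.
    weights = [math.exp((s - highest) / temperature) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_select(scores, rng, temperature=1.0):
    """Sample an action index in proportion to its softmax probability."""
    probabilities = softmax_probabilities(scores, temperature)
    return rng.choices(range(len(scores)), weights=probabilities, k=1)[0]


rng = random.Random(0)
scores = [0.1, 0.5, 0.2]
probabilities = softmax_probabilities(scores)
action = softmax_select(scores, rng)
```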

VowpalBagLearner

Bootstrap aggregated policy exploration with a VW contextual bandit learner.

VowpalCoverLearner

Online Cover exploration with a VW contextual bandit learner.

VowpalRegcbLearner

RegCB exploration with a VW contextual bandit learner.

VowpalRndLearner

Random Network Distillation (RND) exploration with a VW contextual bandit learner.

VowpalSquarecbLearner

SquareCB exploration with a VW contextual bandit learner.

VowpalOffPolicyLearner

No exploration with a VW contextual bandit learner.

CorralLearner

A contextual bandit learner that optimizes a collection of learners.

General Learners

VowpalLearner

A Vowpal Wabbit wrapper.