coba.learners
Learner implementations.
Dummy Learners
Select actions from a uniform distribution and learn nothing.
Select actions from a fixed distribution and learn nothing.
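The two dummy strategies above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not coba's implementation: the function names `uniform_choice` and `fixed_choice` are hypothetical, and the standard-library `random` module stands in for whatever sampling coba uses internally.

```python
import random

def uniform_choice(actions, rng):
    """Pick one action uniformly at random; no feedback is ever used."""
    return rng.choice(actions)

def fixed_choice(actions, probabilities, rng):
    """Pick one action from a fixed, user-supplied distribution.

    `probabilities` is assumed to be aligned with `actions` and to sum to 1.
    """
    return rng.choices(actions, weights=probabilities, k=1)[0]

rng = random.Random(0)  # seeded so repeated runs give the same draws
print(uniform_choice(["a", "b", "c"], rng))
print(fixed_choice(["a", "b", "c"], [0.1, 0.1, 0.8], rng))
```

Because neither function keeps state, such learners are mostly useful as baselines against which learning algorithms can be compared.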
Bandit Learners
Select the greedy action with probability (1-epsilon) and explore otherwise.
Select the action with the highest upper confidence bound estimate.
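As a rough sketch of the two bandit strategies above, the following plain-Python code keeps a running mean reward per action and selects either epsilon-greedily or by a UCB1-style bound. The names `EpsilonGreedySketch` and `ucb1_choose` are hypothetical, the UCB1 bonus term is an assumption (coba may use a different confidence bound), and none of this reflects coba's actual classes or signatures.

```python
import math
import random

class EpsilonGreedySketch:
    """Hypothetical context-free epsilon-greedy learner (not coba's class)."""

    def __init__(self, n_actions, epsilon, seed=0):
        self.epsilon = epsilon
        self.counts = [0] * n_actions   # times each action was played
        self.means = [0.0] * n_actions  # running mean reward per action
        self.rng = random.Random(seed)

    def choose(self):
        # With probability epsilon pick uniformly over all actions,
        # otherwise pick the action with the highest mean reward so far.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.means))
        return max(range(len(self.means)), key=self.means.__getitem__)

    def learn(self, action, reward):
        # Incremental update of the running mean for the played action.
        self.counts[action] += 1
        self.means[action] += (reward - self.means[action]) / self.counts[action]

def ucb1_choose(counts, means):
    """Pick the action with the highest UCB1 estimate (assumed bound)."""
    # Play every action once before trusting any confidence bound.
    for action, n in enumerate(counts):
        if n == 0:
            return action
    total = sum(counts)
    return max(
        range(len(means)),
        key=lambda a: means[a] + math.sqrt(2 * math.log(total) / counts[a]),
    )
```

A typical loop would call `choose()`, observe a reward for that action, and pass both to `learn()`. The UCB variant needs no epsilon parameter because its exploration bonus shrinks on its own as an action's play count grows.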
Contextual Bandit Learners
A contextual bandit learner using upper confidence bounds to explore.
A contextual bandit learner using Thompson Sampling for exploration.
Epsilon-greedy exploration with a VW contextual bandit learner.
Softmax exploration with a VW contextual bandit learner.
Bootstrap aggregated policy exploration with a VW contextual bandit learner.
Online Cover exploration with a VW contextual bandit learner.
RegCB exploration with a VW contextual bandit learner.
Random Network Distillation (RND) exploration with a VW contextual bandit learner.
SquareCB exploration with a VW contextual bandit learner.
No exploration with a VW contextual bandit learner.
A contextual bandit learner that optimizes a collection of learners.
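Several of the learners above differ only in how per-action scores are turned into an exploration distribution. As one illustration, softmax exploration can be sketched as below. This is a generic sketch under stated assumptions, not VW's or coba's code: `softmax_probabilities` and `sample_action` are hypothetical names, `scores` is assumed to hold a model's predicted reward for each action in the current context, and `temperature` is an assumed scaling parameter where larger values make the choice greedier.

```python
import math
import random

def softmax_probabilities(scores, temperature=1.0):
    """Map per-action scores to a probability distribution.

    Probabilities are proportional to exp(temperature * score).
    """
    # Subtract the maximum score for numerical stability; this does not
    # change the resulting probabilities.
    top = max(scores)
    weights = [math.exp(temperature * (s - top)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def sample_action(scores, temperature, rng):
    """Sample an action index and return it with its probability."""
    probs = softmax_probabilities(scores, temperature)
    action = rng.choices(range(len(scores)), weights=probs, k=1)[0]
    return action, probs[action]

rng = random.Random(0)
action, prob = sample_action([0.2, 0.9, 0.4], temperature=5.0, rng=rng)
print(action, round(prob, 3))
```

Returning the probability alongside the action matters in the contextual bandit setting, since the logged probability is what off-policy estimators typically use to reweight the observed reward.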
General Learners
A Vowpal Wabbit wrapper.