Evaluators
Coba Evaluators evaluate the performance of Learners in Environments. They have the following interface.
class Evaluator:
    def evaluate(self, environment: Environment, learner: Learner) -> Iterable[dict]:
        ...
Environments and Learners are described in more detail in Environments.ipynb and Learners.ipynb. In this notebook we…
…show how to create your own custom evaluators
…describe the Evaluators provided by Coba out-of-the-box
Custom Evaluator
Any class that conforms to the evaluator interface above can be used as an Evaluator in an Experiment. For example,
[2]:
class MyEvaluator:
    def evaluate(self, environment, learner):
        yield {'a':1, 'b':2}
        yield {'a':2, 'b':3}
We can then pass our custom Evaluator to Experiment in order to generate results.
[3]:
import coba as cb
env = cb.Environments.from_linear_synthetic(100)
lrn = cb.RandomLearner()
val = MyEvaluator()
result = cb.Experiment(env,lrn,val).run()
2024-02-02 14:37:33 -- Experiment Started
2024-02-02 14:37:33 -- Recording Learner 0 parameters... (0.0 seconds) (completed)
2024-02-02 14:37:33 -- Recording Evaluator 0 parameters... (0.0 seconds) (completed)
2024-02-02 14:37:33 -- Peeking at Environment 0... (0.0 seconds) (completed)
2024-02-02 14:37:33 -- Recording Environment 0 parameters... (0.0 seconds) (completed)
2024-02-02 14:37:33 -- Evaluating Learner 0 on Environment 0... (0.0 seconds) (completed)
2024-02-02 14:37:33 -- Experiment Finished
When we look at result.evaluators we can see that MyEvaluator was used.
[4]:
result.evaluators.to_pandas()
[4]:
  | evaluator_id | eval_type
--- | --- | ---
0 | 0 | MyEvaluator
We can also see that the dictionaries yielded by MyEvaluator were stored in result.interactions.
[5]:
result.interactions.to_pandas()
[5]:
  | environment_id | learner_id | evaluator_id | index | a | b
--- | --- | --- | --- | --- | --- | ---
0 | 0 | 0 | 0 | 1 | 1 | 2
1 | 0 | 0 | 0 | 2 | 2 | 3
We can even create a custom plot using the interactions produced by MyEvaluator.
[6]:
result.plot_learners(x='a',y='b')
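Because MyEvaluator yielded 'a' and 'b' columns, plot_learners can chart them against one another; any numeric column yielded by an evaluator appears usable as the x or y argument here.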
If more than one Environment or Learner is passed to Experiment, our custom Evaluator is called on every Environment/Learner combination.
[9]:
env = cb.Environments.from_linear_synthetic(100,seed=[1,2])
lrn = [cb.RandomLearner(seed=1),cb.RandomLearner(seed=2)]
val = MyEvaluator()
cb.Experiment(env,lrn,val).run(quiet=True).interactions.to_pandas()
[9]:
  | environment_id | learner_id | evaluator_id | index | a | b
--- | --- | --- | --- | --- | --- | ---
0 | 0 | 0 | 0 | 1 | 1 | 2
1 | 0 | 0 | 0 | 2 | 2 | 3
2 | 0 | 1 | 0 | 1 | 1 | 2
3 | 0 | 1 | 0 | 2 | 2 | 3
4 | 1 | 0 | 0 | 1 | 1 | 2
5 | 1 | 0 | 0 | 2 | 2 | 3
6 | 1 | 1 | 0 | 1 | 1 | 2
7 | 1 | 1 | 0 | 2 | 2 | 3
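As a quick sanity check, a small sketch using pandas on the same env, lrn, and val defined above confirms that every Environment/Learner combination contributed both yielded rows:

df = cb.Experiment(env, lrn, val).run(quiet=True).interactions.to_pandas()

# MyEvaluator yields two rows per evaluation, so each of the four
# (environment_id, learner_id) combinations should have size 2.
print(df.groupby(['environment_id', 'learner_id']).size())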
To see how a custom Evaluator might work in practice, we will now create an online learning Evaluator.
[1]:
import coba as cb
class MyOnlineEvaluator:
    def evaluate(self, environment, learner):
        for interaction in environment.read():
            # Predict an action (and its probability) for the interaction's context.
            action, prob = learner.predict(interaction['context'], interaction['actions'])
            # Observe the reward for the chosen action.
            reward = interaction['rewards'](action)
            # Update the learner with the observed outcome.
            learner.learn(interaction['context'], action, reward, prob)
            yield {'reward': reward}
env = cb.Environments.from_linear_synthetic(200,seed=[1,2])
lrn = [cb.RandomLearner(seed=1),cb.LinUCBLearner()]
val = MyOnlineEvaluator()
cb.Experiment(env,lrn,val).run(quiet=True).plot_learners()
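The predict/observe/learn loop above is the standard online contextual bandit protocol, and it is essentially a simplified version of what Coba's default SequentialCB evaluator does.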
Built-in Evaluators
Coba comes with three built-in evaluators:
SequentialCB – sequential on/off-policy learning/evaluation (the default evaluator; see the sketch after this list)
RejectionCB – sequential on-policy learning/evaluation using logged data
SequentialIGL – sequential interaction grounded learning/evaluation
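Because SequentialCB is the default, constructing an Experiment without an evaluator should behave the same as passing one explicitly. A minimal sketch, assuming the no-argument SequentialCB constructor applies the defaults:

import coba as cb

env = cb.Environments.from_linear_synthetic(100, seed=1)
lrn = cb.RandomLearner()

# With no evaluator argument, Experiment falls back to the default
# SequentialCB evaluator, so these two runs should be equivalent.
result_default  = cb.Experiment(env, lrn).run(quiet=True)
result_explicit = cb.Experiment(env, lrn, cb.SequentialCB()).run(quiet=True)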
On-policy Evaluation without Learning
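Passing learn=None tells SequentialCB to skip all learning, so each learner is evaluated on-policy exactly as initialized. This is useful for benchmarking fixed policies.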
[2]:
import coba as cb
env = cb.Environments.from_linear_synthetic(100,seed=[1,2])
lrn = [cb.RandomLearner(),cb.LinUCBLearner()]
val = cb.SequentialCB(learn=None)
cb.Experiment(env,lrn,val).run(quiet=True).plot_learners()
On-policy Learning without Evaluation
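Conversely, passing eval=None tells SequentialCB to train the learner without recording any evaluation rows. We can verify that learning occurred by peeking at LinUCBLearner's weights before and after the run.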
[10]:
import coba as cb
env = cb.Environments.from_linear_synthetic(200, n_context_features=0, n_action_features=2, seed=1)
lrn = cb.LinUCBLearner(features=[1,'a'])
val = cb.SequentialCB(eval=None)
print("Feature weights have not been learned")
print("-------------------------------------")
print(lrn._theta)
cb.Experiment(env,lrn,val).run(quiet=True)
print()
print("Feature weights have been learned")
print("---------------------------------")
print(lrn._theta)
Feature weights have not been learned
-------------------------------------
None
Feature weights have been learned
---------------------------------
[ 0.47452277 0.48165662 -0.09730614]
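(Note that _theta is an internal attribute of LinUCBLearner; we inspect it here only to show that the weights changed.)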
Off-policy Learning using Reward Estimates
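Here .logged(cb.RandomLearner()) turns the environment into logged data gathered by a random policy. Passing learn='dm' trains the learner from reward estimates built on that logged data (a direct-method style approach), while eval='on' still evaluates the resulting policy on-policy. The xlim=(3,None) argument simply trims the first few noisy interactions from the plot.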
[54]:
import coba as cb
env = cb.Environments.from_linear_synthetic(200, n_context_features=0, n_action_features=2, seed=1).logged(cb.RandomLearner())
lrn = cb.LinUCBLearner(features=[1,'a'])
val = cb.SequentialCB(learn='dm',eval='on')
cb.Experiment(env,lrn,val).run(quiet=True).plot_learners(xlim=(3,None))
Off-policy Learning with On-Policy Evaluation
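Passing learn='off' instead trains the learner off-policy from the logged actions and rewards themselves, while eval='on' again evaluates the resulting policy on-policy.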
[55]:
import coba as cb
env = cb.Environments.from_linear_synthetic(200, n_context_features=0, n_action_features=2, seed=1).logged(cb.RandomLearner())
lrn = cb.LinUCBLearner(features=[1,'a'])
val = cb.SequentialCB(learn='off',eval='on')
cb.Experiment(env,lrn,val).run(quiet=True).plot_learners(xlim=(3,None))
Off-policy Learning with Off-Policy Evaluation
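When on-policy evaluation is not possible (e.g., only logged data is available), eval='ips' scores each learner against the logged data using inverse propensity scoring, reweighting logged rewards by the evaluated policy's probability of the logged action. Because IPS estimates are high variance, this example averages results over three environment seeds.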
[3]:
import coba as cb
env = cb.Environments.from_linear_synthetic(200, n_context_features=0, n_action_features=2, seed=[1,2,3]).logged(cb.RandomLearner())
lrn = cb.LinUCBLearner(features=[1,'a'])
val = cb.SequentialCB(learn='off',eval='ips')
cb.Experiment(env,lrn,val).run(quiet=True).plot_learners(xlim=(3,None))