Results

A Result is the outcome of an Experiment. It has several features to support easy analysis.

In this notebook we will learn about Result…

…Creating: How to create a Result from an Experiment
…Plotting: How to create plots from a Result
…Loading: How to load a Result from disk
…Selecting: How to select which Result data to analyze
…Pandas: How to turn a Result into pandas DataFrames

Creating

Results are created by running an Experiment. Below is a simple example of this:

[1]:

import coba as cb

environments = cb.Environments.from_linear_synthetic(2000).shuffle([1,2,3])
learners     = [cb.RandomLearner(), cb.VowpalEpsilonLearner()]
results      = cb.Experiment(environments,learners).run()

2024-02-08 14:32:16 -- Experiment Started
2024-02-08 14:32:16 -- Recording Learner 0 parameters... (0.0 seconds) (completed)
2024-02-08 14:32:16 -- Recording Evaluator 0 parameters... (0.0 seconds) (completed)
2024-02-08 14:32:16 -- Recording Learner 1 parameters... (0.0 seconds) (completed)
2024-02-08 14:32:16 -- Peeking at Environment 0... (0.32 seconds) (completed)
2024-02-08 14:32:16 -- Recording Environment 0 parameters... (0.0 seconds) (completed)
2024-02-08 14:32:17 -- Evaluating Learner 0 on Environment 0... (0.37 seconds) (completed)
2024-02-08 14:32:18 -- Evaluating Learner 1 on Environment 0... (0.87 seconds) (completed)
2024-02-08 14:32:18 -- Peeking at Environment 1... (0.31 seconds) (completed)
2024-02-08 14:32:18 -- Recording Environment 1 parameters... (0.0 seconds) (completed)
2024-02-08 14:32:18 -- Evaluating Learner 0 on Environment 1... (0.34 seconds) (completed)
2024-02-08 14:32:19 -- Evaluating Learner 1 on Environment 1... (0.72 seconds) (completed)
2024-02-08 14:32:19 -- Peeking at Environment 2... (0.32 seconds) (completed)
2024-02-08 14:32:19 -- Recording Environment 2 parameters... (0.0 seconds) (completed)
2024-02-08 14:32:20 -- Evaluating Learner 0 on Environment 2... (0.36 seconds) (completed)
2024-02-08 14:32:20 -- Evaluating Learner 1 on Environment 2... (0.71 seconds) (completed)
2024-02-08 14:32:20 -- Experiment Finished

Plotting

With a Result class one can plot the performance of learners. This is a great way to quickly get a sense of the experiment.

Below we plot the result of our experiment above.

[2]:

results.plot_learners()

The plot_learners method has many arguments which allow customization of the plot. Here are just a few:

xlim: Define the x-axis limits to plot. If `None` the x-axis limits will be inferred.
ylim: Define the y-axis limits to plot. If `None` the y-axis limits will be inferred.
span: The number of previous rewards to average together. When span is `None` we plot the running average over all previous rewards.
err: Determine what kind of error bars to plot (if any). Valid types are `None`, 'se', and 'sd'. If `None`
    then no bars are plotted, if 'se' the standard error is shown, and if 'sd' the standard deviation is shown.
ax: Provide an optional axes that the plot will be drawn to. If not provided a new figure/axes is created.

[3]:

results.plot_learners(xlim=(500,1000), span=1 , err=None)
results.plot_learners(xlim=(500,1000), span=40, err='se')
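The 'sd' and 'se' options described above differ by a factor of the square root of the number of observations. The sketch below is plain Python using only the standard library, not Coba's internals. The reward values are made up for illustration, and it assumes the sample standard deviation is the quantity of interest.

```python
import math
import statistics

# Hypothetical average rewards of one learner on five environments (illustrative values).
rewards = [0.52, 0.48, 0.55, 0.50, 0.45]

mean = statistics.mean(rewards)
sd = statistics.stdev(rewards)       # sample standard deviation (the idea behind 'sd')
se = sd / math.sqrt(len(rewards))    # standard error of the mean (the idea behind 'se')

print(f"mean={mean:.3f} sd={sd:.3f} se={se:.3f}")
```

Because the standard error shrinks as more environments are added while the standard deviation does not, 'se' bars are generally the narrower of the two.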

Loading

Results can also be loaded from an Experiment restoration file.

[13]:

import coba as cb

env = cb.Environments.from_linear_synthetic(2000).shuffle([1,2,3])
lrn = [cb.RandomLearner(), cb.VowpalEpsilonLearner(epsilon=0.1)]

cb.Experiment(env,lrn).run('results.log',quiet=True)
cb.Result.from_file('results.log').plot_learners()

Selecting

Often we only want to plot certain learners or environments. Below are two examples of when and how this can be done.

Selecting Finished Environments

One particularly useful feature is looking at the results of an experiment while it runs (this can be done by loading a Result from the experiment log while the experiment is running). When doing this it is useful to select just the Environments that have been fully processed. If we didn’t, some learners might look better or worse simply because of missing or extra data. Selecting down to only finished environments looks like this.

[14]:

cb.Result.from_file('results.log').where_fin().plot_learners()

Selecting In General

When running very large experiments it can often be useful to select down to specific learners or environments to answer questions. This can be done using the where method. Below is an example where we select down to just the vw learner. When selecting on strings Coba will perform a regex match so this can be very powerful.

[15]:

cb.Result.from_file('results.log').where(family='vw').plot_learners()

To Pandas

Finally, the analysis that we wish to do often isn’t possible with the built-in Result methods. In this case we can turn the various pieces of a Result into pandas DataFrames. There are three items within a Result which can be turned into DataFrames. Below we show all three:

[16]:

result = cb.Result.from_file('results.log')
result.learners.to_pandas()

[16]:

	learner_id	args	epsilon	family	features	seed
0	0	None	NaN	random	None	1
1	1	--cb_explore_adf --epsilon 0.1 --random_seed 1...	0.1	vw	[1, a, ax, axx]	1

[17]:

result = cb.Result.from_file('results.log')
result.environments.to_pandas()

[17]:

	environment_id	env_type	n_actions	n_coeff	reward_features	seed	shuffle_seed
0	0	LinearSynthetic	5	5	[a, xa]	1	1
1	1	LinearSynthetic	5	5	[a, xa]	1	2
2	2	LinearSynthetic	5	5	[a, xa]	1	3

[18]:

result = cb.Result.from_file('results.log')
result.interactions.to_pandas()

[18]:

	environment_id	learner_id	evaluator_id	index	action	probability	reward
0	0	0	0	1	[0.72139, 0.17695, -0.94002, 0.08659, 0.83188]	0.20	0.79412
1	0	0	0	2	[0.16227, 0.59735, -0.85493, -0.11122, 0.53707]	0.20	0.43476
2	0	0	0	3	[0.5655, 0.53303, -0.76695, 0.4355, -0.90318]	0.20	0.53500
3	0	0	0	4	[0.40408, 0.4963, -0.15722, 0.6081, 0.40259]	0.20	0.65024
4	0	0	0	5	[-0.12778, 0.8441, 0.51425, 0.47563, -0.42304]	0.20	0.56094
...	...	...	...	...	...	...	...
11995	2	1	0	1996	[0.0595, -0.36016, -0.26017, 0.89185, -0.35685]	0.92	0.65614
11996	2	1	0	1997	[0.94362, 0.4796, -0.97441, 0.76965, -0.84778]	0.92	0.72093
11997	2	1	0	1998	[-0.29204, 0.29273, 0.21691, 0.94181, -0.76658]	0.92	0.54302
11998	2	1	0	1999	[0.13377, 0.85283, -0.58861, 0.79819, 0.79701]	0.92	0.75122
11999	2	1	0	2000	[-0.15262, 0.77369, -0.82678, -0.30494, -0.97439]	0.92	0.78285

12000 rows × 7 columns

Advanced

Coba result objects contain a number of useful features to make quick analysis easier. Below we cover:

Results with Missing Evaluations
Results with Evaluations of Different Lengths
Plotting with Respect to Custom Variables
Overlaying Result Plots
Plotting Direct Comparisons
Adjusting the Moving Average
Determining Confidence Intervals
Applying Where Filters to Results
Post Hoc Hyperparameter Tuning

1. Results with Missing Evaluations

Here we create a result that is missing any interactions evaluating lrn1 on env1:

[19]:

import coba as cb

env0 = cb.Environments.from_linear_synthetic(100,seed=1)[0]
env1 = cb.Environments.from_linear_synthetic(100,seed=2)[0]
lrn0 = cb.RandomLearner()
lrn1 = cb.RandomLearner()

result = cb.Experiment([(env0,lrn0),(env1,lrn0),(env0,lrn1)]).run(quiet=True)

After creating our result we can print it to see that it has all of our learners and environments.

[20]:

print(result)

{'Learners': 2, 'Environments': 2, 'Interactions': 300}

However, when we ask it to plot we receive a notification in addition to the plot:

We removed 1 environment_id because it did not exist for every full_name.

[21]:

result.plot_learners()

We removed 1 environment_id because it did not exist for every full_name.

When we ask Coba to plot a result it checks and makes sure all plotted learners have the same environments completed. If this is not the case then coba removes the environments that aren’t completed by all learners before plotting. We can disable this functionality by setting p=None. We see below that even though the two learners are identical we now get different plotted results due to learner 0 showing the average performance across two environments.

[22]:

result.plot_learners(p=None)

Alternatively, we could also remove learner 1 from result to see learner 0 performance on environments.

Removing learner 1 means that all learners (now only learner 0) are completed for all environments (environment 0 and 1).

[23]:

result.where(learner_id=0).plot_learners()

Finally, we can filter to equally completed evaluations by using where_fin(l,p) (short for where_finished).

Coba will remove every p value from the result that does not exist for every l value.

The variable p is shorthand for pairings. That is, p names the variables on which we want to “pair” the l values.

[24]:

result = result.where_fin(l='learner_id',p='environment_id')

We removed 1 environment_id because it did not exist for every learner_id.
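The pairing rule described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Coba's implementation. The helper name `finished_pairings` is hypothetical, and the pairs mirror the (learner, environment) evaluations of the experiment in this section.

```python
from collections import defaultdict

# (learner_id, environment_id) pairs that were evaluated in the experiment above.
evaluated = [(0, 0), (0, 1), (1, 0)]

def finished_pairings(pairs):
    """Return the environment ids (p) that exist for every learner id (l)."""
    learners = {l for l, _ in pairs}
    learners_by_env = defaultdict(set)
    for l, p in pairs:
        learners_by_env[p].add(l)
    # Keep an environment only when every learner has been evaluated on it.
    return sorted(p for p, ls in learners_by_env.items() if ls == learners)

kept = finished_pairings(evaluated)
print(kept)
```

Environment 1 is dropped because learner 1 was never evaluated on it, which corresponds to the "removed 1 environment_id" notification.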

2. Results with Evaluations of Different Lengths

Here we create a result with one environment made up of 100 interactions and one environment with 200 interactions:

[25]:

import coba as cb

env0 = cb.Environments.from_linear_synthetic(100,seed=1)[0]
env1 = cb.Environments.from_linear_synthetic(200,seed=2)[0]
lrn0 = cb.RandomLearner()
lrn1 = cb.RandomLearner()

result = cb.Experiment([env0,env1],[lrn0,lrn1]).run(quiet=True)

After creating our result we can print it to see that it has all of our learners, environments, and interactions.

[26]:

print(result)

{'Learners': 2, 'Environments': 2, 'Interactions': 600}

When we ask Coba to plot we receive a notification in addition to the plot:

We shortened 2 learner evaluations because they were longer than the shortest environment

[27]:

result.plot_learners()

We shortened 2 learner evaluations because they were longer than the shortest environment.

This is similar to the missing environment use case except now we have indexes that only exist in specific environments.

Once again we can force this constraint to be ignored by setting p=None.

When we do this we see a sudden and drastic change at x=100 due to the data from 100-200 only coming from a single environment.

[28]:

result.plot_learners(p=None)

A better way to fix this problem is to either explicitly remove short environments or use where_fin(n_interactions).

[29]:

result.where(environment_id=1).plot_learners()
result.where_fin(200).plot_learners()

We removed 2 learner evaluations because they were shorter than 200 interactions.

When using where_fin(n_interactions) coba will remove environments that don’t contain at least n_interactions and drop interactions from environments with more than n_interactions. Notice below that after applying where_fin(150) our result now only has 300 interactions (i.e., 150 from each of the environment 1 evaluations).

[30]:

result.where_fin(150)

We removed 2 learner evaluations because they were shorter than 150 interactions.
We shortened 2 learner evaluations because they were longer than the shortest environment.
{'Learners': 2, 'Environments': 1, 'Interactions': 300}
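The arithmetic behind the 300 interactions can be checked with a small plain-Python sketch. It is not Coba's implementation. The helper name `where_fin_sketch` is hypothetical, and the lengths are those of the four evaluations in this section (two learners on a 100-interaction and a 200-interaction environment).

```python
# Interactions per (learner_id, environment_id) evaluation in the experiment above.
lengths = {(0, 0): 100, (0, 1): 200, (1, 0): 100, (1, 1): 200}

def where_fin_sketch(lengths, n):
    """Drop evaluations shorter than n and truncate the remaining ones to n."""
    return {key: n for key, length in lengths.items() if length >= n}

kept = where_fin_sketch(lengths, 150)
total = sum(kept.values())
print(sorted(kept), total)
```

Only the two evaluations on environment 1 survive, each shortened to 150 interactions, which gives the total of 300 reported above.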

3. Plotting with Respect to Custom Variables

Often we are interested in performance with respect to specific variables.

Here we create two environments with different numbers of actions so we can see how a learner performs with respect to action count.

[31]:

import coba as cb

env0 = cb.Environments.from_linear_synthetic(100,n_actions=2,seed=1).binary().shuffle(n=5)
env1 = cb.Environments.from_linear_synthetic(100,n_actions=5,seed=2).binary().shuffle(n=5)
lrn0 = cb.RandomLearner()

result = cb.Experiment(env0+env1,lrn0).run(quiet=True)

We can look at the environments table to see what variables we could plot on and we see n_actions.

[32]:

result.environments.to_pandas()

[32]:

	environment_id	binary	env_type	n_actions	n_coeff	reward_features	seed	shuffle_seed
0	0	True	LinearSynthetic	2	5	[a, xa]	1	0
1	1	True	LinearSynthetic	2	5	[a, xa]	1	1
2	2	True	LinearSynthetic	2	5	[a, xa]	1	2
3	3	True	LinearSynthetic	2	5	[a, xa]	1	3
4	4	True	LinearSynthetic	2	5	[a, xa]	1	4
5	5	True	LinearSynthetic	5	5	[a, xa]	2	0
6	6	True	LinearSynthetic	5	5	[a, xa]	2	1
7	7	True	LinearSynthetic	5	5	[a, xa]	2	2
8	8	True	LinearSynthetic	5	5	[a, xa]	2	3
9	9	True	LinearSynthetic	5	5	[a, xa]	2	4

Based on this we can set the plot’s label to l='n_actions' and set p=None so that coba doesn’t try to remove missing environments

[33]:

result.plot_learners(l='n_actions',p=None,err='se')

We could also set the x-axis to n_actions, which collapses each line down to its right-most (final) reward value.

[34]:

result.plot_learners(x='n_actions',l='n_actions',p=None,err='se')

It is also possible to create plots with respect to collections of variables across environments, learners, evaluators, and interactions.

[36]:

result.plot_learners(x=['n_actions','shuffle_seed'],l='n_actions', p= None,err='se')

4. Overlaying Result Plots

It is possible to overlay multiple plots generated by the same result or separate results. Here we make a quick result.

[38]:

import coba as cb
env0 = cb.Environments.from_linear_synthetic(100,n_actions=5,seed=1).binary().shuffle(n=16)
lrn0 = cb.RandomLearner()
result = cb.Experiment(env0,lrn0).run(quiet=True)

To overlay plots we tell coba to not output the plot by passing in plot_learners(out=None).

Once we’ve done that we can call plot_learners on a result again to generate a second plot that is overlaid on the first.

Below we overlay the standard error on top of the standard deviation around the mean.

[40]:

result.plot_learners(l='learner_id',p= None, colors="green", labels=''    , err='sd', out=None)
result.plot_learners(l='learner_id',p= None, colors="red"  , labels='Mean', err='se', out=None)

And here we overlay a learner’s overall mean performance on top of the underlying evaluations.

[42]:

result.plot_learners(l='shuffle_seed',p= None, colors="#ADD8E6", labels=''    , alpha=.3, out=None)
result.plot_learners(l='learner_id'  ,p= None, colors="#00008B", labels='Mean'                    )

5. Plotting Direct Comparisons

To know the best performer between two learners we can use plot_contrast. This will plot the difference of learners on environments.

[43]:

import matplotlib.pyplot as plt
import coba as cb

env0 = cb.Environments.from_linear_synthetic(200,n_actions=2,seed=1).binary().shuffle(n=5)
env1 = cb.Environments.from_linear_synthetic(200,n_actions=2,seed=2).binary().shuffle(n=5)
env2 = cb.Environments.from_linear_synthetic(200,n_actions=2,seed=3).binary().shuffle(n=5)

lrn0 = cb.RandomLearner()
lrn1 = cb.VowpalEpsilonLearner()

result = cb.Experiment(env0+env1+env2,[lrn0,lrn1]).run(quiet=True)

Below we show the same results in two different ways. In the first we show the average reward for each learner on each environment seed.

[44]:

result.plot_learners(x='seed',err='se',out=None)
plt.xticks([1,2,3],[1,2,3])
plt.show()

For the second we show the average difference in performance. Notice that the point estimates below are the same as the distance between the blue and orange point estimates above. This plot is similar to a collection of paired hypothesis tests. When the confidence interval around a point estimate doesn’t contain 0 we can say that one learner outperformed the other with statistical significance. In this case the VowpalEpsilonLearner outperforms random selection on all datasets.

[45]:

learner_id1 = 0
learner_id2 = 1
result.plot_contrast(learner_id1,learner_id2,l='learner_id',x='seed',xorder='+',err='se',out=None)
plt.xticks([1,2,3],[1,2,3])
plt.show()
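The paired comparison described above can be sketched without Coba. The example below uses made-up per-environment rewards and a normal-approximation interval with z = 1.96. Both the values and the multiplier are assumptions for illustration and may not match what plot_contrast computes internally.

```python
import math
import statistics

# Hypothetical per-environment average rewards (illustrative values only).
learner_a = [0.50, 0.47, 0.52, 0.49, 0.51]
learner_b = [0.61, 0.58, 0.66, 0.57, 0.63]

# Pair on environment: difference the two learners environment by environment.
diffs = [b - a for a, b in zip(learner_a, learner_b)]

mean_diff = statistics.mean(diffs)
se_diff = statistics.stdev(diffs) / math.sqrt(len(diffs))

z = 1.96  # assumed multiplier for an approximate 95% interval
lower, upper = mean_diff - z * se_diff, mean_diff + z * se_diff
excludes_zero = lower > 0 or upper < 0

print(f"diff={mean_diff:.3f} interval=({lower:.3f}, {upper:.3f}) excludes zero: {excludes_zero}")
```

Differencing within each environment removes variation that both learners share, which is why a paired comparison can detect a gap that two overlapping sets of error bars would hide.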

We can also contrast by custom variables. Here we contrast a learner’s performance with respect to pre-conditioning.

[1]:

import coba as cb

env0 = cb.Environments.from_openml(180).reservoir(4000,seeds=range(10))
env1 = env0.scale('min'   ,'minmax')
env2 = env0.scale('mean'  ,'std'   )
env3 = env0.scale('median','iqr'   )

envs  = env0 #no scaling
envs += env1 #scales features to [0,1]                    (aka, sklearn MinMaxScaler)
envs += env2 #scales features to approximately [-std,std] (aka, sklearn StandardScaler)
envs += env3 #scales features to approximately [-iqr,iqr] (aka, sklearn RobustScaler)

lrn = cb.VowpalEpsilonLearner()

result = cb.Experiment(envs,lrn).run(processes=8,quiet=True)

To determine the variable we want to contrast on we set l. We then give the specific l values to contrast in l1 and l2.

We set p below so that plot_contrast knows we want to diff reward(l1,openml_data,reservoir_seed) - reward(l2,openml_data,reservoir_seed).

The conclusion we can draw from below is that the standard scaling and robust scaling give a small performance boost over no scaling.

[2]:

result.plot_contrast(l1=None,l2=['minmax','std','iqr'],l='scale',x='scale',p=['openml_data','reservoir_seed'],mode='diff',err='bs')

And here is the plot of the absolute values (i.e., rewards before differencing).

[3]:

result.plot_learners(x='scale',p=None,err='bs',legend=False)

Alternatively, we could also plot the probability that l2 > l1 using a binomial confidence interval.

[4]:

result.plot_contrast(l1=None,l2=['minmax','std','iqr'],l='scale',x='scale',p=['openml_data','reservoir_seed'],mode='prob',err='bi')
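To make the "probability that l2 > l1" idea concrete, the sketch below counts how often one set of paired rewards beats another and puts a binomial interval around that proportion. It uses the Wilson score interval as one common choice. Which binomial interval Coba uses for err='bi' is not stated here, so treat this as an assumption. The reward values and the helper name `wilson_interval` are illustrative.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half

# Hypothetical paired rewards: entry i of each list comes from the same pairing.
l1 = [0.50, 0.47, 0.52, 0.49, 0.51, 0.48, 0.53, 0.50, 0.46, 0.52]
l2 = [0.55, 0.45, 0.58, 0.54, 0.56, 0.47, 0.60, 0.57, 0.51, 0.59]

wins = sum(b > a for a, b in zip(l1, l2))   # pairings where l2 beat l1
prob = wins / len(l1)
lower, upper = wilson_interval(wins, len(l1))

print(f"P(l2 > l1)={prob:.2f} interval=({lower:.3f}, {upper:.3f})")
```

If the whole interval lies above 0.5, l2 wins more often than chance would explain.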

6. Adjusting the Moving Average

We can adjust the number of preceding interactions to use when averaging y via the span parameter. By default Coba plots the average over all preceding interactions.

[5]:

import coba as cb
env0 = cb.Environments.from_linear_synthetic(100,n_actions=5,seed=1).binary().shuffle(n=16)
lrn0 = cb.RandomLearner()
result = cb.Experiment(env0,lrn0).run(quiet=True)

This overlays different window sizes for calculating the moving average.

[6]:

result.plot_learners(span=3    ,out=None,labels='prev 3'  ,colors=1)
result.plot_learners(span=9    ,out=None,labels='prev 9'  ,colors=3)
result.plot_learners(span=27   ,out=None,labels='prev 27' ,colors=4)
result.plot_learners(span=None,out=None,labels='all time' ,colors=0, title='')
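The effect of span can be illustrated with a small plain-Python sketch. It assumes a simple trailing window, which matches the description of span as the number of previous rewards to average. The function name `moving_average` and the reward sequence are illustrative, and Coba's own computation may differ in detail.

```python
def moving_average(rewards, span=None):
    """Average each reward with up to span-1 preceding rewards; span=None uses all of them."""
    averages = []
    for i in range(len(rewards)):
        start = 0 if span is None else max(0, i - span + 1)
        window = rewards[start:i + 1]
        averages.append(sum(window) / len(window))
    return averages

rewards = [1, 0, 0, 1, 1, 0]  # illustrative binary rewards

print(moving_average(rewards, span=None))  # running average over everything so far
print(moving_average(rewards, span=3))     # average over the 3 most recent rewards
print(moving_average(rewards, span=1))     # the raw rewards themselves
```

Small spans react quickly to recent rewards but are noisy, while span=None gives the smoothest curve at the cost of being slow to reflect late changes.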

7. Determining Confidence Intervals

To calculate confidence intervals in Result plots we require multiple observations per point.

For example, in the code below no confidence intervals can be calculated because there is only one environment per learner.

[7]:

import coba as cb
env0 = cb.Environments.from_linear_synthetic(100,n_actions=5,seed=1).binary()
lrns = [cb.RandomLearner(seed=1),cb.RandomLearner(seed=2)]
cb.Experiment(env0,lrns).run(quiet=True).plot_learners(err='se')

To generate multiple environments we can simply shuffle the given environment

[8]:

import coba as cb
env0 = cb.Environments.from_linear_synthetic(100,n_actions=5,seed=1).binary().shuffle(n=10)
lrns = [cb.RandomLearner(seed=1),cb.RandomLearner(seed=2)]
cb.Experiment(env0,lrns).run(quiet=True).plot_learners(err='bs')

Coba natively supports four types of confidence intervals: standard deviation (err='sd'), standard error (err='se'), bootstrap (err='bs'), and binomial (err='bi').

Some of these can be customized by instantiating their estimators directly, as shown below where we increase the confidence level from the default of .95 to .99.

[9]:

import coba as cb

env0 = cb.Environments.from_linear_synthetic(100,n_actions=5,seed=1).binary().shuffle(n=10)
lrns = [cb.RandomLearner(seed=1),cb.RandomLearner(seed=2)]

cb.Experiment(env0,lrns).run(quiet=True).plot_learners(err=cb.BootstrapCI(.99,cb.mean))
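To show what raising the confidence level does, here is a minimal percentile-bootstrap sketch in plain Python. It is not Coba's BootstrapCI. The function name `bootstrap_ci`, the resample count, the seed, and the reward values are all assumptions made for illustration.

```python
import random
import statistics

def bootstrap_ci(values, confidence=0.95, n_resamples=2000, seed=1):
    """Percentile bootstrap interval for the mean of values."""
    rng = random.Random(seed)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.mean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    tail = (1 - confidence) / 2
    lower = means[int(tail * n_resamples)]
    upper = means[int((1 - tail) * n_resamples) - 1]
    return lower, upper

# Hypothetical final rewards of one learner on ten shuffled environments.
rewards = [0.52, 0.48, 0.55, 0.50, 0.45, 0.58, 0.47, 0.53, 0.49, 0.51]

lo95, hi95 = bootstrap_ci(rewards, confidence=0.95)
lo99, hi99 = bootstrap_ci(rewards, confidence=0.99)
print(f"95%: ({lo95:.3f}, {hi95:.3f})  99%: ({lo99:.3f}, {hi99:.3f})")
```

With the same seed both calls see the same resamples, so the .99 interval is never narrower than the .95 interval.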

8. Applying Where Filters to Results

Sometimes we want to downselect an experiment Result. We can use the where method for this. First we create a result.

[10]:

import coba as cb
env0   = cb.Environments.from_linear_synthetic(100,n_actions=5,seed=[0,1,2,3,4])
lrn0   = cb.RandomLearner()
result = cb.Experiment(env0,lrn0).run(quiet=True)

We can downselect to the first three environments in four different ways.

Notice that there are only three environments after the where is executed.

[11]:

print(result                                        )
print(result.where(environment_id=[0,1,2]          ))
print(result.where(environment_id={'in':[0,1,2]}   ))
print(result.where(environment_id={'<=':2}         ))
print(result.where(environment_id={'match':'[012]'}))

{'Learners': 1, 'Environments': 5, 'Interactions': 500}
{'Learners': 1, 'Environments': 3, 'Interactions': 300}
{'Learners': 1, 'Environments': 3, 'Interactions': 300}
{'Learners': 1, 'Environments': 3, 'Interactions': 300}
{'Learners': 1, 'Environments': 3, 'Interactions': 300}
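The four filter forms above all express the same condition. The sketch below mimics them in plain Python to make the semantics explicit. The helper `matches` is hypothetical, it covers only the operators shown in this notebook, and using `re.search` for 'match' is an assumption about how the regular expression is applied.

```python
import re

def matches(value, condition):
    """Return True when value satisfies a where-style condition."""
    if isinstance(condition, dict):
        (operator, operand), = condition.items()
        if operator == 'in':
            return value in operand
        if operator == '<=':
            return value <= operand
        if operator == 'match':
            return re.search(operand, str(value)) is not None
        raise ValueError(f"unsupported operator: {operator}")
    if isinstance(condition, (list, tuple, set)):
        return value in condition   # a bare collection behaves like 'in'
    return value == condition       # a bare value means equality

environment_ids = [0, 1, 2, 3, 4]
conditions = ([0, 1, 2], {'in': [0, 1, 2]}, {'<=': 2}, {'match': '[012]'})

selected = [[e for e in environment_ids if matches(e, c)] for c in conditions]
for s in selected:
    print(s)
```

Each condition keeps environments 0, 1, and 2, mirroring the identical counts printed above.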

We can select on any variable in environments, learners, evaluators, and interactions. Below we select using seed.

[12]:

result.environments.to_pandas()

[12]:

	environment_id	env_type	n_actions	n_coeff	reward_features	seed
0	0	LinearSynthetic	5	5	[a, xa]	0
1	1	LinearSynthetic	5	5	[a, xa]	1
2	2	LinearSynthetic	5	5	[a, xa]	2
3	3	LinearSynthetic	5	5	[a, xa]	3
4	4	LinearSynthetic	5	5	[a, xa]	4

[13]:

result.where(seed=1).environments.to_pandas()

[13]:

	environment_id	env_type	n_actions	n_coeff	reward_features	seed
0	1	LinearSynthetic	5	5	[a, xa]	1

9. Post Hoc Hyperparameter Tuning

Result plots also support post-hoc hyperparameter tuning. It is post-hoc in the sense that all hyperparameters must be fully evaluated in the initial experiment. The tuning only happens via selection after the fact. This gives one an idea of how well a learner could have performed had appropriate tuning been applied up-front.

[14]:

import matplotlib.pyplot as plt
import coba as cb

env0 = cb.Environments.from_linear_synthetic(1000,n_actions=3,seed=1).binary().shuffle(n=5)
env1 = cb.Environments.from_mlp_synthetic(1000,n_actions=3,seed=2).binary().shuffle(n=5)

lrn1 = cb.VowpalEpsilonLearner(epsilon=0.01,features=['a','ax'])
lrn2 = cb.VowpalEpsilonLearner(epsilon=0.05,features=['a','ax','axx'])

result = cb.Experiment(env0+env1,[lrn1,lrn2]).run(quiet=True)

We see that the linear VW learner does very well on the linear environment while the non-linear VW learner does better on the non-linear environment.

[15]:

result.plot_learners(x='env_type')

Using the where_best filter we can post-hoc keep only the best learner in each family for each environment type.

[16]:

result.where_best(l='family',p='env_type',n=500).plot_learners(x='env_type',p=None)

We can overlay the average performance of best VW per environment over both underlying VW learners.

The green line is the average performance had we used the linear VW learner on the linear environment and the non-linear VW learner on the non-linear environment.

[17]:

result.plot_learners(out=None,labels=['VW linear','VW non-linear'])
result.where_best(l='family',p='env_type',n=500).plot_learners(l='family',colors='green',labels='VW only best')
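The selection that post-hoc tuning performs can be sketched in plain Python: for each pairing value, keep whichever learner scored best. This is an illustration of the idea only, not Coba's where_best. The helper name `best_per_pairing`, the environment labels, and the reward values are made up.

```python
# Hypothetical mean rewards per (learner, environment type); illustrative values only.
mean_reward = {
    ('VW linear', 'linear'): 0.80,
    ('VW non-linear', 'linear'): 0.74,
    ('VW linear', 'non-linear'): 0.55,
    ('VW non-linear', 'non-linear'): 0.68,
}

def best_per_pairing(scores):
    """For each environment type keep the learner with the highest mean reward."""
    best = {}
    for (learner, env_type), reward in scores.items():
        current = best.get(env_type)
        if current is None or reward > scores[(current, env_type)]:
            best[env_type] = learner
    return best

best = best_per_pairing(mean_reward)
tuned_average = sum(mean_reward[(lrn, env)] for env, lrn in best.items()) / len(best)
print(best, round(tuned_average, 3))
```

Because the selection uses the same rewards that are then reported, the resulting average is an optimistic estimate of what up-front tuning would have achieved.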