
Results

A Result is the outcome of an Experiment. It has several features to support easy analysis.

In this notebook we will learn about Result:

  1. Creating: How to create a Result from an Experiment

  2. Plotting: How to create plots from a Result

  3. Loading: How to load a Result from disk

  4. Selecting: How to select which Result data to analyze

  5. Pandas: How to turn a Result into pandas DataFrames

Creating

Results are created by running an Experiment. Below is a simple example of this:

[1]:
import coba as cb

environments = cb.Environments.from_linear_synthetic(2000).shuffle([1,2,3])
learners     = [cb.RandomLearner(), cb.VowpalEpsilonLearner()]
results      = cb.Experiment(environments,learners).run()
2024-02-08 14:32:16 -- Experiment Started
2024-02-08 14:32:16 -- Recording Learner 0 parameters... (0.0 seconds) (completed)
2024-02-08 14:32:16 -- Recording Evaluator 0 parameters... (0.0 seconds) (completed)
2024-02-08 14:32:16 -- Recording Learner 1 parameters... (0.0 seconds) (completed)
2024-02-08 14:32:16 -- Peeking at Environment 0... (0.32 seconds) (completed)
2024-02-08 14:32:16 -- Recording Environment 0 parameters... (0.0 seconds) (completed)
2024-02-08 14:32:17 -- Evaluating Learner 0 on Environment 0... (0.37 seconds) (completed)
2024-02-08 14:32:18 -- Evaluating Learner 1 on Environment 0... (0.87 seconds) (completed)
2024-02-08 14:32:18 -- Peeking at Environment 1... (0.31 seconds) (completed)
2024-02-08 14:32:18 -- Recording Environment 1 parameters... (0.0 seconds) (completed)
2024-02-08 14:32:18 -- Evaluating Learner 0 on Environment 1... (0.34 seconds) (completed)
2024-02-08 14:32:19 -- Evaluating Learner 1 on Environment 1... (0.72 seconds) (completed)
2024-02-08 14:32:19 -- Peeking at Environment 2... (0.32 seconds) (completed)
2024-02-08 14:32:19 -- Recording Environment 2 parameters... (0.0 seconds) (completed)
2024-02-08 14:32:20 -- Evaluating Learner 0 on Environment 2... (0.36 seconds) (completed)
2024-02-08 14:32:20 -- Evaluating Learner 1 on Environment 2... (0.71 seconds) (completed)
2024-02-08 14:32:20 -- Experiment Finished

Plotting

With the Result class one can plot the performance of learners. This is a great way to quickly get a sense of an experiment.

Below we plot the result of our experiment above.

[2]:
results.plot_learners()
../_images/notebooks_Results_3_0.png

The plot_learners method has many arguments which allow customization of the plot. Here are just a few:

xlim: Define the x-axis limits to plot. If `None` the x-axis limits will be inferred.
ylim: Define the y-axis limits to plot. If `None` the y-axis limits will be inferred.
span: The number of previous rewards to average together. If `None` the total average of all previous rewards is plotted.
err: Determine what kind of error bars to plot (if any). Valid types are `None`, 'se', and 'sd'. If `None`
    then no bars are plotted, if 'se' the standard error is shown, and if 'sd' the standard deviation is shown.
ax: Provide an optional axes that the plot will be drawn to. If not provided a new figure/axes is created.
[3]:
results.plot_learners(xlim=(500,1000), span=1 , err=None)
results.plot_learners(xlim=(500,1000), span=40, err='se')
../_images/notebooks_Results_5_0.png
../_images/notebooks_Results_5_1.png
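
The ax argument is not demonstrated above, so here is a minimal sketch of how it might be used (assuming matplotlib is installed; the subplot layout and limits are illustrative):

[ ]:
import matplotlib.pyplot as plt

# draw two variations of the same plot onto axes we control
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
results.plot_learners(ax=ax1, err='se')          # left: standard error bars
results.plot_learners(ax=ax2, ylim=(0.4, 0.8))   # right: fixed y-axis limits
plt.show()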

Loading

Results can also be loaded from an Experiment restoration file.

[13]:
import coba as cb

env = cb.Environments.from_linear_synthetic(2000).shuffle([1,2,3])
lrn = [cb.RandomLearner(), cb.VowpalEpsilonLearner(epsilon=0.1)]

cb.Experiment(env,lrn).run('results.log',quiet=True)
cb.Result.from_file('results.log').plot_learners()
../_images/notebooks_Results_7_0.png

Selecting

Oftentimes we only want to plot certain learners or environments. Below are two examples of when and how this can be done.

Selecting Finished Environments

One particularly useful feature is looking at the results of an experiment while it runs (this can be done by loading a Result from experiment logs while the experiment is running). When doing this it is useful to select just the Environments that have been fully processed. If we didn’t, some learners might look better or worse simply because of missing or extra data. Selecting down to only finished environments looks like this.

[14]:
cb.Result.from_file('results.log').where_fin().plot_learners()
../_images/notebooks_Results_10_0.png

Selecting In General

When running very large experiments it can often be useful to select down to specific learners or environments to answer questions. This can be done using the where method. Below is an example where we select down to just the vw learner. When selecting on strings, Coba performs a regex match, so this can be very powerful.

[15]:
cb.Result.from_file('results.log').where(family='vw').plot_learners()
../_images/notebooks_Results_12_0.png
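
Because string selection uses regex matching, more elaborate patterns also work. For instance, a small sketch (the pattern here is purely illustrative):

[ ]:
# keep any learner whose family matches 'vw' or 'random'
cb.Result.from_file('results.log').where(family='vw|random').plot_learners()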

To Pandas

Finally, oftentimes the analysis we wish to do simply isn’t possible with the built-in Result methods. In this case we can turn the various pieces of a Result into pandas DataFrames. There are three items within Result which can be turned into DataFrames. Below we show all three:

[16]:
result = cb.Result.from_file('results.log')
result.learners.to_pandas()
[16]:
   learner_id                                               args  epsilon  family         features  seed
0           0                                               None      NaN  random             None     1
1           1  --cb_explore_adf --epsilon 0.1 --random_seed 1...      0.1      vw  [1, a, ax, axx]     1
[17]:
result = cb.Result.from_file('results.log')
result.environments.to_pandas()
[17]:
   environment_id         env_type  n_actions  n_coeff reward_features  seed  shuffle_seed
0               0  LinearSynthetic          5        5         [a, xa]     1             1
1               1  LinearSynthetic          5        5         [a, xa]     1             2
2               2  LinearSynthetic          5        5         [a, xa]     1             3
[18]:
result = cb.Result.from_file('results.log')
result.interactions.to_pandas()
[18]:
       environment_id  learner_id  evaluator_id  index                                             action  probability   reward
0                   0           0             0      1     [0.72139, 0.17695, -0.94002, 0.08659, 0.83188]         0.20  0.79412
1                   0           0             0      2    [0.16227, 0.59735, -0.85493, -0.11122, 0.53707]         0.20  0.43476
2                   0           0             0      3      [0.5655, 0.53303, -0.76695, 0.4355, -0.90318]         0.20  0.53500
3                   0           0             0      4       [0.40408, 0.4963, -0.15722, 0.6081, 0.40259]         0.20  0.65024
4                   0           0             0      5     [-0.12778, 0.8441, 0.51425, 0.47563, -0.42304]         0.20  0.56094
...               ...         ...           ...    ...                                                ...          ...      ...
11995               2           1             0   1996    [0.0595, -0.36016, -0.26017, 0.89185, -0.35685]         0.92  0.65614
11996               2           1             0   1997     [0.94362, 0.4796, -0.97441, 0.76965, -0.84778]         0.92  0.72093
11997               2           1             0   1998    [-0.29204, 0.29273, 0.21691, 0.94181, -0.76658]         0.92  0.54302
11998               2           1             0   1999      [0.13377, 0.85283, -0.58861, 0.79819, 0.79701]        0.92  0.75122
11999               2           1             0   2000  [-0.15262, 0.77369, -0.82678, -0.30494, -0.97439]         0.92  0.78285

12000 rows × 7 columns
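
Once in a DataFrame, standard pandas analysis is available. As a small sketch (plain pandas; nothing coba-specific is assumed), we can compute each learner's average reward per environment:

[ ]:
df = result.interactions.to_pandas()

# mean reward for every (environment, learner) pair, reshaped into a grid
print(df.groupby(['environment_id', 'learner_id'])['reward'].mean().unstack())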

Advanced

Coba Result objects contain a number of useful features to make quick analysis easier. Below we cover:

  1. Results with Missing Evaluations

  2. Results with Evaluations of Different Lengths

  3. Plotting with Respect to Custom Variables

  4. Overlaying Result Plots

  5. Plotting Direct Comparisons

  6. Adjusting the Moving Average

  7. Determining Confidence Intervals

  8. Applying Where Filters to Results

  9. Post Hoc Hyperparameter Tuning

1. Results with Missing Evaluations

Here we create a result that is missing the interactions from evaluating lrn1 on env1:

[19]:
import coba as cb

env0 = cb.Environments.from_linear_synthetic(100,seed=1)[0]
env1 = cb.Environments.from_linear_synthetic(100,seed=2)[0]
lrn0 = cb.RandomLearner()
lrn1 = cb.RandomLearner()

result = cb.Experiment([(env0,lrn0),(env1,lrn0),(env0,lrn1)]).run(quiet=True)

After creating our result we can print it to see that it has all of our learners and environments.

[20]:
print(result)
{'Learners': 2, 'Environments': 2, 'Interactions': 300}

However, when we ask it to plot we receive a notification in addition to the plot:

We removed 1 environment_id because it did not exist for every full_name.

[21]:
result.plot_learners()
We removed 1 environment_id because it did not exist for every full_name.
../_images/notebooks_Results_22_1.png

When we ask Coba to plot a result it checks that all plotted learners have completed the same environments. If this is not the case then coba removes the environments that aren’t completed by all learners before plotting. We can disable this functionality by setting p=None. Below we see that, even though the two learners are identical, we now get different plotted results because learner 0 shows the average performance across two environments.

[22]:
result.plot_learners(p=None)
../_images/notebooks_Results_24_0.png

Alternatively, we could remove learner 1 from the result to see learner 0’s performance on all environments.

Removing learner 1 means that all remaining learners (now only learner 0) are completed for all environments (environments 0 and 1).

[23]:
result.where(learner_id=0).plot_learners()
../_images/notebooks_Results_26_0.png

Finally, we can filter to equally completed evaluations by using where_fin(l,p) (short for where_finished).

Coba will remove any p from the result that does not exist for every l.

The variable p is shorthand for pairings. That is, p names the variables on which we want to “pair” l.

[24]:
result = result.where_fin(l='learner_id',p='environment_id')
We removed 1 environment_id because it did not exist for every learner_id.

2. Results with Evaluations of Different Lengths

Here we create a result with one environment made up of 100 interactions and one environment with 200 interactions:

[25]:
import coba as cb

env0 = cb.Environments.from_linear_synthetic(100,seed=1)[0]
env1 = cb.Environments.from_linear_synthetic(200,seed=2)[0]
lrn0 = cb.RandomLearner()
lrn1 = cb.RandomLearner()

result = cb.Experiment([env0,env1],[lrn0,lrn1]).run(quiet=True)

After creating our result we can print it to see that it has all of our learners, environments, and interactions.

[26]:
print(result)
{'Learners': 2, 'Environments': 2, 'Interactions': 600}

When we ask Coba to plot we receive a notification in addition to the plot:

We shortened 2 learner evaluations because they were longer than the shortest environment.

[27]:
result.plot_learners()
We shortened 2 learner evaluations because they were longer than the shortest environment.
../_images/notebooks_Results_34_1.png

This is similar to the missing environment use case except now we have indexes that only exist in specific environments.

Once again we can force this constraint to be ignored by setting p=None.

When we do this we see a sudden and drastic change at x=100 due to the data from 100-200 only coming from a single environment.

[28]:
result.plot_learners(p=None)
../_images/notebooks_Results_36_0.png

A better way to fix this problem is to either explicitly remove short environments or use where_fin(n_interactions).

[29]:
result.where(environment_id=1).plot_learners()
result.where_fin(200).plot_learners()
../_images/notebooks_Results_38_0.png
We removed 2 learner evaluations because they were shorter than 200 interactions.
../_images/notebooks_Results_38_2.png

When using where_fin(n_interactions) coba will remove environments that don’t contain at least n_interactions and drop interactions from environments with more than n_interactions. Notice below that after applying where_fin(150) our result now only has 300 interactions (i.e., 150 from each of the environment 1 evaluations).

[30]:
result.where_fin(150)
We removed 2 learner evaluations because they were shorter than 150 interactions.
We shortened 2 learner evaluations because they were longer than the shortest environment.
{'Learners': 2, 'Environments': 1, 'Interactions': 300}

3. Plotting with Respect to Custom Variables

Often we are interested in performance with respect to specific variables.

Here we create two environments with different numbers of actions so we can see how a learner performs with respect to action count.

[31]:
import coba as cb

env0 = cb.Environments.from_linear_synthetic(100,n_actions=2,seed=1).binary().shuffle(n=5)
env1 = cb.Environments.from_linear_synthetic(100,n_actions=5,seed=2).binary().shuffle(n=5)
lrn0 = cb.RandomLearner()

result = cb.Experiment(env0+env1,lrn0).run(quiet=True)

We can look at the environments table to see what variables we could plot on, and there we see n_actions.

[32]:
result.environments.to_pandas()
[32]:
   environment_id  binary         env_type  n_actions  n_coeff reward_features  seed  shuffle_seed
0               0    True  LinearSynthetic          2        5         [a, xa]     1             0
1               1    True  LinearSynthetic          2        5         [a, xa]     1             1
2               2    True  LinearSynthetic          2        5         [a, xa]     1             2
3               3    True  LinearSynthetic          2        5         [a, xa]     1             3
4               4    True  LinearSynthetic          2        5         [a, xa]     1             4
5               5    True  LinearSynthetic          5        5         [a, xa]     2             0
6               6    True  LinearSynthetic          5        5         [a, xa]     2             1
7               7    True  LinearSynthetic          5        5         [a, xa]     2             2
8               8    True  LinearSynthetic          5        5         [a, xa]     2             3
9               9    True  LinearSynthetic          5        5         [a, xa]     2             4

Based on this we can set the plot’s label to l='n_actions' and set p=None so that coba doesn’t try to remove missing environments.

[33]:
result.plot_learners(l='n_actions',p=None,err='se')
../_images/notebooks_Results_46_0.png

We could set the x-axis to n_actions as well, which collapses the plot down to the right-most reward value.

[34]:
result.plot_learners(x='n_actions',l='n_actions',p=None,err='se')
../_images/notebooks_Results_48_0.png

It is also possible to create plots with respect to collections of variables across environments, learners, evaluators, and interactions.

[36]:
result.plot_learners(x=['n_actions','shuffle_seed'], l='n_actions', p=None, err='se')
../_images/notebooks_Results_50_0.png

4. Overlaying Result Plots

It is possible to overlay multiple plots generated by the same result or separate results. Here we make a quick result.

[38]:
import coba as cb
env0 = cb.Environments.from_linear_synthetic(100,n_actions=5,seed=1).binary().shuffle(n=16)
lrn0 = cb.RandomLearner()
result = cb.Experiment(env0,lrn0).run(quiet=True)

To overlay plots we tell coba not to output the plot by passing out=None to plot_learners.

Once we’ve done that we can plot from the result again to generate a second plot that is drawn on top.

Below we overlay the standard error on top of the standard deviation around the mean.

[40]:
result.plot_learners(l='learner_id', p=None, colors="green", labels=''    , err='sd', out=None)
result.plot_learners(l='learner_id', p=None, colors="red"  , labels='Mean', err='se', out=None)
../_images/notebooks_Results_55_0.png

And here we overlay a learner’s overall mean performance on top of the underlying evaluations.

[42]:
result.plot_learners(l='shuffle_seed', p=None, colors="#ADD8E6", labels=''    , alpha=.3, out=None)
result.plot_learners(l='learner_id'  , p=None, colors="#00008B", labels='Mean'                    )
../_images/notebooks_Results_57_0.png
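
The same out=None pattern also works for overlaying plots from two separate results. A minimal sketch (the log file names below are hypothetical):

[ ]:
# hold the first plot open with out=None, then draw the second on top of it
result_a = cb.Result.from_file('experiment_a.log')  # hypothetical log file
result_b = cb.Result.from_file('experiment_b.log')  # hypothetical log file

result_a.plot_learners(colors=0, labels='run A', out=None)
result_b.plot_learners(colors=1, labels='run B')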

5. Plotting Direct Comparisons

To determine the best performer between two learners we can use plot_contrast. This plots the difference between the learners on each environment.

[43]:
import matplotlib.pyplot as plt
import coba as cb

env0 = cb.Environments.from_linear_synthetic(200,n_actions=2,seed=1).binary().shuffle(n=5)
env1 = cb.Environments.from_linear_synthetic(200,n_actions=2,seed=2).binary().shuffle(n=5)
env2 = cb.Environments.from_linear_synthetic(200,n_actions=2,seed=3).binary().shuffle(n=5)

lrn0 = cb.RandomLearner()
lrn1 = cb.VowpalEpsilonLearner()

result = cb.Experiment(env0+env1+env2,[lrn0,lrn1]).run(quiet=True)

Below we show the same results in two different ways. In the first we show the average reward for each learner on each environment seed.

[44]:
result.plot_learners(x='seed',err='se',out=None)
plt.xticks([1,2,3],[1,2,3])
plt.show()
../_images/notebooks_Results_61_0.png

In the second we show the average difference in performance. Notice that the point estimates below are the same as the distances between the blue and orange point estimates above. This plot is similar to a collection of paired hypothesis tests. When the confidence interval around a point estimate doesn’t contain 0 we can say that one learner outperformed the other with statistical significance. In this case the VowpalEpsilonLearner outperforms random selection on all datasets.

[45]:
learner_id1 = 0
learner_id2 = 1
result.plot_contrast(learner_id1,learner_id2,l='learner_id',x='seed',xorder='+',err='se',out=None)
plt.xticks([1,2,3],[1,2,3])
plt.show()
../_images/notebooks_Results_63_0.png

We can also contrast by custom variables. Here we contrast a learner’s performance with respect to pre-conditioning.

[1]:
import coba as cb

env0 = cb.Environments.from_openml(180).reservoir(4000,seeds=range(10))
env1 = env0.scale('min'   ,'minmax')
env2 = env0.scale('mean'  ,'std'   )
env3 = env0.scale('median','iqr'   )

envs  = env0 #no scaling
envs += env1 #scales features to [0,1]                    (aka, sklearn MinMaxScaler)
envs += env2 #scales features to approximately [-std,std] (aka, sklearn StandardScaler)
envs += env3 #scales features to approximately [-iqr,iqr] (aka, sklearn RobustScaler)

lrn = cb.VowpalEpsilonLearner()

result = cb.Experiment(envs,lrn).run(processes=8,quiet=True)

To determine the variable we want to contrast on we set l. We then set the specific l values to contrast in l1 and l2.

We set p below so that plot_contrast knows we want to diff reward(l1,openml_data,reservoir_seed) - reward(l2,openml_data,reservoir_seed).

The conclusion we can draw from below is that the standard scaling and robust scaling give a small performance boost over no scaling.

[2]:
result.plot_contrast(l1=None,l2=['minmax','std','iqr'],l='scale',x='scale',p=['openml_data','reservoir_seed'],mode='diff',err='bs')
../_images/notebooks_Results_67_0.png

And here is the plot of the absolute values (i.e., rewards before differencing).

[3]:
result.plot_learners(x='scale',p=None,err='bs',legend=False)
../_images/notebooks_Results_69_0.png

Alternatively, we could also plot the probability that l2 > l1 using a binomial confidence interval.

[4]:
result.plot_contrast(l1=None,l2=['minmax','std','iqr'],l='scale',x='scale',p=['openml_data','reservoir_seed'],mode='prob',err='bi')
../_images/notebooks_Results_71_0.png

6. Adjusting the Moving Average

We can adjust the number of preceding interactions used when averaging y via the span parameter. By default coba plots the total average.

[5]:
import coba as cb
env0 = cb.Environments.from_linear_synthetic(100,n_actions=5,seed=1).binary().shuffle(n=16)
lrn0 = cb.RandomLearner()
result = cb.Experiment(env0,lrn0).run(quiet=True)

This overlays different window sizes for calculating the moving average.

[6]:
result.plot_learners(span=3    ,out=None,labels='prev 3'  ,colors=1)
result.plot_learners(span=9    ,out=None,labels='prev 9'  ,colors=3)
result.plot_learners(span=27   ,out=None,labels='prev 27' ,colors=4)
result.plot_learners(span=None,out=None,labels='all time' ,colors=0, title='')
../_images/notebooks_Results_75_0.png

7. Determining Confidence Intervals

To calculate confidence intervals in Result plots we require multiple observations per point.

For example, in the code below no confidence intervals can be calculated because there is only one environment per learner.

[7]:
import coba as cb
env0 = cb.Environments.from_linear_synthetic(100,n_actions=5,seed=1).binary()
lrns = [cb.RandomLearner(seed=1),cb.RandomLearner(seed=2)]
cb.Experiment(env0,lrns).run(quiet=True).plot_learners(err='se')
../_images/notebooks_Results_78_0.png

To generate multiple environments we can simply shuffle the given environment.

[8]:
import coba as cb
env0 = cb.Environments.from_linear_synthetic(100,n_actions=5,seed=1).binary().shuffle(n=10)
lrns = [cb.RandomLearner(seed=1),cb.RandomLearner(seed=2)]
cb.Experiment(env0,lrns).run(quiet=True).plot_learners(err='bs')
../_images/notebooks_Results_80_0.png

Coba natively supports four types of confidence intervals:

  1. Std Deviation (err='sd')

  2. Std Error (err='se')

  3. Bootstrap (err='bs')

  4. Binomial (err='bi')

Some of these can be customized by instantiating their estimator directly, as shown below where we increase the confidence level from the default of .95 to .99.

[9]:
import coba as cb

env0 = cb.Environments.from_linear_synthetic(100,n_actions=5,seed=1).binary().shuffle(n=10)
lrns = [cb.RandomLearner(seed=1),cb.RandomLearner(seed=2)]

cb.Experiment(env0,lrns).run(quiet=True).plot_learners(err=cb.BootstrapCI(.99,cb.mean))
../_images/notebooks_Results_82_0.png

8. Applying Where Filters to Results

Sometimes we want to downselect an experiment Result. We can use the where method for this. First we create a result.

[10]:
import coba as cb
env0   = cb.Environments.from_linear_synthetic(100,n_actions=5,seed=[0,1,2,3,4])
lrn0   = cb.RandomLearner()
result = cb.Experiment(env0,lrn0).run(quiet=True)

We can downselect to the first three environments in four different ways.

Notice that there are only three environments after the where is executed.

[11]:
print(result                                        )
print(result.where(environment_id=[0,1,2]          ))
print(result.where(environment_id={'in':[0,1,2]}   ))
print(result.where(environment_id={'<=':2}         ))
print(result.where(environment_id={'match':'[012]'}))
{'Learners': 1, 'Environments': 5, 'Interactions': 500}
{'Learners': 1, 'Environments': 3, 'Interactions': 300}
{'Learners': 1, 'Environments': 3, 'Interactions': 300}
{'Learners': 1, 'Environments': 3, 'Interactions': 300}
{'Learners': 1, 'Environments': 3, 'Interactions': 300}

We can select on any variable in environments, learners, evaluators, and interactions. Below we select using seed.

[12]:
result.environments.to_pandas()
[12]:
   environment_id         env_type  n_actions  n_coeff reward_features  seed
0               0  LinearSynthetic          5        5         [a, xa]     0
1               1  LinearSynthetic          5        5         [a, xa]     1
2               2  LinearSynthetic          5        5         [a, xa]     2
3               3  LinearSynthetic          5        5         [a, xa]     3
4               4  LinearSynthetic          5        5         [a, xa]     4
[13]:
result.where(seed=1).environments.to_pandas()
[13]:
   environment_id         env_type  n_actions  n_coeff reward_features  seed
0               1  LinearSynthetic          5        5         [a, xa]     1

9. Post Hoc Hyperparameter Tuning

Result plots also support post-hoc hyperparameter tuning. It is post-hoc in the sense that all hyperparameters must be fully evaluated in the initial experiment; the tuning only happens via selection after the fact. This gives one an idea of how well a learner could have performed had appropriate tuning been applied up-front.

[14]:
import matplotlib.pyplot as plt
import coba as cb

env0 = cb.Environments.from_linear_synthetic(1000,n_actions=3,seed=1).binary().shuffle(n=5)
env1 = cb.Environments.from_mlp_synthetic(1000,n_actions=3,seed=2).binary().shuffle(n=5)

lrn1 = cb.VowpalEpsilonLearner(epsilon=0.01,features=['a','ax'])
lrn2 = cb.VowpalEpsilonLearner(epsilon=0.05,features=['a','ax','axx'])

result = cb.Experiment(env0+env1,[lrn1,lrn2]).run(quiet=True)

We see that the linear VW learner does very well on the linear environment while the non-linear VW learner does better on the non-linear environment.

[15]:
result.plot_learners(x='env_type')
../_images/notebooks_Results_93_0.png

Using the where_best filter we can post-hoc keep only the best learner in each family for each environment type.

[16]:
result.where_best(l='family',p='env_type',n=500).plot_learners(x='env_type',p=None)
../_images/notebooks_Results_95_0.png

We can overlay the average performance of the best VW learner per environment over both underlying VW learners.

The green line is the average performance had we used the linear VW learner on the linear environment and the non-linear VW learner on the non-linear environment.

[17]:
result.plot_learners(out=None,labels=['VW linear','VW non-linear'])
result.where_best(l='family',p='env_type',n=500).plot_learners(l='family',colors='green',labels='VW only best')
../_images/notebooks_Results_97_0.png