Results
A Result is the outcome of an Experiment. It has several features to support easy analysis.
In this notebook we will learn about Result…
…Creating: How to create a Result from an Experiment
…Plotting: How to create plots from a Result
…Loading: How to load a Result from disk
…Selecting: How to select which Result data to analyze
…Pandas: How to turn a Result into pandas DataFrames
Creating
Results are created by running an Experiment. Below is a simple example of this:
[1]:
import coba as cb
environments = cb.Environments.from_linear_synthetic(2000).shuffle([1,2,3])
learners = [cb.RandomLearner(), cb.VowpalEpsilonLearner()]
results = cb.Experiment(environments,learners).run()
2024-02-08 14:32:16 -- Experiment Started
2024-02-08 14:32:16 -- Recording Learner 0 parameters... (0.0 seconds) (completed)
2024-02-08 14:32:16 -- Recording Evaluator 0 parameters... (0.0 seconds) (completed)
2024-02-08 14:32:16 -- Recording Learner 1 parameters... (0.0 seconds) (completed)
2024-02-08 14:32:16 -- Peeking at Environment 0... (0.32 seconds) (completed)
2024-02-08 14:32:16 -- Recording Environment 0 parameters... (0.0 seconds) (completed)
2024-02-08 14:32:17 -- Evaluating Learner 0 on Environment 0... (0.37 seconds) (completed)
2024-02-08 14:32:18 -- Evaluating Learner 1 on Environment 0... (0.87 seconds) (completed)
2024-02-08 14:32:18 -- Peeking at Environment 1... (0.31 seconds) (completed)
2024-02-08 14:32:18 -- Recording Environment 1 parameters... (0.0 seconds) (completed)
2024-02-08 14:32:18 -- Evaluating Learner 0 on Environment 1... (0.34 seconds) (completed)
2024-02-08 14:32:19 -- Evaluating Learner 1 on Environment 1... (0.72 seconds) (completed)
2024-02-08 14:32:19 -- Peeking at Environment 2... (0.32 seconds) (completed)
2024-02-08 14:32:19 -- Recording Environment 2 parameters... (0.0 seconds) (completed)
2024-02-08 14:32:20 -- Evaluating Learner 0 on Environment 2... (0.36 seconds) (completed)
2024-02-08 14:32:20 -- Evaluating Learner 1 on Environment 2... (0.71 seconds) (completed)
2024-02-08 14:32:20 -- Experiment Finished
Plotting
With a Result we can plot the performance of learners. This is a quick way to get a sense of how the experiment went.
Below we plot the result of our experiment above.
[2]:
results.plot_learners()
The plot_learners method has many arguments which allow customization of the plot. Here are just a few:
xlim: Define the x-axis limits to plot. If `None` the x-axis limits will be inferred.
ylim: Define the y-axis limits to plot. If `None` the y-axis limits will be inferred.
span: The number of previous rewards to average together. When span is `None` we plot the running average over all previous rewards.
err: Determine what kind of error bars to plot (if any). Valid types include `None`, 'se', and 'sd'. If `None` then no bars are plotted, if 'se' the standard error is shown, and if 'sd' the standard deviation is shown.
ax: Provide an optional axes that the plot will be drawn to. If not provided a new figure/axes is created.
[3]:
results.plot_learners(xlim=(500,1000), span=1 , err=None)
results.plot_learners(xlim=(500,1000), span=40, err='se')
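To make the difference between the 'sd' and 'se' options concrete, here is a minimal sketch in plain Python. It is not Coba's implementation; the reward values are made up, and how Coba scales the bars it draws (for example by a z-value) is not covered here.

```python
import math
import statistics

def sd_and_se(values):
    """Return (standard deviation, standard error of the mean) for a sample."""
    sd = statistics.stdev(values)        # sample standard deviation (n-1 denominator)
    se = sd / math.sqrt(len(values))     # standard error shrinks as the sample grows
    return sd, se

# Hypothetical final average rewards of one learner on four shuffled environments.
rewards = [0.50, 0.54, 0.46, 0.50]
sd, se = sd_and_se(rewards)
print(f"mean={statistics.mean(rewards):.3f} sd={sd:.4f} se={se:.4f}")
```

The standard deviation describes the spread across environments, while the standard error describes the uncertainty of the mean, so 'se' bars tighten as more environments are added and 'sd' bars do not.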
Loading
Results can also be loaded from an Experiment restoration file.
[13]:
import coba as cb
env = cb.Environments.from_linear_synthetic(2000).shuffle([1,2,3])
lrn = [cb.RandomLearner(), cb.VowpalEpsilonLearner(epsilon=0.1)]
cb.Experiment(env,lrn).run('results.log',quiet=True)
cb.Result.from_file('results.log').plot_learners()
Selecting
Often we only want to plot certain learners or environments. Below are two examples of when and how this can be done.
Selecting Finished Environments
One particularly useful feature is looking at the results of an experiment while it runs (this can be done by loading a Result from the experiment log while the experiment is still running). When doing this it is useful to select just the Environments that have been fully processed. If we didn’t, some learners may look better or worse simply because of missing or extra data. Selecting down to only finished environments looks like this.
[14]:
cb.Result.from_file('results.log').where_fin().plot_learners()
Selecting In General
When running very large experiments it can often be useful to select down to specific learners or environments to answer questions. This can be done using the where method. Below is an example where we select down to just the vw learner. When selecting on strings Coba will perform a regex match so this can be very powerful.
[15]:
cb.Result.from_file('results.log').where(family='vw').plot_learners()
To Pandas
Finally, the analysis that we wish to do often isn’t possible with the built-in Result methods. In this case we can turn the various pieces of a Result into pandas DataFrames. There are three items within a Result which can be turned into DataFrames. Below we show all three:
[16]:
result = cb.Result.from_file('results.log')
result.learners.to_pandas()
[16]:
| | learner_id | args | epsilon | family | features | seed |
---|---|---|---|---|---|---|
0 | 0 | None | NaN | random | None | 1 |
1 | 1 | --cb_explore_adf --epsilon 0.1 --random_seed 1... | 0.1 | vw | [1, a, ax, axx] | 1 |
[17]:
result = cb.Result.from_file('results.log')
result.environments.to_pandas()
[17]:
| | environment_id | env_type | n_actions | n_coeff | reward_features | seed | shuffle_seed |
---|---|---|---|---|---|---|---|
0 | 0 | LinearSynthetic | 5 | 5 | [a, xa] | 1 | 1 |
1 | 1 | LinearSynthetic | 5 | 5 | [a, xa] | 1 | 2 |
2 | 2 | LinearSynthetic | 5 | 5 | [a, xa] | 1 | 3 |
[18]:
result = cb.Result.from_file('results.log')
result.interactions.to_pandas()
[18]:
| | environment_id | learner_id | evaluator_id | index | action | probability | reward |
---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 1 | [0.72139, 0.17695, -0.94002, 0.08659, 0.83188] | 0.20 | 0.79412 |
1 | 0 | 0 | 0 | 2 | [0.16227, 0.59735, -0.85493, -0.11122, 0.53707] | 0.20 | 0.43476 |
2 | 0 | 0 | 0 | 3 | [0.5655, 0.53303, -0.76695, 0.4355, -0.90318] | 0.20 | 0.53500 |
3 | 0 | 0 | 0 | 4 | [0.40408, 0.4963, -0.15722, 0.6081, 0.40259] | 0.20 | 0.65024 |
4 | 0 | 0 | 0 | 5 | [-0.12778, 0.8441, 0.51425, 0.47563, -0.42304] | 0.20 | 0.56094 |
... | ... | ... | ... | ... | ... | ... | ... |
11995 | 2 | 1 | 0 | 1996 | [0.0595, -0.36016, -0.26017, 0.89185, -0.35685] | 0.92 | 0.65614 |
11996 | 2 | 1 | 0 | 1997 | [0.94362, 0.4796, -0.97441, 0.76965, -0.84778] | 0.92 | 0.72093 |
11997 | 2 | 1 | 0 | 1998 | [-0.29204, 0.29273, 0.21691, 0.94181, -0.76658] | 0.92 | 0.54302 |
11998 | 2 | 1 | 0 | 1999 | [0.13377, 0.85283, -0.58861, 0.79819, 0.79701] | 0.92 | 0.75122 |
11999 | 2 | 1 | 0 | 2000 | [-0.15262, 0.77369, -0.82678, -0.30494, -0.97439] | 0.92 | 0.78285 |
12000 rows × 7 columns
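Once the tables are in pandas, ordinary DataFrame operations apply. The sketch below uses small stand-in frames with the same column names as the tables above (the values are made up) to show one possible analysis: joining learner metadata onto interactions and averaging reward per learner family and environment. With a real Result the two frames would come from result.learners.to_pandas() and result.interactions.to_pandas().

```python
import pandas as pd

# Stand-in frames mimicking the learners and interactions tables (values are made up).
learners = pd.DataFrame({"learner_id": [0, 1], "family": ["random", "vw"]})
interactions = pd.DataFrame({
    "environment_id": [0, 0, 0, 0, 1, 1, 1, 1],
    "learner_id":     [0, 0, 1, 1, 0, 0, 1, 1],
    "index":          [1, 2, 1, 2, 1, 2, 1, 2],
    "reward":         [0.2, 0.4, 0.6, 0.8, 0.1, 0.3, 0.7, 0.9],
})

# Attach learner metadata to every interaction, then average reward
# for each (family, environment) pair and pivot environments into columns.
merged = interactions.merge(learners, on="learner_id")
summary = (
    merged.groupby(["family", "environment_id"])["reward"]
          .mean()
          .unstack("environment_id")
)
print(summary)
```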
Advanced
Coba Result objects contain a number of useful features to make quick analysis easier. Below we cover:
Results with Missing Evaluations
Results with Evaluations of Different Lengths
Plotting with Respect to Custom Variables
Overlaying Result Plots
Plotting Direct Comparisons
Adjusting the Moving Average
Determining Confidence Intervals
Applying Where Filters to Results
Post Hoc Hyperparameter Tuning
1. Results with Missing Evaluations
Here we create a result that is missing the interactions from evaluating lrn1 on env1:
[19]:
import coba as cb
env0 = cb.Environments.from_linear_synthetic(100,seed=1)[0]
env1 = cb.Environments.from_linear_synthetic(100,seed=2)[0]
lrn0 = cb.RandomLearner()
lrn1 = cb.RandomLearner()
result = cb.Experiment([(env0,lrn0),(env1,lrn0),(env0,lrn1)]).run(quiet=True)
After creating our result we can print it to see that it has all of our learners and environments.
[20]:
print(result)
{'Learners': 2, 'Environments': 2, 'Interactions': 300}
However, when we ask it to plot we receive a notification in addition to the plot:
We removed 1 environment_id because it did not exist for every full_name.
[21]:
result.plot_learners()
We removed 1 environment_id because it did not exist for every full_name.
When we ask Coba to plot a result it checks that all plotted learners have completed the same environments. If this is not the case then Coba removes the environments that aren’t completed by all learners before plotting. We can disable this functionality by setting p=None. We see below that even though the two learners are identical we now get different plotted results due to learner 0 showing the average performance across two environments.
[22]:
result.plot_learners(p=None)
Alternatively, we could remove learner 1 from the result to see learner 0's performance on both environments.
Removing learner 1 means that all learners (now only learner 0) are completed for all environments (environment 0 and 1).
[23]:
result.where(learner_id=0).plot_learners()
Finally, we can filter to equally completed evaluations by using where_fin(l,p) (short for where_finished).
Coba will remove every p from the result for which some l is missing.
The variable p is shorthand for pairings. That is, on which variables p do we want to “pair” l.
[24]:
result = result.where_fin(l='learner_id',p='environment_id')
We removed 1 environment_id because it did not exist for every learner_id.
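The pairing rule described above can be sketched in a few lines of plain Python. This is only an illustration of the idea (keep a pairing value such as an environment only if every learner has an evaluation for it), not Coba's code; the function name is hypothetical.

```python
def keep_complete_pairings(evaluations):
    """Keep only environments that were evaluated by every learner.

    evaluations: iterable of (environment_id, learner_id) tuples.
    """
    evaluations = list(evaluations)
    all_learners = {lrn for _, lrn in evaluations}

    # Collect which learners were evaluated on each environment.
    learners_by_env = {}
    for env, lrn in evaluations:
        learners_by_env.setdefault(env, set()).add(lrn)

    # An environment is complete when every learner appears for it.
    complete = {env for env, lrns in learners_by_env.items() if lrns == all_learners}
    return [(env, lrn) for env, lrn in evaluations if env in complete]

# Same shape as the experiment above: learner 1 never ran on environment 1.
evals = [(0, 0), (1, 0), (0, 1)]
kept = keep_complete_pairings(evals)
print(kept)
```

Here environment 1 is dropped because learner 1 has no evaluation on it, which mirrors the "We removed 1 environment_id" notification.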
2. Results with Evaluations of Different Lengths
Here we create a result with one environment made up of 100 interactions and one environment with 200 interactions:
[25]:
import coba as cb
env0 = cb.Environments.from_linear_synthetic(100,seed=1)[0]
env1 = cb.Environments.from_linear_synthetic(200,seed=2)[0]
lrn0 = cb.RandomLearner()
lrn1 = cb.RandomLearner()
result = cb.Experiment([env0,env1],[lrn0,lrn1]).run(quiet=True)
After creating our result we can print it to see that it has all of our learners, environments and interactions.
[26]:
print(result)
{'Learners': 2, 'Environments': 2, 'Interactions': 600}
When we ask Coba to plot we receive a notification in addition to the plot:
We shortened 2 learner evaluations because they were longer than the shortest environment
[27]:
result.plot_learners()
We shortened 2 learner evaluations because they were longer than the shortest environment.
This is similar to the missing environment use case except now we have indexes that only exist in specific environments.
Once again we can force this constraint to be ignored by setting p=None.
When we do this we see a sudden and drastic change at x=100 due to the data from 100-200 only coming from a single environment.
[28]:
result.plot_learners(p=None)
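The jump at x=100 follows from how averages behave when some series stop early. The toy sketch below (made-up rewards, per-index averages rather than Coba's plotted values) averages two evaluations of different lengths and shows the mean shifting as soon as the shorter one runs out.

```python
def mean_by_index(evaluations):
    """Average reward at each interaction index over the evaluations that reach it."""
    longest = max(len(rewards) for rewards in evaluations)
    means = []
    for i in range(longest):
        available = [rewards[i] for rewards in evaluations if i < len(rewards)]
        means.append(sum(available) / len(available))
    return means

# Hypothetical rewards: a short evaluation with low reward, a long one with high reward.
short_eval = [0.2] * 3
long_eval = [0.8] * 6
means = mean_by_index([short_eval, long_eval])
print(means)
```

The first three averages mix both evaluations, while the last three reflect only the long one, so the curve changes level even though neither learner changed its behavior.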
A better way to fix this problem is to either explicitly remove short environments or use where_fin(n_interactions).
[29]:
result.where(environment_id=1).plot_learners()
result.where_fin(200).plot_learners()
We removed 2 learner evaluations because they were shorter than 200 interactions.
When using where_fin(n_interactions) Coba will remove environments that don’t contain at least n_interactions and drop interactions from environments with more than n_interactions. Notice below that after applying where_fin(150) our result now only has 300 interactions (i.e., 150 from each of the environment 1 evaluations).
[30]:
result.where_fin(150)
We removed 2 learner evaluations because they were shorter than 150 interactions.
We shortened 2 learner evaluations because they were longer than the shortest environment.
{'Learners': 2, 'Environments': 1, 'Interactions': 300}
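The arithmetic behind the 300 interactions can be reproduced with a small stand-alone sketch. It mimics the described behavior (drop evaluations that are too short, truncate the rest) using plain dictionaries; the function name is hypothetical and this is not how Coba stores results.

```python
def finish_at(evaluations, n_interactions):
    """Drop evaluations shorter than n_interactions and truncate the rest to that length.

    evaluations: dict mapping (environment_id, learner_id) to a list of rewards.
    """
    return {
        key: rewards[:n_interactions]
        for key, rewards in evaluations.items()
        if len(rewards) >= n_interactions
    }

# Same shape as the result above: environment 0 has 100 interactions, environment 1 has 200.
evals = {
    (0, 0): [0.0] * 100, (0, 1): [0.0] * 100,
    (1, 0): [0.0] * 200, (1, 1): [0.0] * 200,
}
finished = finish_at(evals, 150)
total = sum(len(rewards) for rewards in finished.values())
print(len(finished), total)
```

Two evaluations (both learners on environment 0) are removed and the remaining two are cut to 150 interactions each.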
3. Plotting with Respect to Custom Variables
Often we are interested in performance with respect to specific variables.
Here we create two environments with different numbers of actions so we can see how a learner performs with respect to action count.
[31]:
import coba as cb
env0 = cb.Environments.from_linear_synthetic(100,n_actions=2,seed=1).binary().shuffle(n=5)
env1 = cb.Environments.from_linear_synthetic(100,n_actions=5,seed=2).binary().shuffle(n=5)
lrn0 = cb.RandomLearner()
result = cb.Experiment(env0+env1,lrn0).run(quiet=True)
We can look at the environments table to see what variables we could plot on and we see n_actions.
[32]:
result.environments.to_pandas()
[32]:
| | environment_id | binary | env_type | n_actions | n_coeff | reward_features | seed | shuffle_seed |
---|---|---|---|---|---|---|---|---|
0 | 0 | True | LinearSynthetic | 2 | 5 | [a, xa] | 1 | 0 |
1 | 1 | True | LinearSynthetic | 2 | 5 | [a, xa] | 1 | 1 |
2 | 2 | True | LinearSynthetic | 2 | 5 | [a, xa] | 1 | 2 |
3 | 3 | True | LinearSynthetic | 2 | 5 | [a, xa] | 1 | 3 |
4 | 4 | True | LinearSynthetic | 2 | 5 | [a, xa] | 1 | 4 |
5 | 5 | True | LinearSynthetic | 5 | 5 | [a, xa] | 2 | 0 |
6 | 6 | True | LinearSynthetic | 5 | 5 | [a, xa] | 2 | 1 |
7 | 7 | True | LinearSynthetic | 5 | 5 | [a, xa] | 2 | 2 |
8 | 8 | True | LinearSynthetic | 5 | 5 | [a, xa] | 2 | 3 |
9 | 9 | True | LinearSynthetic | 5 | 5 | [a, xa] | 2 | 4 |
Based on this we can set the plot’s label to l='n_actions' and set p=None so that Coba doesn’t try to remove missing environments.
[33]:
result.plot_learners(l='n_actions',p=None,err='se')
We could set the x-axis to n_actions as well, which collapses each line down to its right-most reward value.
[34]:
result.plot_learners(x='n_actions',l='n_actions',p=None,err='se')
It is also possible to create plots with respect to collections of variables across environments, learners, evaluators and interactions.
[36]:
result.plot_learners(x=['n_actions','shuffle_seed'],l='n_actions', p= None,err='se')
4. Overlaying Result Plots
It is possible to overlay multiple plots generated by the same result or separate results. Here we make a quick result.
[38]:
import coba as cb
env0 = cb.Environments.from_linear_synthetic(100,n_actions=5,seed=1).binary().shuffle(n=16)
lrn0 = cb.RandomLearner()
result = cb.Experiment(env0,lrn0).run(quiet=True)
To overlay plots we tell Coba not to output the plot by passing in plot_learners(out=None).
Once we’ve done that we can call plot_learners again to generate a second plot that is overlaid on the first.
Below we overlay the standard error on top of the standard deviation around the mean.
[40]:
result.plot_learners(l='learner_id',p= None, colors="green", labels='' , err='sd', out=None)
result.plot_learners(l='learner_id',p= None, colors="red" , labels='Mean', err='se', out=None)
And here we overlay a learner’s overall mean performance on top of the underlying evaluations.
[42]:
result.plot_learners(l='shuffle_seed',p= None, colors="#ADD8E6", labels='' , alpha=.3, out=None)
result.plot_learners(l='learner_id' ,p= None, colors="#00008B", labels='Mean' )
5. Plotting Direct Comparisons
To know the best performer between two learners we can use plot_contrast. This will plot the difference of learners on environments.
[43]:
import matplotlib.pyplot as plt
import coba as cb
env0 = cb.Environments.from_linear_synthetic(200,n_actions=2,seed=1).binary().shuffle(n=5)
env1 = cb.Environments.from_linear_synthetic(200,n_actions=2,seed=2).binary().shuffle(n=5)
env2 = cb.Environments.from_linear_synthetic(200,n_actions=2,seed=3).binary().shuffle(n=5)
lrn0 = cb.RandomLearner()
lrn1 = cb.VowpalEpsilonLearner()
result = cb.Experiment(env0+env1+env2,[lrn0,lrn1]).run(quiet=True)
Below we show the same results in two different ways. In the first we show the average reward for each learner on each environment seed.
[44]:
result.plot_learners(x='seed',err='se',out=None)
plt.xticks([1,2,3],[1,2,3])
plt.show()
For the second we show the average difference in performance. Notice that the point estimates below are the same as the distance between the blue and orange point estimates above. This plot is similar to a collection of paired hypothesis tests. When the confidence interval around a point estimate doesn’t contain 0 we can say that one learner outperformed the other with statistical significance. In this case the VowpalEpsilonLearner outperforms random selection on all datasets.
[45]:
learner_id1 = 0
learner_id2 = 1
result.plot_contrast(learner_id1,learner_id2,l='learner_id',x='seed',xorder='+',err='se',out=None)
plt.xticks([1,2,3],[1,2,3])
plt.show()
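The paired comparison described above can be sketched without Coba. The example below uses made-up rewards, takes the difference for each pairing, and builds an approximate normal interval around the mean difference. The sign convention (second minus first) and the z-value of 1.96 are choices made for this sketch, not statements about plot_contrast.

```python
import math
import statistics

def paired_difference(rewards_a, rewards_b, z=1.96):
    """Mean of (b - a) over pairs with an approximate normal interval (z * standard error)."""
    diffs = [b - a for a, b in zip(rewards_a, rewards_b)]
    mean = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean, (mean - z * se, mean + z * se)

# Hypothetical average rewards of two learners on five shuffles of the same environment.
random_rewards = [0.50, 0.48, 0.52, 0.49, 0.51]
vw_rewards     = [0.61, 0.58, 0.64, 0.60, 0.62]

mean, (low, high) = paired_difference(random_rewards, vw_rewards)
excludes_zero = low > 0 or high < 0
print(f"diff={mean:.3f} interval=({low:.3f}, {high:.3f}) excludes_zero={excludes_zero}")
```

Pairing matters because it removes the variation shared by both learners on the same shuffle, which usually gives a tighter interval than comparing the two unpaired means.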
We can also contrast by custom variables. Here we contrast a learner’s performance with respect to pre-conditioning.
[1]:
import coba as cb
env0 = cb.Environments.from_openml(180).reservoir(4000,seeds=range(10))
env1 = env0.scale('min' ,'minmax')
env2 = env0.scale('mean' ,'std' )
env3 = env0.scale('median','iqr' )
envs = env0 #no scaling
envs += env1 #scales features to [0,1] (aka, sklearn MinMaxScaler)
envs += env2 #scales features to approximately [-std,std] (aka, sklearn StandardScaler)
envs += env3 #scales features to approximately [-iqr,iqr] (aka, sklearn RobustScaler)
lrn = cb.VowpalEpsilonLearner()
result = cb.Experiment(envs,lrn).run(processes=8,quiet=True)
To determine the variable we want to contrast on we set l. We then set the specific l values to contrast in l1 and l2.
We set p below so that plot_contrast knows we want to diff reward(l1,openml_data,reservoir_seed) - reward(l2,openml_data,reservoir_seed).
The conclusion we can draw from the plot below is that standard scaling and robust scaling give a small performance boost over no scaling.
[2]:
result.plot_contrast(l1=None,l2=['minmax','std','iqr'],l='scale',x='scale',p=['openml_data','reservoir_seed'],mode='diff',err='bs')
And here is the plot of the absolute values (i.e., rewards before differencing).
[3]:
result.plot_learners(x='scale',p=None,err='bs',legend=False)
Alternatively, we could also plot the probability that l2 > l1 using a binomial confidence interval.
[4]:
result.plot_contrast(l1=None,l2=['minmax','std','iqr'],l='scale',x='scale',p=['openml_data','reservoir_seed'],mode='prob',err='bi')
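One common way to put an interval around such a win probability is the Wilson score interval. The sketch below computes it by hand on made-up paired rewards. Whether err='bi' uses exactly this interval is an assumption; the code only illustrates the general technique.

```python
import math

def wilson_interval(wins, n, z=1.96):
    """Wilson score interval for a binomial proportion wins / n."""
    p_hat = wins / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Hypothetical paired rewards: count how often the second condition beats the first.
unscaled = [0.50, 0.52, 0.47, 0.55, 0.49, 0.51, 0.53, 0.48, 0.50, 0.54]
scaled   = [0.53, 0.51, 0.50, 0.58, 0.52, 0.55, 0.52, 0.51, 0.53, 0.57]

wins = sum(s > u for u, s in zip(unscaled, scaled))
low, high = wilson_interval(wins, len(unscaled))
print(f"P(scaled > unscaled) ~ {wins / len(unscaled):.2f}, interval=({low:.3f}, {high:.3f})")
```

If the interval lies entirely above 0.5, the second condition wins more often than chance would suggest at the chosen confidence level.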
6. Adjusting the Moving Average
We can adjust the number of preceding interactions to use when averaging y via the span parameter. By default Coba plots the total average.
[5]:
import coba as cb
env0 = cb.Environments.from_linear_synthetic(100,n_actions=5,seed=1).binary().shuffle(n=16)
lrn0 = cb.RandomLearner()
result = cb.Experiment(env0,lrn0).run(quiet=True)
This overlays different window sizes for calculating the moving average.
[6]:
result.plot_learners(span=3 ,out=None,labels='prev 3' ,colors=1)
result.plot_learners(span=9 ,out=None,labels='prev 9' ,colors=3)
result.plot_learners(span=27 ,out=None,labels='prev 27' ,colors=4)
result.plot_learners(span=None,out=None,labels='all time' ,colors=0, title='')
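As a rough picture of what different span values do, the sketch below computes a simple windowed average over the previous span rewards and falls back to the average over everything so far when span is None. It is a plain-Python illustration; the exact smoothing Coba applies may differ.

```python
def moving_average(rewards, span=None):
    """Average of the last `span` rewards at each step; span=None averages everything so far."""
    averages = []
    for i in range(len(rewards)):
        start = 0 if span is None else max(0, i + 1 - span)
        window = rewards[start:i + 1]
        averages.append(sum(window) / len(window))
    return averages

# A reward stream that switches from 0 to 1 part way through.
rewards = [0, 0, 1, 1, 1, 1]
short_window = moving_average(rewards, span=2)   # reacts quickly to the change
all_time = moving_average(rewards, span=None)    # changes slowly, remembers early rewards
print(short_window)
print(all_time)
```

Small spans track recent performance and are noisier, while the all-time average is smoother but slow to reflect improvement.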
7. Determining Confidence Intervals
To calculate confidence intervals in Result plots we require multiple observations per point.
For example, in the code below no confidence intervals can be calculated because there is only one environment per learner.
[7]:
import coba as cb
env0 = cb.Environments.from_linear_synthetic(100,n_actions=5,seed=1).binary()
lrns = [cb.RandomLearner(seed=1),cb.RandomLearner(seed=2)]
cb.Experiment(env0,lrns).run(quiet=True).plot_learners(err='se')
To generate multiple environments we can simply shuffle the given environment.
[8]:
import coba as cb
env0 = cb.Environments.from_linear_synthetic(100,n_actions=5,seed=1).binary().shuffle(n=10)
lrns = [cb.RandomLearner(seed=1),cb.RandomLearner(seed=2)]
cb.Experiment(env0,lrns).run(quiet=True).plot_learners(err='bs')
Coba supports four types of confidence intervals natively:
1. Standard deviation (err='sd')
2. Standard error (err='se')
3. Bootstrap (err='bs')
4. Binomial (err='bi')
Some of these can be customized by instantiating the estimator directly, as shown below where we increase the confidence level from the default of .95 to .99.
[9]:
import coba as cb
env0 = cb.Environments.from_linear_synthetic(100,n_actions=5,seed=1).binary().shuffle(n=10)
lrns = [cb.RandomLearner(seed=1),cb.RandomLearner(seed=2)]
cb.Experiment(env0,lrns).run(quiet=True).plot_learners(err=cb.BootstrapCI(.99,cb.mean))
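For intuition about what a bootstrap interval measures and why raising the confidence level widens it, here is a minimal percentile-bootstrap sketch using only the standard library. The function, its parameters and the reward values are illustrative assumptions, not Coba's BootstrapCI.

```python
import random
import statistics

def bootstrap_ci(values, confidence=0.95, n_resamples=2000, seed=1):
    """Percentile bootstrap interval for the mean of `values`."""
    rng = random.Random(seed)
    # Resample with replacement, recording the mean of each resample.
    means = sorted(
        statistics.mean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lower_index = round((1 - confidence) / 2 * n_resamples)
    upper_index = round((1 + confidence) / 2 * n_resamples) - 1
    return means[lower_index], means[upper_index]

# Hypothetical final rewards from ten shuffles of one environment.
rewards = [0.48, 0.52, 0.50, 0.47, 0.53, 0.51, 0.49, 0.50, 0.54, 0.46]
low95, high95 = bootstrap_ci(rewards, 0.95)
low99, high99 = bootstrap_ci(rewards, 0.99)
print(f"95%: ({low95:.3f}, {high95:.3f})  99%: ({low99:.3f}, {high99:.3f})")
```

Because both calls use the same seed they see the same resamples, so the 99% interval simply reaches further into the tails of the same distribution of means.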
8. Applying Where Filters to Results
Sometimes we want to downselect an experiment Result. We can use the where method for this. First we create a result.
[10]:
import coba as cb
env0 = cb.Environments.from_linear_synthetic(100,n_actions=5,seed=[0,1,2,3,4])
lrn0 = cb.RandomLearner()
result = cb.Experiment(env0,lrn0).run(quiet=True)
We can downselect to the first three environments in four different ways.
Notice that there are only three environments after each where is executed.
[11]:
print(result )
print(result.where(environment_id=[0,1,2] ))
print(result.where(environment_id={'in':[0,1,2]} ))
print(result.where(environment_id={'<=':2} ))
print(result.where(environment_id={'match':'[012]'}))
{'Learners': 1, 'Environments': 5, 'Interactions': 500}
{'Learners': 1, 'Environments': 3, 'Interactions': 300}
{'Learners': 1, 'Environments': 3, 'Interactions': 300}
{'Learners': 1, 'Environments': 3, 'Interactions': 300}
{'Learners': 1, 'Environments': 3, 'Interactions': 300}
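To see why the four filters select the same environments, the sketch below gives one possible reading of each form: a list means membership, {'in': ...} is explicit membership, {'<=': ...} is a comparison, and {'match': ...} is a regex test against the value as a string. The matches helper is hypothetical and only covers the forms shown above; it is not Coba's implementation.

```python
import re

def matches(value, condition):
    """Check one value against a scalar, a list, or a single-key {operator: argument} dict."""
    if isinstance(condition, dict):
        (op, arg), = condition.items()        # exactly one operator per condition
        if op == "in":
            return value in arg
        if op == "<=":
            return value <= arg
        if op == "match":
            return re.search(arg, str(value)) is not None
        raise ValueError(f"unsupported operator in this sketch: {op}")
    if isinstance(condition, (list, tuple, set)):
        return value in condition
    return value == condition

environment_ids = [0, 1, 2, 3, 4]
conditions = [[0, 1, 2], {"in": [0, 1, 2]}, {"<=": 2}, {"match": "[012]"}]
selected = [[e for e in environment_ids if matches(e, c)] for c in conditions]
for condition, ids in zip(conditions, selected):
    print(condition, ids)
```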
We can select on any variable in environments, learners, evaluators, and interactions. Below we select using seed.
[12]:
result.environments.to_pandas()
[12]:
| | environment_id | env_type | n_actions | n_coeff | reward_features | seed |
---|---|---|---|---|---|---|
0 | 0 | LinearSynthetic | 5 | 5 | [a, xa] | 0 |
1 | 1 | LinearSynthetic | 5 | 5 | [a, xa] | 1 |
2 | 2 | LinearSynthetic | 5 | 5 | [a, xa] | 2 |
3 | 3 | LinearSynthetic | 5 | 5 | [a, xa] | 3 |
4 | 4 | LinearSynthetic | 5 | 5 | [a, xa] | 4 |
[13]:
result.where(seed=1).environments.to_pandas()
[13]:
| | environment_id | env_type | n_actions | n_coeff | reward_features | seed |
---|---|---|---|---|---|---|
0 | 1 | LinearSynthetic | 5 | 5 | [a, xa] | 1 |
9. Post Hoc Hyperparameter Tuning
Result also supports post-hoc hyperparameter tuning. It is post-hoc in the sense that all hyperparameters must be fully evaluated in the initial experiment. The tuning only happens via selection after the fact. This gives one an idea of how well a learner could have performed had appropriate tuning been applied up-front.
[14]:
import matplotlib.pyplot as plt
import coba as cb
env0 = cb.Environments.from_linear_synthetic(1000,n_actions=3,seed=1).binary().shuffle(n=5)
env1 = cb.Environments.from_mlp_synthetic(1000,n_actions=3,seed=2).binary().shuffle(n=5)
lrn1 = cb.VowpalEpsilonLearner(epsilon=0.01,features=['a','ax'])
lrn2 = cb.VowpalEpsilonLearner(epsilon=0.05,features=['a','ax','axx'])
result = cb.Experiment(env0+env1,[lrn1,lrn2]).run(quiet=True)
We see that the linear VW learner does very well on the linear environment while the non-linear VW learner does better on the non-linear environment.
[15]:
result.plot_learners(x='env_type')
Using the where_best filter we can post-hoc keep only the best learner in each family for each environment type.
[16]:
result.where_best(l='family',p='env_type',n=500).plot_learners(x='env_type',p=None)
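The selection idea behind this kind of filter can be sketched on plain data: group evaluations by (family, environment type) and keep the learner with the highest mean reward in each group. The helper below is hypothetical, the rewards and environment labels are made up, and treating n as "judge on the first n rewards" is an assumption for this sketch rather than a description of where_best.

```python
from statistics import mean

def best_per_pairing(evaluations, n=None):
    """For each (family, env_type) return the learner_id with the highest mean reward.

    evaluations: dict mapping (family, learner_id, env_type) to a list of rewards.
    n: if given, only the first n rewards are used to pick the winner (assumed meaning).
    """
    best = {}
    for (family, learner_id, env_type), rewards in evaluations.items():
        score = mean(rewards[:n])             # rewards[:None] uses the whole list
        key = (family, env_type)
        if key not in best or score > best[key][1]:
            best[key] = (learner_id, score)
    return {key: learner_id for key, (learner_id, _) in best.items()}

# Hypothetical rewards for two learners of the same family on two kinds of environment.
evals = {
    ("vw", 0, "linear"):    [0.8, 0.9, 0.9],
    ("vw", 1, "linear"):    [0.6, 0.7, 0.7],
    ("vw", 0, "nonlinear"): [0.5, 0.5, 0.6],
    ("vw", 1, "nonlinear"): [0.7, 0.8, 0.8],
}
winners = best_per_pairing(evals, n=2)
print(winners)
```

Because the winner is chosen using the same data that is then reported, the result is an optimistic estimate of what up-front tuning could achieve.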
We can overlay the average performance of best VW per environment over both underlying VW learners.
The green line is the average performance had we used the linear VW learner on the linear environment and the non-linear VW learner on the non-linear environment.
[17]:
result.plot_learners(out=None,labels=['VW linear','VW non-linear'])
result.where_best(l='family',p='env_type',n=500).plot_learners(l='family',colors='green',labels='VW only best')