Environments

Coba Environments are sequences of Interactions. An Environment is any class with the following interface:

class Environment:
    def read(self)->Iterable[Interaction]:
        ...
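
As a minimal sketch of this interface, the class below satisfies it using only the standard library. The name `ListEnvironment` and the choice to represent an interaction as a plain dict are illustrative assumptions, not part of Coba; any object with a compatible `read` method should work the same way.

```python
from typing import Any, Dict, Iterable, List

Interaction = Dict[str, Any]  # assumption: an interaction is represented as a plain dict

class ListEnvironment:
    """Hypothetical environment that replays a fixed list of interactions."""

    def __init__(self, interactions: List[Interaction]) -> None:
        self._interactions = interactions

    def read(self) -> Iterable[Interaction]:
        # Return a fresh iterator on every call so the environment can be read repeatedly.
        return iter(self._interactions)

env = ListEnvironment([{'context': 1, 'actions': ['a', 'b'], 'rewards': [0, 1]}] * 3)
first_pass = list(env.read())
second_pass = list(env.read())
```

Because `read` returns a new iterator each time, two passes over the same environment see the same interactions.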

Interactions are described in more detail in Interaction.ipynb. In this notebook we…

…introduce the Environments API, a friendly interface for creating and modifying collections of environments
…describe several different methods for creating environments using the Environments API
…describe a number of useful environment modifications that can be applied using the Environments API
…describe how the Environments API can be used to generate many environments by applying filters

Environments API

The Environments API creates and manipulates environments via chains of commands. It is accessed via

import coba as cb
cb.Environments

There are two types of Environments commands: those that start with from_ and those that don’t.

Every Environments chain starts with a from_ method. After this, any number of filters can be called to modify the environment.

Here is an example:

import coba as cb
env = cb.Environments.from_openml(180).reservoir(1000).scale('mean','std').noise(reward=('g',0,.5))

This Environments chain will download the Covertype dataset from openml, create an environment from the dataset, use reservoir sampling to select 1000 random examples from the dataset, apply standardization to all context features in the dataset, and then add gaussian noise to all reward values. A more complete list of the chain commands can be found below.
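
To make the idea of a chain concrete, here is a rough standard-library sketch of how a source followed by a sequence of filters could be composed. The `Chain` class, its `then` method, and the `take` and `gaussian_noise` helpers are hypothetical names invented for this illustration; they are not Coba's implementation and only mimic the general shape of a chain.

```python
import random
from itertools import islice
from typing import Callable, Iterable, Tuple

Step = Callable[[Iterable[dict]], Iterable[dict]]

class Chain:
    """Hypothetical chain: a source of interactions followed by ordered steps."""

    def __init__(self, source: Callable[[], Iterable[dict]], steps: Tuple[Step, ...] = ()) -> None:
        self._source = source
        self._steps = steps

    def then(self, step: Step) -> "Chain":
        # Each call returns a new chain, so earlier chains are left unchanged.
        return Chain(self._source, self._steps + (step,))

    def read(self) -> Iterable[dict]:
        items = self._source()
        for step in self._steps:
            items = step(items)  # each step lazily wraps the previous one
        return items

def take(n: int) -> Step:
    return lambda items: islice(items, n)

def gaussian_noise(sigma: float, seed: int) -> Step:
    def step(items):
        rng = random.Random(seed)  # re-seeded on every read, so reads are repeatable
        for item in items:
            yield {**item, 'reward': item['reward'] + rng.gauss(0, sigma)}
    return step

def source():
    return ({'reward': float(i)} for i in range(10))

chain = Chain(source).then(take(3)).then(gaussian_noise(0.5, seed=1))
first_read = list(chain.read())
second_read = list(chain.read())
```

In this sketch the random generator is created inside the step, which is one simple way to make repeated reads of the same chain produce identical results.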

A few more useful details about Environments chains:

All Environments chains are Python sequences, so they must be indexed to access the environments they contain. For example,

import coba as cb
env              = cb.Environments.from_openml(180).reservoir(1000)
env_interactions = list(env[0].read()) #Because cb.Environments always creates a Python sequence we have to index

All Environments chains are seeded so that the same environment will always be recreated. This means:

env1 = cb.Environments.from_openml(180).reservoir(1000).scale('mean','std')
env2 = cb.Environments.from_openml(180).reservoir(1000).scale('mean','std')

assert list(env1[0].read()) == list(env2[0].read())

Environment chains can be added together to get a collection of several environments…

import coba as cb
env1 = cb.Environments.from_openml(180).reservoir(1000).scale('mean','std')
env2 = cb.Environments.from_openml(180).reservoir(1000).scale('mean','std')

env = env1+env2

assert len(env) == 2
assert env[0] is env1[0] # env1 and env2 are sequences, so we index them to compare environments
assert env[1] is env2[0]

Many commands on Environment chains take sequences of values and create multiple environments. For example we can…

env = cb.Environments.from_openml(180).reservoir(1000).noise(reward=('g',0,.5),seed=[1,2])

assert len(env) == 2

The complete list of Environments chain commands can be found in the API reference.

Environment Creation

Coba has many ways to create environments. The six most common are:

From a custom class implementing the Environment interface
From an openml.org dataset or task
From a local dataset
From a synthetic dataset
From a result file
From a template file

Environments From Custom Classes

The easiest way to create an environment for simple testing is by creating a custom class. Here are two examples.

[1]:

import coba as cb

class BanditEnvironment:
    def read(self):
        return [ { 'actions':['a','b','c'], 'rewards':[0,100,-100] } ] * 50

environments = cb.Environments.from_custom(BanditEnvironment())
learners     = [cb.BanditEpsilonLearner(),cb.RandomLearner()]
experiment   = cb.Experiment(environments, learners)

experiment.run(quiet=True).plot_learners()

../_images/notebooks_Environments_4_0.png

[2]:

import coba as cb

class ContextualBanditEnvironment:
    def read(self):
        return [ { 'context':1, 'actions':['a','b','c'], 'rewards':[0,100,-100] } ] * 50

environments = cb.Environments.from_custom(ContextualBanditEnvironment())
learners     = [cb.VowpalEpsilonLearner(),cb.RandomLearner()]
experiment   = cb.Experiment(environments, learners)

experiment.run(quiet=True).plot_learners()

../_images/notebooks_Environments_5_0.png

[3]:

import coba as cb

environments = cb.Environments.from_openml(180).reservoir(1000)
learners     = [cb.VowpalEpsilonLearner(),cb.RandomLearner()]
experiment   = cb.Experiment(environments, learners)

experiment.run(quiet=True).plot_learners(xlim=(10,None))

../_images/notebooks_Environments_6_0.png

Environments From Openml Datasets

Next we can incorporate new Environments by loading them from openml.org. Openml is an online repository of machine learning datasets which currently hosts over 21,000 datasets. Using either dataset ids or task ids, Coba can tap into this repository and download these datasets to create Environments.

To get a sense of how this works let’s say we want to build an environment from the Covertype data set. We can do a dataset search on openml to see if this data set is hosted. This search finds several datasets and we pick the one that appears to be original. On the dataset’s landing page we see that its id is 180. Now, all we have to do to run an experiment with the Covertype data set is:

[1]:

import coba as cb
cb.Environments.from_openml(180)

1. Openml(data=180) | BatchSafe(Finalize)

Environments From Local Supervised Datasets

Next we can incorporate new environments from local datasets in the following formats:

CSV
ARFF (i.e., https://waikato.github.io/weka-wiki/formats_and_processing/arff_stable/)
Libsvm (e.g., https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html)
Manik (e.g., http://manikvarma.org/downloads/XC/XMLRepository.html)

For example, we may have the iris dataset saved in iris.arff in the ARFF format. To create an environment from this dataset we can do the following:

[4]:

import coba as cb

environments = cb.Environments.from_supervised(cb.ArffSource("iris.arff"), label_col='Species').shuffle(n=10)
learners     = [cb.VowpalEpsilonLearner(),cb.RandomLearner()]
experiment   = cb.Experiment(environments, learners)

experiment.run(quiet=True).plot_learners(err='se')

../_images/notebooks_Environments_10_0.png

Environments From Synthetic Datasets

Coba provides access to synthetic data generators from several function classes:

Linear – cb.Environments.from_linear_synthetic
Nearest Neighbors – cb.Environments.from_neighbors_synthetic
Kernel Methods – cb.Environments.from_kernel_synthetic
Multilayer Perceptrons – cb.Environments.from_mlp_synthetic

For example, below we create a synthetic dataset whose expected reward function is linear with respect to action features:

[5]:

import coba as cb

environments = cb.Environments.from_linear_synthetic(
    n_interactions=1000,
    n_actions=5,
    n_context_features=0,
    n_action_features=5,
    reward_features = ["a"],
    seed=1
)

learners     = [cb.VowpalEpsilonLearner(),cb.RandomLearner()]
experiment   = cb.Experiment(environments, learners)

experiment.run(quiet=True).plot_learners(err='se')

../_images/notebooks_Environments_12_0.png

Environments From Result Files

A more advanced use case is to create environments from the result of previous experiments. This is useful for off-policy experiments.

We give an example below, but this topic is covered in much more detail in the Logged notebook.

[6]:

import coba as cb

print(" ------------------------------------------------------------------------------")
print(" Create the result file with logged actions and plot the on-policy performance.")
print(" ------------------------------------------------------------------------------")
environments = cb.Environments.from_openml(180).reservoir(1000)
learners     = [cb.VowpalEpsilonLearner(),cb.RandomLearner()]
experiment   = cb.Experiment(environments, learners, cb.SequentialCB(record=['context','actions','rewards','action','reward','probability']))
experiment.run('result.log', quiet=True).plot_learners(xlim=(10,None))

print(" ------------------------------------------------------------------------------")
print(" Use the result file to create logged bandit data and plot off-policy learning.")
print(" ------------------------------------------------------------------------------")
environments = cb.Environments.from_result('result.log').shuffle(seed=3)
learners     = [cb.VowpalEpsilonLearner(),cb.RandomLearner()]
experiment   = cb.Experiment(environments, learners, cb.SequentialCB(learn='off'))
experiment.run(quiet=True).plot_learners(xlim=(10,None))

 ------------------------------------------------------------------------------
 Create the result file with logged actions and plot the on-policy performance.
 ------------------------------------------------------------------------------

../_images/notebooks_Environments_14_1.png

 ------------------------------------------------------------------------------
 Use the result file to create logged bandit data and plot off-policy learning.
 ------------------------------------------------------------------------------

../_images/notebooks_Environments_14_3.png

Environments From Template Files

A final very advanced use case is to create environments from file templates.

This is useful when multiple researchers want to compare overall performance on the same set of environments.

By defining Environments in a template file, the researchers can be confident they’re evaluating on the same data.

[7]:

import coba as cb

environments = cb.Environments.from_template('short_experiment.json')
learners     = [cb.VowpalEpsilonLearner(),cb.RandomLearner()]
experiment   = cb.Experiment(environments, learners)

experiment.run(quiet=True).plot_learners(err='se')

../_images/notebooks_Environments_16_0.png

Environment Filters

The Environments API provides tools to modify environments. Individual modifiers are called filters. Here we summarize common filters:

Filters that apply to all environments

reservoir – Randomly subselect interactions in an environment. Does not require loading an entire environment into memory.
shuffle – Randomly permute interactions in an environment. Requires loading an entire environment into memory.
take – Only keep the first n interactions in an environment.
where – Choose which environments to keep based on their characteristics.

[12]:

import coba as cb

class SimpleEnvironment:
    def read(self):
        return [{'i':i} for i in range(3)]

envs = cb.Environments.from_custom(SimpleEnvironment())

print("------unfiltered------")
print(list(envs[0].read()))

print()
print("------reservoir(2)------")
print(list(envs.reservoir(2)[0].read()))

print()
print("------shuffle()------")
print(list(envs.shuffle()[0].read()))

print()
print("------take(2)------")
print(list(envs.take(2)[0].read()))

print()
print("------where(n_interactions=2)------")
print(list(envs.where(n_interactions=2)[0].read()))

print()
print("------where(n_interactions=3)------")
print(list(envs.where(n_interactions=3)[0].read()))

print()
print("------where(n_interactions=5)------")
print(list(envs.where(n_interactions=5)[0].read()))

print()
print("------where(n_interactions=(2,5))------")
print(list(envs.where(n_interactions=(2,5))[0].read()))

------unfiltered------
[{'i': 0}, {'i': 1}, {'i': 2}]

------reservoir(2)------
[{'i': 2}, {'i': 1}]

------shuffle()------
[{'i': 0}, {'i': 2}, {'i': 1}]

------take(2)------
[{'i': 0}, {'i': 1}]

------where(n_interactions=2)------
[]

------where(n_interactions=3)------
[{'i': 0}, {'i': 1}, {'i': 2}]

------where(n_interactions=5)------
[]

------where(n_interactions=(2,5))------
[{'i': 0}, {'i': 1}, {'i': 2}]
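
To illustrate why reservoir can avoid loading an entire environment into memory, here is a sketch of classic single-pass reservoir sampling (Algorithm R) using only the standard library. The function name `reservoir_sample` and its default seed are assumptions made for this example; Coba's own implementation may differ in its details and in the order of the returned items.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")

def reservoir_sample(items: Iterable[T], n: int, seed: int = 1) -> List[T]:
    """Uniformly sample up to n items from a stream in a single pass."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for index, item in enumerate(items):
        if index < n:
            # Fill the reservoir with the first n items.
            reservoir.append(item)
        else:
            # Keep the new item with probability n/(index+1) by replacing a random slot.
            j = rng.randint(0, index)  # both endpoints are inclusive
            if j < n:
                reservoir[j] = item
    return reservoir

stream = ({'i': i} for i in range(1000))  # a generator, never fully materialized
sample = reservoir_sample(stream, 2)
```

Only the `n` kept items are ever stored, whereas a shuffle needs every interaction in memory before it can permute them.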

Filters that apply to context features

scale – Pre-condition environment features by centering and scaling.
noise – Add noise to an environment’s features. Particularly useful with synthetic data which is otherwise noiseless.
impute – Fill in missing features with summary statistics.

[36]:

import coba as cb

class SimpleEnvironment:
    def read(self):
        return [{'context':i} for i in range(3)] + [{'context':None}]

envs = cb.Environments.from_custom(SimpleEnvironment())

print("------unfiltered------")
print(list(envs[0].read()))

print()
print("------scale('mean','std')------")
print(list(envs.scale('mean','std')[0].read()))

print()
print("------reservoir(2)------")
print(list(envs.reservoir(2)[0].read()))

print()
print("------shuffle()------")
print(list(envs.shuffle()[0].read()))

print()
print("------take(2)------")
print(list(envs.take(2)[0].read()))

print()
print("------noise()------")
print(list(envs.noise(context=(0,1))[0].read()))

print()
print("------where(n_interactions=2)------")
print(list(envs.where(n_interactions=2)[0].read()))

print()
print("------where(n_interactions=5)------")
print(list(envs.where(n_interactions=5)[0].read()))

print()
print("------impute('mean',indicator=True)------")
print(list(envs.impute('mean',indicator=True)[0].read()))

------unfiltered------
[{'context': 0}, {'context': 1}, {'context': 2}, {'context': None}]

------scale('mean','std')------
[{'context': -1.0}, {'context': 0.0}, {'context': 1.0}, {'context': None}]

------reservoir(2)------
[{'context': None}, {'context': 1}]

------shuffle()------
[{'context': 0}, {'context': None}, {'context': 2}, {'context': 1}]

------take(2)------
[{'context': 0}, {'context': 1}]

------noise()------
[{'context': 0.6259805618684365}, {'context': -1.0118803935399336}, {'context': 2.6130905694888105}, {'context': None}]

------where(n_interactions=2)------
[]

------where(n_interactions=5)------
[]

------impute('mean',indicator=True)------
[{'context': [0, 0]}, {'context': [1, 0]}, {'context': [2, 0]}, {'context': [1.0, 1]}]
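
The arithmetic behind the scale('mean','std') and impute('mean',indicator=True) outputs above can be sketched with the standard library. The helper names `standardize` and `impute_mean` are hypothetical, and the sketch assumes a single numeric context feature, missing values marked with None, and the sample standard deviation (which is what reproduces the values shown above).

```python
from statistics import mean, stdev
from typing import List, Optional

def standardize(values: List[Optional[float]]) -> List[Optional[float]]:
    """Center by the mean and divide by the sample standard deviation, skipping None."""
    observed = [v for v in values if v is not None]
    center, spread = mean(observed), stdev(observed)
    return [None if v is None else (v - center) / spread for v in values]

def impute_mean(values: List[Optional[float]]) -> List[list]:
    """Replace None with the mean and append a 0/1 'was missing' indicator."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [[fill, 1] if v is None else [v, 0] for v in values]

contexts = [0, 1, 2, None]
scaled = standardize(contexts)   # mean is 1 and sample std is 1.0
imputed = impute_mean(contexts)  # the missing value is filled with the mean, 1
```

The indicator feature lets a learner distinguish a genuinely observed 1 from a value that was filled in.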

Filters that apply to categorical features

repr – Control how categorical features are represented (by default categorical features are one-hot encoded).

[35]:

import coba as cb

levels  = list(map(str,range(2)))
actions = [cb.Categorical('0',levels),cb.Categorical('1',levels)]

class SimpleEnvironment2:
    def read(self):
        return [{'context':cb.Categorical(str(i),levels), 'actions':actions } for i in range(2)]

envs = cb.Environments.from_custom(SimpleEnvironment2())

print("------unfiltered------")
print(str(list(envs[0].read())))

print()
print("------repr('string','string')------")
print(str(list(envs.repr('string','string')[0].read())))

print()
print("------repr('onehot','string')------")
print(str(list(envs.repr('onehot','string')[0].read())))

------unfiltered------
[{'context': (1, 0), 'actions': [(1, 0), (0, 1)]}, {'context': (0, 1), 'actions': [(1, 0), (0, 1)]}]

------repr('string','string')------
[{'context': '0', 'actions': ['0', '1']}, {'context': '1', 'actions': ['0', '1']}]

------repr('onehot','string')------
[{'context': (1, 0), 'actions': ['0', '1']}, {'context': (0, 1), 'actions': ['0', '1']}]

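
For readers unfamiliar with one-hot encoding, the mapping from a categorical value to a tuple can be sketched as below. The `one_hot` helper is a hypothetical stand-in written for this example and is not how Coba represents categoricals internally.

```python
from typing import Sequence, Tuple

def one_hot(value: str, levels: Sequence[str]) -> Tuple[int, ...]:
    """Return a tuple with a 1 in the position of `value` and 0 elsewhere."""
    return tuple(1 if value == level else 0 for level in levels)

levels = ['0', '1']
encoded = [one_hot(value, levels) for value in levels]
```

With two levels, '0' maps to (1, 0) and '1' maps to (0, 1), which matches the one-hot tuples in the output above.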

Filters that apply to rewards

binary – Turn rewards into 1/0, with the maximum reward becoming 1 and all other rewards becoming 0.

[41]:

import coba as cb

class SimpleEnvironment3:
    def read(self):
        return [{'actions':[1,2], 'rewards':[3,4] } for i in range(2)]

envs = cb.Environments.from_custom(SimpleEnvironment3())

print("------unfiltered------")
print(str(list(envs[0].read())))

print()
print("------binary()------")
print(str(list(envs.binary()[0].read())))

print()
print("------noise(reward=(0,1)).binary()------")
print(str(list(envs.noise(reward=(0,1)).binary()[0].read())))

------unfiltered------
[{'actions': [1, 2], 'rewards': DiscreteReward([[1, 2], [3, 4]])}, {'actions': [1, 2], 'rewards': DiscreteReward([[1, 2], [3, 4]])}]

------binary()------
[{'actions': [1, 2], 'rewards': DiscreteReward([[1, 2], [0, 1]])}, {'actions': [1, 2], 'rewards': DiscreteReward([[1, 2], [0, 1]])}]

------noise(reward=(0,1)).binary()------
[{'actions': [1, 2], 'rewards': DiscreteReward([[1, 2], [1, 0]])}, {'actions': [1, 2], 'rewards': DiscreteReward([[1, 2], [0, 1]])}]
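
The binary transformation itself is simple enough to sketch in a couple of lines. The `binarize` helper below is hypothetical; in particular, how ties for the maximum are handled is an assumption of this sketch (every tied action receives a 1), not something stated above.

```python
from typing import List, Sequence

def binarize(rewards: Sequence[float]) -> List[int]:
    """Map the maximum reward to 1 and every other reward to 0."""
    best = max(rewards)
    # Assumption: if several actions tie for the maximum, all of them get a 1.
    return [1 if reward == best else 0 for reward in rewards]

binary_rewards = binarize([3, 4])
```

Applying noise before binary, as in the last example above, can change which action has the maximum reward and therefore which action receives the 1.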

Filters that apply to sparse context features

dense – Turn sparse data into dense data using either a lookup table or the ‘hashing trick’.

[2]:

import coba as cb

class SimpleEnvironment4:
    def read(self):
        return [{ 'context': {'a':1,'b':2} } for i in range(2)]

envs = cb.Environments.from_custom(SimpleEnvironment4())

print("------unfiltered------")
print(str(list(envs[0].read())))

print()
print("------dense(5,'lookup')------")
print(str(list(envs.dense(5,'lookup')[0].read())))

print()
print("------dense(5,'hashing')------")
print(str(list(envs.dense(5,'hashing')[0].read())))

------unfiltered------
[{'context': {'a': 1, 'b': 2}}, {'context': {'a': 1, 'b': 2}}]

------dense(5,'lookup')------
[{'context': [1, 0, 0, 0, 2]}, {'context': [1, 0, 0, 0, 2]}]

------dense(5,'hashing')------
[{'context': [0, 2, 1, 0, 0]}, {'context': [0, 2, 1, 0, 0]}]
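
The hashing trick can be sketched as follows: each sparse key is hashed to one of a fixed number of positions, and its value is added there. The helper name `hash_to_dense` and the use of MD5 are assumptions for this illustration; Coba's hash function is not described here, so the positions produced by this sketch will generally not match the output above.

```python
import hashlib
from typing import Dict, List

def hash_to_dense(sparse: Dict[str, float], n_features: int) -> List[float]:
    """Project a sparse feature dict onto a fixed-length dense vector."""
    dense: List[float] = [0] * n_features
    for key, value in sparse.items():
        # A stable hash is used so results are the same across Python processes.
        digest = hashlib.md5(str(key).encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % n_features
        dense[index] += value  # colliding keys are summed into the same slot
    return dense

vector = hash_to_dense({'a': 1, 'b': 2}, 5)
```

Unlike a lookup table, hashing needs no pass over the data to discover the keys, at the cost of occasional collisions between features.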

Filters that apply to simulated interactions

logged – Turn simulated interactions into logged interactions using a behavior policy.

[45]:

import coba as cb

class SimpleEnvironment5:
    def read(self):
        return [ { 'actions': [1,2], 'rewards': [3,4] } for i in range(1)]

envs = cb.Environments.from_custom(SimpleEnvironment5())

print("------unfiltered------")
print(str(list(envs[0].read())))

print()
print("------logged(cb.RandomLearner())------")
print(str(list(envs.logged(cb.RandomLearner())[0].read())))

------unfiltered------
[{'actions': [1, 2], 'rewards': DiscreteReward([[1, 2], [3, 4]])}]

------logged(cb.RandomLearner())------
[{'actions': [1, 2], 'rewards': DiscreteReward([[1, 2], [3, 4]]), 'action': 1.0, 'reward': 3, 'probability': 0.5}]
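
Conceptually, logging with a uniformly random behavior policy means choosing an action at random and recording the action, its reward, and the probability with which it was chosen. The sketch below assumes rewards are given as a plain list aligned with the actions; `log_with_uniform_policy` is a hypothetical helper, not Coba's implementation.

```python
import random
from typing import Dict, Iterable, Iterator

def log_with_uniform_policy(interactions: Iterable[Dict], seed: int = 1) -> Iterator[Dict]:
    """Add 'action', 'reward' and 'probability' using a uniform random policy."""
    rng = random.Random(seed)
    for interaction in interactions:
        actions, rewards = interaction['actions'], interaction['rewards']
        index = rng.randrange(len(actions))  # each action is equally likely
        yield {
            **interaction,
            'action': actions[index],
            'reward': rewards[index],
            'probability': 1 / len(actions),
        }

simulated = [{'actions': [1, 2], 'rewards': [3, 4]}]
logged = list(log_with_uniform_policy(simulated))
```

With two actions the recorded probability is 0.5, as in the output above.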

Filters that apply to logged interactions

ope_rewards – Turn logged interactions into simulated interactions by estimating a reward function.

[1]:

import coba as cb

class SimpleEnvironment6:
    def read(self):
        return [ { 'action': i, 'reward': i*2, 'probability':.5 } for i in range(1,2)]

envs = cb.Environments.from_custom(SimpleEnvironment6())

print("------unfiltered------")
print(str(list(envs[0].read())))

print()
print("------ope_rewards('IPS')------")
print(str(list(envs.ope_rewards('IPS')[0].read())))

------unfiltered------
[{'action': 1, 'reward': 2, 'probability': 0.5}]

------ope_rewards('IPS')------
[{'action': 1, 'reward': 2, 'probability': 0.5, 'rewards': BinaryReward(1, 4.0)}]
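
The value 4.0 in the output above is consistent with the standard inverse propensity score (IPS) estimate, which divides the logged reward by the logging probability for the logged action and assigns 0 to every other action. A sketch under that assumption, with `ips_reward` as a hypothetical helper name:

```python
from typing import Callable, Hashable

def ips_reward(logged_action: Hashable, logged_reward: float, probability: float) -> Callable[[Hashable], float]:
    """Build an IPS reward function for a single logged interaction."""
    def reward(action: Hashable) -> float:
        # Only the logged action has an observed reward; it is up-weighted by 1/probability.
        return logged_reward / probability if action == logged_action else 0.0
    return reward

estimate = ips_reward(logged_action=1, logged_reward=2, probability=0.5)
reward_for_logged_action = estimate(1)  # 2 / 0.5
reward_for_other_action = estimate(2)
```

Dividing by the probability corrects for how often the behavior policy chose the logged action, so the estimate is unbiased when the logged probabilities are correct.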

Expansion Filters

A number of environment filters take sequences of values to expand into the Cartesian product of environments.

[11]:

import coba as cb

print()
print("three environments with distinct linear reward functions")
print("--------------------------------------------------------")
print(cb.Environments.from_linear_synthetic(100, seed=[1,2,5]))

print()
print("three environments with different sets of interactions from the base")
print("--------------------------------------------------------------------")
print(cb.Environments.from_openml(180).reservoir(4000,seeds=[1,2,3]))

print()

print("three environments with the same interactions but in different orders")
print("---------------------------------------------------------------------")
print(cb.Environments.from_openml(180).reservoir(4000).shuffle(n=3))

print()
print("four environments from two openml datasets and two different selected subsets")
print("-----------------------------------------------------------------------------")
print(cb.Environments.from_openml([180,150]).reservoir(4000,[1,2]))


three environments with distinct linear reward functions
--------------------------------------------------------
1. LinearSynth(A=5,c=5,a=5,R=['a', 'xa'],seed=1) | BatchSafe(Finalize())
2. LinearSynth(A=5,c=5,a=5,R=['a', 'xa'],seed=2) | BatchSafe(Finalize())
3. LinearSynth(A=5,c=5,a=5,R=['a', 'xa'],seed=5) | BatchSafe(Finalize())

three environments with different sets of interactions from the base
--------------------------------------------------------------------
1. Openml(data=180) | Reservoir('reservoir_count': 4000, 'reservoir_seed': 1) | BatchSafe(Finalize())
2. Openml(data=180) | Reservoir('reservoir_count': 4000, 'reservoir_seed': 2) | BatchSafe(Finalize())
3. Openml(data=180) | Reservoir('reservoir_count': 4000, 'reservoir_seed': 3) | BatchSafe(Finalize())

three environments with the same interactions but in different orders
---------------------------------------------------------------------
1. Openml(data=180) | Reservoir('reservoir_count': 4000, 'reservoir_seed': 1) | Shuffle('shuffle_seed': 0) | BatchSafe(Finalize())
2. Openml(data=180) | Reservoir('reservoir_count': 4000, 'reservoir_seed': 1) | Shuffle('shuffle_seed': 1) | BatchSafe(Finalize())
3. Openml(data=180) | Reservoir('reservoir_count': 4000, 'reservoir_seed': 1) | Shuffle('shuffle_seed': 2) | BatchSafe(Finalize())

four environments from two openml datasets and two different selected subsets
-----------------------------------------------------------------------------
1. Openml(data=180) | Reservoir('reservoir_count': 4000, 'reservoir_seed': 1) | BatchSafe(Finalize())
2. Openml(data=180) | Reservoir('reservoir_count': 4000, 'reservoir_seed': 2) | BatchSafe(Finalize())
3. Openml(data=150) | Reservoir('reservoir_count': 4000, 'reservoir_seed': 1) | BatchSafe(Finalize())
4. Openml(data=150) | Reservoir('reservoir_count': 4000, 'reservoir_seed': 2) | BatchSafe(Finalize())
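
The expansion in the last example follows the usual Cartesian product: every dataset id is paired with every seed. A small standard-library sketch of that pairing is shown below; the `expand` helper and the key names `data_id` and `reservoir_seed` are hypothetical and only mirror the labels printed above.

```python
from itertools import product
from typing import Dict, List, Sequence

def expand(**options: Sequence) -> List[Dict]:
    """Return one configuration per combination of the given option values."""
    names = list(options)
    return [dict(zip(names, combo)) for combo in product(*options.values())]

configs = expand(data_id=[180, 150], reservoir_seed=[1, 2])
for number, config in enumerate(configs, start=1):
    print(f"{number}. {config}")
```

Two datasets and two seeds give four configurations, ordered with the first option varying slowest, which is the same ordering as the four environments listed above.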

Custom Filters

Environment filters are classes with the following interface:

class EnvironmentalFilter:
    def filter(self, interactions:Iterable[Interaction])->Iterable[Interaction]:
        ...

Interaction is described in more detail in Interaction.ipynb.

Creating a Custom Filter

Below we create a basic Filter which sets all interaction contexts to 1.

[14]:

class SetContextToOne:
    def filter(self,interactions):
        for old in interactions:
            new = old.copy()   # we do a shallow copy to make sure we don't contaminate anything
            new['context'] = 1 # we then set the context to 1 on our copy
            yield new          # finally, we yield the new, modified interaction

Applying a Custom Filter

To apply our filter we use the cb.Environments.filter method.

[26]:

import coba as cb

env = cb.Environments.from_linear_synthetic(100, n_actions=2, n_context_features=1, n_action_features=0)
one = env.filter(SetContextToOne())

print("Before applying our custom filter")
print("---------------------------------")
print(f"  {cb.minimize(next(env[0].read()),2)}")

print()
print("After applying our custom filter")
print("--------------------------------")
print(f"  {cb.minimize(next(one[0].read()),2)}")

Before applying our custom filter
---------------------------------
  {'context': [0.19], 'actions': [[1, 0], [0, 1]], 'rewards': DiscreteReward([[[1, 0], [0, 1]], [0.45448, 0.5144]])}

After applying our custom filter
--------------------------------
  {'context': 1, 'actions': [[1, 0], [0, 1]], 'rewards': DiscreteReward([[[1, 0], [0, 1]], [0.45448, 0.5144]])}

Adding Parameters to Custom Filters

We can also provide parameters to describe a Filter. In the context of an Experiment these params go to Result.environments.

[24]:

import coba as cb

class ContextToN:
    def __init__(self,N):
        self._N = N
    @property
    def params(self):
        return {'N':self._N}
    def filter(self,interactions):
        for old in interactions:
            yield {**old, 'context':self._N}

env = cb.Environments.from_linear_synthetic(100, n_actions=2, n_context_features=1, n_action_features=1)
two = env.filter(ContextToN(2))

print("Environment parameters before filtering")
print("---------------------------------------")
print(f"  {env[0].params}")

print()
print("Environment parameters after filtering")
print("--------------------------------------")
print(f"  {two[0].params}")

Environment parameters before filtering
---------------------------------------
  {'env_type': 'LinearSynthetic', 'reward_features': ['a', 'xa'], 'n_coeff': 5, 'n_actions': 2, 'seed': 1}

Environment parameters after filtering
--------------------------------------
  {'env_type': 'LinearSynthetic', 'reward_features': ['a', 'xa'], 'n_coeff': 5, 'n_actions': 2, 'seed': 1, 'N': 2}

Synthetics

Here we visualize the reward functions of the available synthetic environments.

We also provide statistics on the complexity and solvability of each synthetic environment.

Setup

[30]:

import coba as cb
import matplotlib.pyplot as plt
from operator import itemgetter
from itertools import product
from statistics import quantiles
from collections import Counter, defaultdict

def visualize_global_reward(env):
    X = [a[0] for i in env.read() for a in                  i['actions'] ]
    Y = [r    for i in env.read() for r in map(i['rewards'],i['actions'])]
    plt.scatter(X,Y)
    plt.ylabel('Reward')
    plt.ylim(-.5,1.5)
    plt.xlabel('Action')
    plt.show()

def visualize_action_rewards(env):
    X = [i['context'][0]                      for i in env.read()]
    Y = [list(map(i['rewards'],i['actions'])) for i in env.read()]
    for i,Y in enumerate(zip(*Y)):plt.scatter(X,Y,label=i+1)
    plt.ylabel('Reward')
    plt.xlabel('Context')
    plt.legend(bbox_to_anchor=[1, 1.02],loc='upper left',title='Action')
    plt.show()

def environment_descriptives(envs):
    advantage, frequency = [], []
    for env in envs:
        R                  = [list(map(i['rewards'],i['actions'])) for i in env.read()]
        best_act,best_freq = Counter([r.index(max(r)) for r in R]).most_common()[0]
        best_const_R       = sum(map(itemgetter(best_act),R))
        best_adapt_R       = sum(map(max                 ,R))
        advantage.append((best_adapt_R-best_const_R)/len(R))
        frequency.append((best_freq                )/len(R))
    print(str([f"{r:.3f}" for r in quantiles(advantage,n=10)]) + " Advantage over best const")
    print(str([f"{r:.3f}" for r in quantiles(frequency,n=10)]) + " Frequency best const is best")
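
To see what the two statistics in environment_descriptives measure, the same computation can be worked by hand on a tiny reward table. The table `R` below is made-up data for illustration only: three interactions with two actions each.

```python
from collections import Counter
from operator import itemgetter

# Hypothetical rewards: one row per interaction, one column per action.
R = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 0.0]]

# The action that is best most often, and how many times it is best.
best_act, best_freq = Counter(r.index(max(r)) for r in R).most_common()[0]

best_const_R = sum(map(itemgetter(best_act), R))  # always play best_act: 1 + 0 + 1 = 2
best_adapt_R = sum(map(max, R))                   # play the best action each time: 3

advantage = (best_adapt_R - best_const_R) / len(R)  # (3 - 2) / 3
frequency = best_freq / len(R)                      # 2 / 3
```

A large advantage means a context-aware policy can gain a lot over the best fixed action, while a high frequency means a single action is optimal most of the time.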

Linear Synthetic

[15]:

visualize_global_reward(cb.Environments.from_linear_synthetic(
    1000,
    n_actions=5,
    n_context_features=0,
    n_action_features=1,
    reward_features=['a','aa','aaa'], #third-order polynomials
    seed=1
)[0])

../_images/notebooks_Environments_44_0.png

[11]:

visualize_action_rewards(cb.Environments.from_linear_synthetic(
    1000,
    n_actions=10,
    n_context_features=1,
    n_action_features=0,
    n_coefficients=3,
    reward_features=['x','xx','xxx'], #third-order polynomials
    seed=1
)[0])

../_images/notebooks_Environments_45_0.png

[29]:

environment_descriptives(cb.Environments.from_linear_synthetic(
    1000,
    n_actions=10,
    n_context_features=5,
    n_action_features=0,
    seed=range(100)
))

['0.272', '0.279', '0.290', '0.294', '0.301', '0.308', '0.317', '0.327', '0.345'] Advantage over best const
['0.145', '0.154', '0.160', '0.166', '0.172', '0.180', '0.188', '0.196', '0.207'] Frequency best const is best

Kernel Synthetic

[35]:

visualize_global_reward(cb.Environments.from_kernel_synthetic(
    1000,
    n_context_features=0,
    n_action_features=1,
    n_exemplars=1,
    kernel='exponential',
    seed=4
)[0])

../_images/notebooks_Environments_48_0.png

[40]:

visualize_action_rewards(cb.Environments.from_kernel_synthetic(
    1000,
    n_context_features=1,
    n_action_features=0,
    kernel='gaussian',
    seed=5,
)[0])

../_images/notebooks_Environments_49_0.png

[42]:

environment_descriptives(cb.Environments.from_kernel_synthetic(
    1000,
    n_actions=10,
    n_context_features=5,
    n_action_features=0,
    seed=range(100)
))

['0.295', '0.311', '0.319', '0.326', '0.334', '0.341', '0.350', '0.353', '0.365'] Advantage over best const
['0.139', '0.143', '0.150', '0.156', '0.162', '0.169', '0.177', '0.187', '0.197'] Frequency best const is best

MLP Synthetic

[46]:

visualize_global_reward(cb.Environments.from_mlp_synthetic(
    1000,
    n_context_features=0,
    n_action_features=1,
    seed=4
)[0])

../_images/notebooks_Environments_52_0.png

[48]:

visualize_action_rewards(cb.Environments.from_mlp_synthetic(
    1000,
    n_context_features=1,
    n_action_features=0,
    seed=3
)[0])

../_images/notebooks_Environments_53_0.png

[49]:

environment_descriptives(cb.Environments.from_mlp_synthetic(
    1000,
    n_actions=10,
    n_context_features=10,
    n_action_features=0,
    seed=range(100)
))

['0.375', '0.386', '0.405', '0.411', '0.416', '0.422', '0.430', '0.438', '0.447'] Advantage over best const
['0.199', '0.211', '0.229', '0.242', '0.255', '0.266', '0.280', '0.294', '0.309'] Frequency best const is best

Neighbors Synthetic

[50]:

visualize_global_reward(cb.Environments.from_neighbors_synthetic(
    1000,
    n_context_features=0,
    n_action_features=1,
    n_neighborhoods=20,
    seed=1
)[0])

../_images/notebooks_Environments_56_0.png

[51]:

visualize_action_rewards(cb.Environments.from_neighbors_synthetic(
    1000,
    n_actions=5,
    n_context_features=1,
    n_action_features=0,
    n_neighborhoods=20,
    seed=1
)[0])

../_images/notebooks_Environments_57_0.png

[52]:

environment_descriptives(cb.Environments.from_neighbors_synthetic(
    1000,
    n_actions=5,
    n_context_features=10,
    n_action_features=0,
    n_neighborhoods=20,
    seed=range(100)
))

['0.200', '0.221', '0.239', '0.254', '0.263', '0.277', '0.287', '0.306', '0.329'] Advantage over best const
['0.254', '0.264', '0.279', '0.291', '0.299', '0.312', '0.324', '0.347', '0.386'] Frequency best const is best