
Interactions

Coba calls individual training/evaluation examples “Interactions”. There are three types of interactions:

1. coba.SimulatedInteraction(
    context : Optional[Features],
    actions : Sequence[Features],
    rewards : Sequence[float] | Callable[[Action],float],
    **kwargs: Any
)

2. coba.LoggedInteraction(
    context    : Optional[Features],
    action     : Features,
    reward     : float,
    probability: Optional[float],
    **kwargs   : Any
)

3. coba.GroundedInteraction(
    context  : Optional[Features],
    actions  : Sequence[Features],
    rewards  : Sequence[float] | Callable[[Action],float],
    feedbacks: Sequence[float] | Callable[[Action],float],
    **kwargs : Any
)

The interaction classes above are provided as type hints. In practice, interactions are nothing more than dicts. That is,

cb.SimulatedInteraction(1,[1],[3])    == {'context':1,'actions':[1],'rewards':[3]}
cb.LoggedInteraction(1,2,3,.5)        == {'context':1,'action':2,'reward':3,'probability':.5}
cb.GroundedInteraction(1,[1],[3],[5]) == {'context':1,'actions':[1],'rewards':[3],'feedbacks':[5]}
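
For example, constructing the interactions above and printing them shows the plain dicts on the right-hand side (a quick check, assuming coba is installed and imported as cb):

import coba as cb

# Each constructor simply builds a plain dict like the literals shown above.
print(cb.SimulatedInteraction(1, [1], [3]))
print(cb.LoggedInteraction(1, 2, 3, .5))
print(cb.GroundedInteraction(1, [1], [3], [5]))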

The possible values for contexts and actions (i.e. the Features type hint above) are described in more detail below.

Features

Coba Interactions use three types of features:

  1. Individual values such as:

    1 or 'a' or None
    
  2. Dense sequences of values such as:

    [1,2,3] or ['a',1,'b'] or (1,2,3)
    
  3. Sparse mappings of values (where missing values are assumed to be 0):

    {1:2, 'a':3}
    

A few more notes:

  • Individual values and dense values with strings can be thought of as sparse collections. That is,

    [1,'a',5.2] ~= {0:1, 'a':1, 2:5.2}
    
  • Coba learners that support feature interactions assume ‘x’ refers to context features and ‘a’ to action features (a small illustrative expansion follows this list). So,

    # Given context ['a','b'] and action 'c', a learner with features 'x xx xa' would use
    ['a', 'b'] + ['aa', 'ab', 'bb'] + ['ac','bc']
    
  • An easy trap to fall into is using numerical features when one-hot or string features are what is wanted:

    # Three actions with one continuous feature. This can cause high correlation among action rewards.
    {'actions':[1,2,3]}
    # Three actions with disjoint features. Actions have independent predictions.
    {'actions':[(1,0,0),(0,1,0),(0,0,1)]}
    # Equivalent to one-hot-encoding based on ['1','2','3'] ~= [{'1':1},{'2':1},{'3':1}]
    {'actions':['1','2','3']}
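
The feature-interaction expansion mentioned above can be illustrated with a small, self-contained sketch. This is not coba's implementation, just a toy expansion of the 'x xx xa' example for string features:

from itertools import combinations_with_replacement, product

def expand(context, action, spec):
    # Toy sketch (not coba's code): expand 'x', 'xx', and 'xa' terms for string features.
    # context and action are given as sequences of string features.
    out = []
    for term in spec.split():
        if term == 'x':  out += list(context)
        if term == 'a':  out += list(action)
        if term == 'xx': out += [f+g for f,g in combinations_with_replacement(context, 2)]
        if term == 'xa': out += [f+g for f,g in product(context, action)]
    return out

print(expand(['a','b'], ['c'], 'x xx xa'))
# ['a', 'b', 'aa', 'ab', 'bb', 'ac', 'bc']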
    

Simulated Interaction

Simulated interactions have full reward and action information. They can simulate the learning and outcome of any given policy.

The following are examples of simulated interactions:

# One discrete simulated interaction whose two actions have a single continuous feature
{'actions':[1,2], 'rewards':[4,5]}

# Two discrete simulated interactions whose two actions have disjoint action features
{'actions':['a','b']    , 'rewards':[0,1]}
{'actions':[(1,0),(0,1)], 'rewards':[0,1]}

# Two discrete simulated interactions whose two actions have sparse features
{'actions':[['a','b'],['c','d']]           , 'rewards':[0,1]}
{'actions':[{'a':1, 'b':1}, {'c':1, 'd':1}], 'rewards':[0,1]}

# One continuous simulated interaction. It is up to the learner to know the action space
{'actions':None, 'rewards':lambda x: abs(x-2)}
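
Because simulated interactions are plain dicts, any Python iterable can produce them. The generator below is purely illustrative (it is not a coba API); it yields discrete simulated interactions like the examples above:

import random

def simulated_interactions(n, seed=1):
    # Illustrative only: yield n simulated interactions with one context feature,
    # two one-hot actions, and randomly drawn rewards.
    rng = random.Random(seed)
    for _ in range(n):
        context = [rng.uniform(-1, 1)]
        yield {'context': context, 'actions': [(1,0),(0,1)], 'rewards': [rng.random(), rng.random()]}

print(next(simulated_interactions(3)))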

Logged Interaction

Logged interactions contain data that was logged during an online interaction. They can be used for off-policy analysis.

The following are examples of logged interactions:

# One logged interaction whose action is a single continuous feature, without probability information
{'context': 1, 'action':1, 'reward':4}

# One logged interaction whose action is a single continuous feature, with probability information
{'context': 1, 'action':1, 'reward':4, 'probability':.1}

# Two logged interactions whose actions have a single categorical feature (string or one-hot)
{'context': 1, 'action':'a'  , 'reward':0}
{'context': 1, 'action':(1,0), 'reward':0}

# Two logged interactions whose actions have sparse features
{'action':['a','b']     , 'reward':1}
{'action':{'a':1, 'b':1}, 'reward':1}
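
Because logged interactions are also plain dicts, existing logs can be converted to them directly. The raw log format below is hypothetical, just to illustrate the mapping:

# Hypothetical raw log entries: (context, chosen action, observed reward, logging probability)
raw_log = [(1, 'a', 0.0, 0.50),
           (2, 'b', 1.0, 0.25)]

logged_interactions = [
    {'context': c, 'action': a, 'reward': r, 'probability': p}
    for (c, a, r, p) in raw_log
]

print(logged_interactions[0])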

Simulated-Logged Interaction

Because the kind of interaction is determined by the dictionary keys, it is possible to have an interaction that is both logged and simulated.

For example, by applying a logging policy to a simulated interaction we can add logged interaction data as well:

{'actions':[1,2], 'rewards':[4,5], 'action':2, 'reward':5, 'probability':.5}

This kind of interaction is particularly useful for off-policy analysis because off-policy estimates can then be compared directly against on-policy results.

The easiest way to create this data is to apply the logged filter to simulated interactions.

[1]:
import coba as cb

logging_policy = cb.RandomLearner()

sim = cb.Environments.from_linear_synthetic(100, n_actions=2, n_context_features=1, n_action_features=1)
log = sim.logged(logging_policy)

print("----------------------------------------------")
print("First simulated interaction in the environment")
print("----------------------------------------------")
print("  ",end="")
print(cb.minimize(next(sim[0].read()),2))

print()
print("----------------------------------------------")
print("First logged interaction using logging policy")
print("----------------------------------------------")
print("  ",end="")
print(cb.minimize(next(log[0].read()),2))
----------------------------------------------
First simulated interaction in the environment
----------------------------------------------
  {'context': [-0.75], 'actions': [[-0.59], [0.95]], 'rewards': [0.82, -0.05]}

----------------------------------------------
First logged interaction using logging policy
----------------------------------------------
  {'context': [-0.75], 'actions': [[-0.59], [0.95]], 'rewards': [0.82, -0.05], 'action': [-0.59], 'probability': 0.5, 'reward': 0.82}

Grounded Interaction

Grounded interactions exist to support Interaction Grounded Learning experiments.

They have ‘actions’, ‘rewards’, ‘feedbacks’, and ‘userid’. They can be created from simulated interactions using the grounded filter.

[3]:
import coba as cb

sim = cb.Environments.from_linear_synthetic(100, n_actions=2, n_context_features=1, n_action_features=1)
grd = sim.grounded(5,1,20,2)

print("----------------------------------------------")
print("Simulated interaction")
print("----------------------------------------------")
print("  ",end="")
print(cb.minimize(next(sim[0].read()),5))

print()
print("----------------------------------------------")
print("Grounded interaction from simulated")
print("----------------------------------------------")
print("  ",end="")
print(cb.minimize(next(grd[0].read()),5))
----------------------------------------------
Simulated interaction
----------------------------------------------
  {'context': [-0.74531], 'actions': [[-0.59205], [0.94641]], 'rewards': DiscreteReward([[[-0.59205], [0.94641]], [0.17759, 1.02217]])}

----------------------------------------------
Grounded interaction from simulated
----------------------------------------------
  {'context': [-0.74531], 'actions': [[-0.59205], [0.94641]], 'rewards': BinaryReward([0.94641]), 'feedbacks': GroundedFeedback([0.94641]), 'userid': 0, 'isnormal': True}