Interactions
Coba calls individual training/evaluation examples “Interactions”. There are three types of interactions:
1. coba.SimulatedInteraction(
context : Optional[Features],
actions : Sequence[Features],
rewards : Sequence[float] | Callable[[Action],float],
**kwargs: Any
)
2. coba.LoggedInteraction(
context : Optional[Features],
action : Features,
reward : float,
probability: Optional[float],
**kwargs : Any
)
3. coba.GroundedInteraction(
context : Optional[Features],
actions : Sequence[Features],
rewards : Sequence[float] | Callable[[Action],float],
feedbacks: Sequence[float] | Callable[[Action],float],
**kwargs : Any
)
The interaction classes above are provided as type hints. In practice, interactions are nothing more than dicts. That is,
cb.SimulatedInteraction(1,[1],[3]) == {'context':1,'actions':[1],'rewards':[3]}
cb.LoggedInteraction(1,2,3,.5) == {'context':1,'action':2,'reward':3,'probability':.5}
cb.GroundedInteraction(1,[1],[3],[5]) == {'context':1,'actions':[1],'rewards':[3],'feedbacks':[5]}
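Because interactions are plain dicts, these equalities can be checked at runtime, and the **kwargs in the signatures above presumably pass through as additional keys. A minimal sketch, assuming coba is imported as cb:

import coba as cb

# Interaction constructors build plain dicts, so they compare equal to dict literals.
assert cb.SimulatedInteraction(1,[1],[3]) == {'context':1,'actions':[1],'rewards':[3]}

# Extra keyword arguments are assumed to become extra keys (per the **kwargs in the signatures).
assert cb.SimulatedInteraction(1,[1],[3],userid=7) == {'context':1,'actions':[1],'rewards':[3],'userid':7}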
The possible values for contexts and actions (i.e. the Features
type hint above) are described in more detail below.
Features
Coba Interactions use three types of features:
Individual values such as:
1 or 'a' or None
Dense sequences of values such as:
[1,2,3] or ['a',1,'b'] or (1,2,3)
Sparse mappings of values (where missing values are assumed to be 0):
{1:2, 'a':3}
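For concreteness, the sketch below (with purely illustrative values) builds one interaction per feature type:

# An individual-value context, a dense-sequence context, and a sparse-mapping context.
value_context  = {'context':'a'          , 'actions':['x','y'], 'rewards':[0,1]}
dense_context  = {'context':[1,2,3]      , 'actions':['x','y'], 'rewards':[0,1]}
sparse_context = {'context':{1:2, 'a':3} , 'actions':['x','y'], 'rewards':[0,1]}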
A few more notes:
Individual values and dense values with strings can be thought of as sparse collections. That is,
[1,'a',5.2] ~= {0:1, 'a':1, 2:5.2}
Coba learners that support feature interactions assume ‘x’ refers to context features and ‘a’ refers to action features (see the sketch after these notes). So,
# Given context ['a','b'] and action 'c', a learner with features 'x xx xa' would use ['a','b'] + ['aa','ab','bb'] + ['ac','bc']
An easy trap to fall into is using numerical features when what is wanted is one-hot or string features:
# Three actions with one continuous feature. This can cause high correlation among action rewards.
{'actions':[1,2,3]}

# Three actions with disjoint features. Actions have independent predictions.
{'actions':[(1,0,0),(0,1,0),(0,0,1)]}

# Equivalent to one-hot-encoding based on ['1','2','3'] ~= [{'1':1},{'2':1},{'3':1}]
{'actions':['1','2','3']}
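Interaction terms like these are requested from a learner via its features argument. The sketch below assumes cb.VowpalEpsilonLearner accepts the same feature strings used above; check the learner's docstring for the exact signature.

import coba as cb

# A sketch, not the definitive API: request context ('x'), context-squared ('xx'),
# and context-action ('xa') terms, matching the 'x xx xa' example above.
learner = cb.VowpalEpsilonLearner(features=['x','xx','xa'])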
Simulated Interaction
Simulated interactions have full reward and action information. They can simulate the learning and outcome of any given policy.
The following are examples of simulated interactions:
# One discrete simulated interaction whose two actions have a single continuous feature
{'actions':[1,2], 'rewards':[4,5]}
# Two discrete simulated interactions whose two actions have disjoint action features
{'actions':['a','b'] , 'rewards':[0,1]}
{'actions':[(1,0),(0,1)], 'rewards':[0,1]}
# Two discrete simulated interactions whose two actions have sparse features
{'actions':[['a','b'],['c','d']] , 'rewards':[0,1]}
{'actions':[{'a':1, 'b':1}, {'c':1, 'd':1}], 'rewards':[0,1]}
# One continuous simulated interaction. It is up to the learner to know the actions
{'actions':None, 'rewards':lambda x: abs(x-2)}
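When ‘rewards’ is a callable, as in the continuous example above, the reward for a chosen action is obtained by calling it:

# The reward callable can score any action a learner might choose.
interaction = {'actions':None, 'rewards': lambda x: abs(x-2)}
print(interaction['rewards'](1.5)) # 0.5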
Logged Interaction
Logged interactions contain data that was logged during an online interaction. They can be used for off-policy analysis.
The following are examples of logged interactions:
# One logged interaction whose action is a single continuous feature, without probability information
{'context': 1, 'action':1, 'reward':4}
# One logged interaction whose action is a single continuous feature, with probability information
{'context': 1, 'action':1, 'reward':4, 'probability':.1}
# Two logged interactions whose actions have a single categorical feature
{'context': 1, 'action':'a' , 'reward':0}
{'context': 1, 'action':(1,0), 'reward':0}
# Two logged interactions whose actions have sparse features
{'action':['a','b'] , 'reward':1}
{'action':{'a':1, 'b':1}, 'reward':1}
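The ‘probability’ field is what makes off-policy analysis possible. As a sketch of the general idea (inverse propensity scoring, not a coba-specific API), a logged reward can be reweighted by how likely the evaluated policy was to choose the logged action:

# Inverse propensity scoring (IPS): an unbiased off-policy value estimate.
logged = {'context':1, 'action':'a', 'reward':0.5, 'probability':0.25}

def ips_estimate(interaction, target_probability):
    # target_probability: the probability the evaluated policy assigns
    # to the logged action in the logged context.
    return (target_probability / interaction['probability']) * interaction['reward']

print(ips_estimate(logged, target_probability=0.5)) # 1.0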
Simulated-Logged Interaction
Because the kind of interaction is determined by the dictionary keys, it is possible to have an interaction that is both logged and simulated.
For example, by applying a logging policy to a simulated interaction we can add logged interaction data as well:
{'actions':[1,2], 'rewards':[4,5], 'action':2, 'reward':5, 'probability':.5}
This kind of interaction is particularly useful for off-policy analysis because we can compare against on-policy methods as well.
The easiest way to create this data is to apply the logged filter to simulated interactions.
[1]:
import coba as cb
logging_policy = cb.RandomLearner()
sim = cb.Environments.from_linear_synthetic(100, n_actions=2, n_context_features=1, n_action_features=1)
log = sim.logged(logging_policy)
print("----------------------------------------------")
print("First simulated interaction in the environment")
print("----------------------------------------------")
print(" ",end="")
print(cb.minimize(next(sim[0].read()),2))
print()
print("----------------------------------------------")
print("First logged interaction using logging policy")
print("----------------------------------------------")
print(" ",end="")
print(cb.minimize(next(log[0].read()),2))
----------------------------------------------
First simulated interaction in the environment
----------------------------------------------
{'context': [-0.75], 'actions': [[-0.59], [0.95]], 'rewards': [0.82, -0.05]}
----------------------------------------------
First logged interaction using logging policy
----------------------------------------------
{'context': [-0.75], 'actions': [[-0.59], [0.95]], 'rewards': [0.82, -0.05], 'action': [-0.59], 'probability': 0.5, 'reward': 0.82}
Grounded Interaction
Grounded interactions exist to support Interaction Grounded Learning experiments.
They have ‘actions’, ‘rewards’, ‘feedbacks’, and ‘userid’. They can be created from simulated interactions using the grounded filter.
[3]:
import coba as cb
sim = cb.Environments.from_linear_synthetic(100, n_actions=2, n_context_features=1, n_action_features=1)
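# The four positional arguments below are assumed to be n_users, n_normal, n_words, n_good.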
grd = sim.grounded(5,1,20,2)
print("----------------------------------------------")
print("Simulated interaction")
print("----------------------------------------------")
print(" ",end="")
print(cb.minimize(next(sim[0].read()),5))
print()
print("----------------------------------------------")
print("Grounded interaction from simulated")
print("----------------------------------------------")
print(" ",end="")
print(cb.minimize(next(grd[0].read()),5))
----------------------------------------------
Simulated interaction
----------------------------------------------
{'context': [-0.74531], 'actions': [[-0.59205], [0.94641]], 'rewards': DiscreteReward([[[-0.59205], [0.94641]], [0.17759, 1.02217]])}
----------------------------------------------
Grounded interaction from simulated
----------------------------------------------
{'context': [-0.74531], 'actions': [[-0.59205], [0.94641]], 'rewards': BinaryReward([0.94641]), 'feedbacks': GroundedFeedback([0.94641]), 'userid': 0, 'isnormal': True}