OpeRewards

class coba.environments.OpeRewards

Transform logged interactions to simulated interactions.

Constructors

__init__(rwd_type: Literal['IPS', 'DM', 'DR'] | None = None, target: str = 'rewards', features=[1, 'x', 'a', 'xa', 'xxa'])

Instantiate an OpeRewards filter.

Parameters:

rewards_type – How to estimate the rewards function from the logged
data (i.e., inverse propensity score, direct method, or doubly robust) –

Remarks:: Adds ‘rewards’ to interactions with ‘context’, ‘action’, ‘reward’, and ‘probability’.

Methods

filter(interactions: Iterable[Interaction]) → Iterable[Interaction]: Apply filter to an Environment’s interactions.

Attributes