{ "cells": [ { "cell_type": "markdown", "id": "a3be78c4-1f40-4856-837d-0269bf6d0ee0", "metadata": {}, "source": [ "# Interactions\n", "Coba calls individual training/evaluation examples \"Interactions\". There are three types of interactions:\n", "```python\n", "1. coba.SimulatedInteraction(\n", " context : Optional[Features],\n", " actions : Sequence[Features], \n", " rewards : Sequence[float] | Callable[[Action],float],\n", " **kwargs: Any\n", ")\n", "\n", "2. coba.LoggedInteraction(\n", " context : Optional[Features],\n", " action : Features,\n", " reward : float,\n", " probability: Optional[float],\n", " **kwargs : Any\n", ")\n", "\n", "3. coba.GroundedInteraction(\n", " context : Optional[Features],\n", " actions : Sequence[Features],\n", " rewards : Sequence[float] | Callable[[Action],float],\n", " feedbacks: Sequence[float] | Callable[[Action],float],\n", " **kwargs : Any\n", ")\n", "\n", "```\n", "\n", "The interaction classes above are provided as type-hints. In practice interactions are nothing more than dicts. That is,\n", "```python\n", "cb.SimulatedInteraction(1,[1],[3]) == {'context':1,'actions':[1],'rewards':[3]}\n", "cb.LoggedInteraction(1,2,3,.5) == {'context':1,'action':2,'reward':3,'probability':.5}\n", "cb.GroundedInteraction(1,[1],[3],[5]) == {'context':1,'actions':[1],'rewards':[3],'feedbacks':[5]}\n", "```\n", "\n", "The possible values for contexts and actions (i.e. the `Features` type hint above) are described in more detail below.\n" ] }, { "cell_type": "markdown", "id": "77b21059-0c46-414b-bc05-be1c85c641ef", "metadata": {}, "source": [ "## Features\n", "\n", "Coba Interactions use three types of features:\n", "\n", "1. Individual values such as:\n", " ```python\n", " 1 or 'a' or None\n", " ```\n", "2. Dense sequences of values such as:\n", " ```python\n", " [1,2,3] or ['a',1,'b'] or (1,2,3)\n", " ```\n", "3. 
Sparse mappings of values (where missing values are assumed to be 0):\n", " ```python\n", " {1:2, 'a':3}\n", " ```\n", "\n", "A few more notes:\n", "\n", "* Individual values and dense values with strings can be thought of as sparse collections. That is,\n", " ```python\n", " [1,'a',5.2] ~= {0:1, 'a':1, 2:5.2}\n", " ```\n", "* Coba learners that support feature interactions assume 'x' is context features and 'a' is action features. So,\n", " ```python\n", " #Given context ['a','b'] and action 'c', learner features 'x xx xa' would produce the features\n", " ['a', 'b'] + ['aa', 'ab', 'bb'] + ['ac','bc']\n", " ```\n", "* An easy trap to fall into is using numerical features when what is wanted are one-hot or string features:\n", " ```python\n", " #Three actions with one continuous feature. This can cause high correlation among action rewards.\n", " {'actions':[1,2,3]}\n", " #Three actions with disjoint features. Actions have independent predictions.\n", " {'actions':[(1,0,0),(0,1,0),(0,0,1)]}\n", " #Equivalent to one-hot-encoding based on ['1','2','3'] ~= [{'1':1},{'2':1},{'3':1}]\n", " {'actions':['1','2','3']}\n", " ```" ] }, { "cell_type": "markdown", "id": "c4ab555a-9260-42d8-b9d3-8cfbb163cce2", "metadata": {}, "source": [ "## Simulated Interaction\n", "Simulated interactions have full reward and action information. 
They can simulate the learning and outcome of any given policy.\n", "\n", "The following are examples of simulated interactions:\n", "\n", "```python\n", "# One discrete simulated interaction whose two actions have a single continuous feature\n", "{'actions':[1,2], 'rewards':[4,5]}\n", "\n", "# Two discrete simulated interactions whose two actions have disjoint action features\n", "{'actions':['a','b'] , 'rewards':[0,1]}\n", "{'actions':[(1,0),(0,1)], 'rewards':[0,1]}\n", "\n", "# Two discrete simulated interactions whose two actions have sparse features\n", "{'actions':[['a','b'],['c','d']] , 'rewards':[0,1]}\n", "{'actions':[{'a':1, 'b':1}, {'c':1, 'd':1}], 'rewards':[0,1]}\n", "\n", "# One continuous simulated interaction. It is up to the learner to know the actions\n", "{'actions':None, 'rewards':lambda x: abs(x-2)}\n", "```" ] }, { "cell_type": "markdown", "id": "c5c66092-2241-402e-b6f3-f1bd61885f25", "metadata": {}, "source": [ "## Logged Interaction\n", "Logged interactions contain data that was logged during an online interaction. 
They can be used for off-policy analysis.\n", "\n", "The following are examples of logged interactions:\n", "\n", "```python\n", "# One logged interaction whose action is a single continuous feature and without probability information\n", "{'context': 1, 'action':1, 'reward':4}\n", "\n", "# One logged interaction whose action is a single continuous feature and with probability information\n", "{'context': 1, 'action':1, 'reward':4, 'probability':.1}\n", "\n", "# Two logged interactions whose actions have a single categorical feature\n", "{'context': 1, 'action':'a' , 'reward':0}\n", "{'context': 1, 'action':(1,0), 'reward':0}\n", "\n", "# Two logged interactions whose actions have sparse features\n", "{'action':['a','b'] , 'reward':1}\n", "{'action':{'a':1, 'b':1}, 'reward':1}\n", "```" ] }, { "cell_type": "markdown", "id": "c0f57e86-a9a7-488a-a7d6-435126e733a6", "metadata": {}, "source": [ "## Simulated-Logged Interaction\n", "\n", "Because the kind of interaction is determined by the dictionary keys, it is possible to have an interaction that is both logged and simulated.\n", "\n", "For example, by applying a logging policy to a simulated interaction we can add logged interaction data as well:\n", "```python\n", "{'actions':[1,2], 'rewards':[4,5], 'action':2, 'reward':5, 'probability':.5}\n", "```\n", "\n", "This kind of interaction is particularly useful for off-policy analysis because we can compare against on-policy methods as well.\n", "\n", "The easiest way to create this data is to apply the `logged` filter to simulated interactions."
] }, { "cell_type": "code", "execution_count": 1, "id": "71b20427-2bbb-4ece-a8a2-f53aa5b906da", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "----------------------------------------------\n", "First simulated interaction in the environment\n", "----------------------------------------------\n", " {'context': [-0.75], 'actions': [[-0.59], [0.95]], 'rewards': [0.82, -0.05]}\n", "\n", "----------------------------------------------\n", "First logged interaction using logging policy\n", "----------------------------------------------\n", " {'context': [-0.75], 'actions': [[-0.59], [0.95]], 'rewards': [0.82, -0.05], 'action': [-0.59], 'probability': 0.5, 'reward': 0.82}\n" ] } ], "source": [ "import coba as cb\n", "\n", "logging_policy = cb.RandomLearner()\n", "\n", "sim = cb.Environments.from_linear_synthetic(100, n_actions=2, n_context_features=1, n_action_features=1)\n", "log = sim.logged(logging_policy)\n", "\n", "print(\"----------------------------------------------\")\n", "print(\"First simulated interaction in the environment\")\n", "print(\"----------------------------------------------\")\n", "print(\" \",end=\"\")\n", "print(cb.minimize(next(sim[0].read()),2))\n", "\n", "print()\n", "print(\"----------------------------------------------\")\n", "print(\"First logged interaction using logging policy\")\n", "print(\"----------------------------------------------\")\n", "print(\" \",end=\"\")\n", "print(cb.minimize(next(log[0].read()),2))" ] }, { "cell_type": "markdown", "id": "a9ef4979-b34d-4a9f-99c8-7745ce5019d0", "metadata": {}, "source": [ "## Grounded Interaction\n", "\n", "Grounded interactions exist to support [Interaction Grounded Learning](https://proceedings.mlr.press/v139/xie21e.html) experiments.\n", "\n", "They have 'actions', 'rewards', 'feedbacks', and 'userid'. They can be created from simulated interactions using `grounded`." 
] }, { "cell_type": "code", "execution_count": 3, "id": "5d45a1ef-8229-4500-b258-c705a0655688", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "----------------------------------------------\n", "Simulated interaction\n", "----------------------------------------------\n", " {'context': [-0.74531], 'actions': [[-0.59205], [0.94641]], 'rewards': DiscreteReward([[[-0.59205], [0.94641]], [0.17759, 1.02217]])}\n", "\n", "----------------------------------------------\n", "Grounded interaction from simulated\n", "----------------------------------------------\n", " {'context': [-0.74531], 'actions': [[-0.59205], [0.94641]], 'rewards': BinaryReward([0.94641]), 'feedbacks': GroundedFeedback([0.94641]), 'userid': 0, 'isnormal': True}\n" ] } ], "source": [ "import coba as cb\n", "\n", "sim = cb.Environments.from_linear_synthetic(100, n_actions=2, n_context_features=1, n_action_features=1)\n", "grd = sim.grounded(5,1,20,2)\n", "\n", "print(\"----------------------------------------------\")\n", "print(\"Simulated interaction\")\n", "print(\"----------------------------------------------\")\n", "print(\" \",end=\"\")\n", "print(cb.minimize(next(sim[0].read()),5))\n", "\n", "print()\n", "print(\"----------------------------------------------\")\n", "print(\"Grounded interaction from simulated\")\n", "print(\"----------------------------------------------\")\n", "print(\" \",end=\"\")\n", "print(cb.minimize(next(grd[0].read()),5))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.13" } }, "nbformat": 4, "nbformat_minor": 5 }