{ "cells": [ { "cell_type": "markdown", "id": "9ebdba9b", "metadata": {}, "source": [ "# Building MDPs\n", "In Stormvogel, a **Markov Decision Process (MDP)** consists of:\n", "* states $S$,\n", "* actions $A$,\n", "* an initial state $s_0$,\n", "* a mapping from states to sets of *enabled actions*,\n", "* a successor distribution $P(s,a)$ for every state $s$ and every enabled action $a$, i.e., sets of transitions between states $s$ and $s'$, each annotated with an action and a probability,\n", "* state labels $L(s)$.\n", "\n", "\n", "Here we show how to construct a simple example MDP using the bird API and the model builder API.\n", "The idea is that you can choose to study (you will likely pass the exam but you have less free time) or not to study (you will have more free time but risk failing the exam)." ] }, { "cell_type": "markdown", "id": "fda53350", "metadata": {}, "source": [ "## The study dilemma\n", "This little MDP is supposed to help you decide whether you should study or not." ] }, { "cell_type": "markdown", "id": "9d431f92", "metadata": {}, "source": [ "### Bird API\n", "For MDPs, you specify the available actions in `available_actions`. An action here is simply a string. You specify the transitions of a state-action pair in `delta`, which returns a list of `(probability, next state)` pairs." ] }, { "cell_type": "code", "execution_count": 1, "id": "0d80cdd7", "metadata": { "execution": { "iopub.execute_input": "2026-07-01T08:29:36.776876Z", "iopub.status.busy": "2026-07-01T08:29:36.776596Z", "iopub.status.idle": "2026-07-01T08:29:37.449884Z", "shell.execute_reply": "2026-07-01T08:29:37.449326Z" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", " Network\n", " \n", " \n", " \n", " \n", " \n", "
\n", " \n", " \n", " \n", " \n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from stormvogel import *\n", "import stormvogel\n", "\n", "\n", "def available_actions(s):\n", "    if s == \"init\":\n", "        return [\"study\", \"don't study\"]\n", "    else:\n", "        return [\"\"]\n", "\n", "\n", "def delta(s, a):\n", "    if a == \"study\":\n", "        return [(1, \"did study\")]\n", "    elif a == \"don't study\":\n", "        return [(1, \"did not study\")]\n", "    elif s == \"did study\":\n", "        return [(9 / 10, \"pass test\"), (1 / 10, \"fail test\")]\n", "    elif s == \"did not study\":\n", "        return [(2 / 5, \"pass test\"), (3 / 5, \"fail test\")]\n", "    else:\n", "        return [(1, \"end\")]\n", "\n", "\n", "def labels(s):\n", "    return s\n", "\n", "\n", "# Rewards are given as a dict from reward model name to value.\n", "# A dict with more than one key defines multiple reward models.\n", "def rewards(s: bird.BirdState):\n", "    if s == \"pass test\":\n", "        return {\"R\": 100}\n", "    elif s == \"did not study\":\n", "        return {\"R\": 15}\n", "    else:\n", "        return {\"R\": 0}\n", "\n", "\n", "bird_study = bird.build_bird(\n", "    delta=delta,\n", "    init=\"init\",\n", "    available_actions=available_actions,\n", "    labels=labels,\n", "    modeltype=ModelType.MDP,\n", "    rewards=rewards,\n", ")\n", "show(bird_study)" ] }, { "cell_type": "markdown", "id": "e2fbdfdc", "metadata": {}, "source": [ "### Model API" ] }, { "cell_type": "code", "execution_count": 2, "id": "23ffede0", "metadata": { "execution": { "iopub.execute_input": "2026-07-01T08:29:37.463913Z", "iopub.status.busy": "2026-07-01T08:29:37.463443Z", "iopub.status.idle": "2026-07-01T08:29:37.494791Z", "shell.execute_reply": "2026-07-01T08:29:37.494131Z" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", " Network\n", " \n", " \n", " \n", " \n", " \n", "
\n", " \n", " \n", " \n", " \n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\n", "mdp = stormvogel.model.new_mdp()\n", "\n", "init = mdp.initial_state\n", "study = mdp.action(\"study\")\n", "not_study = mdp.action(\"don't study\")\n", "\n", "did_study = mdp.new_state(\"did study\")\n", "did_not_study = mdp.new_state(\"did not study\")\n", "\n", "pass_test = mdp.new_state(\"pass test\")\n", "fail_test = mdp.new_state(\"fail test\")\n", "end = mdp.new_state(\"end\")\n", "\n", "init.set_choices(\n", " {\n", " study: [(1, did_study)],\n", " not_study: [(1, did_not_study)],\n", " }\n", ")\n", "\n", "did_study.set_choices([(9 / 10, pass_test), (1 / 10, fail_test)])\n", "did_not_study.set_choices([(4 / 10, pass_test), (6 / 10, fail_test)])\n", "\n", "pass_test.set_choices([(1, end)])\n", "fail_test.set_choices([(1, end)])\n", "\n", "mdp.add_self_loops()\n", "\n", "reward_model = mdp.new_reward_model(\"R\")\n", "reward_model.set_state_reward(pass_test, 100)\n", "reward_model.set_state_reward(fail_test, 0)\n", "reward_model.set_state_reward(did_not_study, 15)\n", "reward_model.set_unset_rewards(0)\n", "\n", "show(mdp)" ] }, { "cell_type": "markdown", "id": "32271b82", "metadata": {}, "source": [ "## Grid model\n", "An MDP model that consists of a 3x3 grid. The direction to walk is chosen by an action." ] }, { "cell_type": "markdown", "id": "01eda1ba", "metadata": {}, "source": [ "### Bird API" ] }, { "cell_type": "code", "execution_count": 3, "id": "710f8714", "metadata": { "execution": { "iopub.execute_input": "2026-07-01T08:29:37.504125Z", "iopub.status.busy": "2026-07-01T08:29:37.503908Z", "iopub.status.idle": "2026-07-01T08:29:37.536144Z", "shell.execute_reply": "2026-07-01T08:29:37.535582Z" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", " Network\n", " \n", " \n", " \n", " \n", " \n", "
\n", " \n", " \n", " \n", " \n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "N = 3\n", "\n", "ACTION_SEMANTICS = {\"l\": (-1, 0), \"r\": (1, 0), \"u\": (0, -1), \"d\": (0, 1)}\n", "\n", "\n", "def available_actions(s):\n", "    res = []\n", "    if s[0] > 0:\n", "        res.append(\"l\")\n", "    if s[0] < N - 1:\n", "        res.append(\"r\")\n", "    if s[1] > 0:\n", "        res.append(\"u\")\n", "    if s[1] < N - 1:\n", "        res.append(\"d\")\n", "    return res\n", "\n", "\n", "def pairwise_plus(t1, t2):\n", "    return (t1[0] + t2[0], t1[1] + t2[1])\n", "\n", "\n", "def delta(s, a):\n", "    return [(1, pairwise_plus(s, ACTION_SEMANTICS[a]))]\n", "\n", "\n", "def labels(s):\n", "    return [str(s)]\n", "\n", "\n", "m1 = bird.build_bird(\n", "    init=(0, 0), available_actions=available_actions, labels=labels, delta=delta\n", ")\n", "show(m1)" ] }, { "cell_type": "markdown", "id": "3eff2862", "metadata": {}, "source": [ "### Model API" ] }, { "cell_type": "code", "execution_count": 4, "id": "a053d217", "metadata": { "execution": { "iopub.execute_input": "2026-07-01T08:29:37.546158Z", "iopub.status.busy": "2026-07-01T08:29:37.545928Z", "iopub.status.idle": "2026-07-01T08:29:37.580587Z", "shell.execute_reply": "2026-07-01T08:29:37.580007Z" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", " Network\n", " \n", " \n", " \n", " \n", " \n", "
\n", " \n", " \n", " \n", " \n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grid_model = stormvogel.model.new_mdp(create_initial_state=False)\n", "\n", "# Create one state per tile; tile (0,0) additionally gets the label \"init\".\n", "for x in range(N):\n", "    for y in range(N):\n", "        labels = [f\"({x},{y})\"]\n", "        if x == 0 and y == 0:\n", "            labels.append(\"init\")\n", "        grid_model.new_state(labels)\n", "\n", "# Look up each tile by its label and connect it to its neighbours.\n", "for x in range(N):\n", "    for y in range(N):\n", "        state_tile_label = str((x, y)).replace(\" \", \"\")\n", "        state = next(iter(grid_model.get_states_with_label(state_tile_label)))\n", "        av = available_actions((x, y))\n", "        for a in av:\n", "            target_tile_label = str(pairwise_plus((x, y), ACTION_SEMANTICS[a])).replace(\n", "                \" \", \"\"\n", "            )\n", "            target_state = next(\n", "                iter(grid_model.get_states_with_label(target_tile_label))\n", "            )\n", "            state.add_choices([(grid_model.action(a), target_state)])\n", "show(grid_model)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.14" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": { "09975e3519a74bdf9312ca13866db8dd": { "model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/output", "_model_module_version": "1.0.0", "_model_name": "OutputModel", "_view_count": null, "_view_module": "@jupyter-widgets/output", "_view_module_version": "1.0.0", "_view_name": "OutputView", "layout": "IPY_MODEL_e4fe5229d62941f0808a84d416c4f8df", "msg_id": "", "outputs": [], "tabbable": null, "tooltip": null } }, 
"14d7f3dd3b0d4f72a68b240a5d28fe64": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "30eeeb9e025b4aa38ddfee8c346ebfbe": { "model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/output", "_model_module_version": "1.0.0", "_model_name": "OutputModel", "_view_count": null, "_view_module": "@jupyter-widgets/output", "_view_module_version": "1.0.0", "_view_name": "OutputView", "layout": "IPY_MODEL_863bf3ad34ae4d8cad405b0081442997", "msg_id": "", "outputs": [], "tabbable": null, "tooltip": null } }, "338e49574bc340919f34044916fa044e": { "model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/output", "_model_module_version": "1.0.0", "_model_name": "OutputModel", "_view_count": 
null, "_view_module": "@jupyter-widgets/output", "_view_module_version": "1.0.0", "_view_name": "OutputView", "layout": "IPY_MODEL_eeb66113385b443ca5dc43e941996501", "msg_id": "", "outputs": [], "tabbable": null, "tooltip": null } }, "45eb002b53f14dad8460da8349ac9376": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "50a0382519c145f9998bb47ceeb64f06": { "model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/output", "_model_module_version": "1.0.0", "_model_name": "OutputModel", "_view_count": null, "_view_module": "@jupyter-widgets/output", "_view_module_version": "1.0.0", "_view_name": "OutputView", "layout": "IPY_MODEL_45eb002b53f14dad8460da8349ac9376", "msg_id": "", "outputs": [], "tabbable": null, "tooltip": null } }, "863bf3ad34ae4d8cad405b0081442997": { "model_module": 
"@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "975a35bc17f046ee82a83dd9cc44809b": { "model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/output", "_model_module_version": "1.0.0", "_model_name": "OutputModel", "_view_count": null, "_view_module": "@jupyter-widgets/output", "_view_module_version": "1.0.0", "_view_name": "OutputView", "layout": "IPY_MODEL_c9342a1ca6c44b9e8d6d47ce06247642", "msg_id": "", "outputs": [], "tabbable": null, "tooltip": null } }, "9e5ac677d69e4f53bf6468422b07473b": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", 
"_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "ac354e2b27f34f8f9a4c24e225950cf5": { "model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/output", "_model_module_version": "1.0.0", "_model_name": "OutputModel", "_view_count": null, "_view_module": "@jupyter-widgets/output", "_view_module_version": "1.0.0", "_view_name": "OutputView", "layout": "IPY_MODEL_9e5ac677d69e4f53bf6468422b07473b", "msg_id": "", "outputs": [], "tabbable": null, "tooltip": null } }, "c0e4f4ba116b4a06a6428bb125e26efb": { "model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/output", "_model_module_version": "1.0.0", "_model_name": "OutputModel", "_view_count": null, "_view_module": "@jupyter-widgets/output", "_view_module_version": "1.0.0", "_view_name": "OutputView", "layout": "IPY_MODEL_14d7f3dd3b0d4f72a68b240a5d28fe64", "msg_id": "", "outputs": [], "tabbable": null, "tooltip": null } }, "c9342a1ca6c44b9e8d6d47ce06247642": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", 
"model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "d8344e1d9f164e54937a7a3513c6e000": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, 
"max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "e4fe5229d62941f0808a84d416c4f8df": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "eeb66113385b443ca5dc43e941996501": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, 
"flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "f90bcc9958d34d949ee64db148201e9c": { "model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/output", "_model_module_version": "1.0.0", "_model_name": "OutputModel", "_view_count": null, "_view_module": "@jupyter-widgets/output", "_view_module_version": "1.0.0", "_view_name": "OutputView", "layout": "IPY_MODEL_d8344e1d9f164e54937a7a3513c6e000", "msg_id": "", "outputs": [], "tabbable": null, "tooltip": null } } }, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }