{ "cells": [ { "cell_type": "markdown", "id": "0d93a53e", "metadata": {}, "source": [ "# Building MDPs\n", "In Stormvogel, a **Markov Decision Process (MDP)** consists of:\n", "* states $S$,\n", "* actions $A$,\n", "* an initial state $s_0$,\n", "* a mapping from states to sets of *enabled actions*,\n", "* a successor distribution $P(s,a)$ for every state $s$ and every enabled action $a$, i.e., sets of transitions between states $s$ and $s'$, each annotated with an action and a probability,\n", "* state labels $L(s)$.\n", "\n", "\n", "Here we show how to construct a simple example MDP using the bird API and the model builder API.\n", "The idea is that you can choose to study (you will likely pass the exam, but you have less free time) or not to study (you will have more free time, but you risk failing the exam)." ] }, { "cell_type": "markdown", "id": "bece1df6", "metadata": {}, "source": [ "## The study dilemma\n", "This little MDP is supposed to help you decide whether you should study or not." ] }, { "cell_type": "markdown", "id": "ef9dec3d", "metadata": {}, "source": [ "### Bird API\n", "For MDPs, you specify the available actions in `available_actions`. An action here is simply a string. You specify the transitions of a state-action pair in `delta`." ] }, { "cell_type": "code", "execution_count": 1, "id": "5c5ff61c", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:41:27.435671Z", "iopub.status.busy": "2026-03-26T10:41:27.435489Z", "iopub.status.idle": "2026-03-26T10:41:27.903705Z", "shell.execute_reply": "2026-03-26T10:41:27.903062Z" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", " Network\n", " \n", " \n", " \n", " \n", " \n", "
\n", " \n", " \n", " \n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from stormvogel import *\n", "\n", "\n", "def available_actions(s):\n", "    if s == \"init\":\n", "        return [\"study\", \"don't study\"]\n", "    else:\n", "        return [\"\"]\n", "\n", "\n", "def delta(s, a):\n", "    if a == \"study\":\n", "        return [(1, \"did study\")]\n", "    elif a == \"don't study\":\n", "        return [(1, \"did not study\")]\n", "    elif s == \"did study\":\n", "        return [(9 / 10, \"pass test\"), (1 / 10, \"fail test\")]\n", "    elif s == \"did not study\":\n", "        return [(2 / 5, \"pass test\"), (3 / 5, \"fail test\")]\n", "    else:\n", "        return [(1, \"end\")]\n", "\n", "\n", "def labels(s):\n", "    return s\n", "\n", "\n", "# For rewards, you have to provide a dict. Using a dict with more than one entry enables multiple reward models.\n", "def rewards(s: bird.State):\n", "    if s == \"pass test\":\n", "        return {\"R\": 100}\n", "    elif s == \"did not study\":\n", "        return {\"R\": 15}\n", "    else:\n", "        return {\"R\": 0}\n", "\n", "\n", "bird_study = bird.build_bird(\n", "    delta=delta,\n", "    init=\"init\",\n", "    available_actions=available_actions,\n", "    labels=labels,\n", "    modeltype=ModelType.MDP,\n", "    rewards=rewards,\n", ")\n", "show(bird_study)" ] }, { "cell_type": "markdown", "id": "a4943472", "metadata": {}, "source": [ "### Model API" ] }, { "cell_type": "code", "execution_count": 2, "id": "d2bd51e5", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:41:27.910820Z", "iopub.status.busy": "2026-03-26T10:41:27.910499Z", "iopub.status.idle": "2026-03-26T10:41:27.943510Z", "shell.execute_reply": "2026-03-26T10:41:27.942617Z" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", " Network\n", " \n", " \n", " \n", " \n", " \n", "
\n", " \n", " \n", " \n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from stormvogel import *\n", "\n", "mdp = stormvogel.model.new_mdp()\n", "\n", "init = mdp.initial_state\n", "study = mdp.action(\"study\")\n", "not_study = mdp.action(\"don't study\")\n", "\n", "did_study = mdp.new_state(\"did study\")\n", "did_not_study = mdp.new_state(\"did not study\")\n", "\n", "pass_test = mdp.new_state(\"pass test\")\n", "fail_test = mdp.new_state(\"fail test\")\n", "end = mdp.new_state(\"end\")\n", "\n", "init.set_choices(\n", "    {\n", "        study: [(1, did_study)],\n", "        not_study: [(1, did_not_study)],\n", "    }\n", ")\n", "\n", "did_study.set_choices([(9 / 10, pass_test), (1 / 10, fail_test)])\n", "did_not_study.set_choices([(4 / 10, pass_test), (6 / 10, fail_test)])\n", "\n", "pass_test.set_choices([(1, end)])\n", "fail_test.set_choices([(1, end)])\n", "\n", "mdp.add_self_loops()\n", "\n", "reward_model = mdp.new_reward_model(\"R\")\n", "reward_model.set_state_reward(pass_test, 100)\n", "reward_model.set_state_reward(fail_test, 0)\n", "reward_model.set_state_reward(did_not_study, 15)\n", "reward_model.set_unset_rewards(0)\n", "\n", "show(mdp)" ] }, { "cell_type": "markdown", "id": "2af85d8a", "metadata": {}, "source": [ "## Grid model\n", "This MDP consists of a 3x3 grid. In each cell, an action chooses the direction to walk."
] }, { "cell_type": "markdown", "id": "76b428ce", "metadata": {}, "source": [ "### Bird API" ] }, { "cell_type": "code", "execution_count": 3, "id": "66ee582c", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:41:27.951645Z", "iopub.status.busy": "2026-03-26T10:41:27.951411Z", "iopub.status.idle": "2026-03-26T10:41:27.985664Z", "shell.execute_reply": "2026-03-26T10:41:27.985060Z" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", " Network\n", " \n", " \n", " \n", " \n", " \n", "
\n", " \n", " \n", " \n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "N = 3\n", "\n", "ACTION_SEMANTICS = {\"l\": (-1, 0), \"r\": (1, 0), \"u\": (0, -1), \"d\": (0, 1)}\n", "\n", "\n", "def available_actions(s):\n", "    res = []\n", "    if s[0] > 0:\n", "        res.append(\"l\")\n", "    if s[0] < N - 1:\n", "        res.append(\"r\")\n", "    if s[1] > 0:\n", "        res.append(\"u\")\n", "    if s[1] < N - 1:\n", "        res.append(\"d\")\n", "    return res\n", "\n", "\n", "def pairwise_plus(t1, t2):\n", "    return (t1[0] + t2[0], t1[1] + t2[1])\n", "\n", "\n", "def delta(s, a):\n", "    return [(1, pairwise_plus(s, ACTION_SEMANTICS[a]))]\n", "\n", "\n", "def labels(s):\n", "    return [str(s)]\n", "\n", "\n", "m1 = bird.build_bird(\n", "    init=(0, 0), available_actions=available_actions, labels=labels, delta=delta\n", ")\n", "show(m1)" ] }, { "cell_type": "markdown", "id": "6000394e", "metadata": {}, "source": [ "### Model API" ] }, { "cell_type": "code", "execution_count": 4, "id": "0323c35f", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:41:27.992816Z", "iopub.status.busy": "2026-03-26T10:41:27.992587Z", "iopub.status.idle": "2026-03-26T10:41:28.026495Z", "shell.execute_reply": "2026-03-26T10:41:28.025889Z" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", " Network\n", " \n", " \n", " \n", " \n", " \n", "
\n", " \n", " \n", " \n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grid_model = stormvogel.model.new_mdp(create_initial_state=False)\n", "\n", "for x in range(N):\n", "    for y in range(N):\n", "        labels = [f\"({x},{y})\"]\n", "        if x == 0 and y == 0:\n", "            labels.append(\"init\")\n", "        grid_model.new_state(labels)\n", "\n", "for x in range(N):\n", "    for y in range(N):\n", "        state_tile_label = str((x, y)).replace(\" \", \"\")\n", "        state = next(iter(grid_model.get_states_with_label(state_tile_label)))\n", "        av = available_actions((x, y))\n", "        for a in av:\n", "            target_tile_label = str(pairwise_plus((x, y), ACTION_SEMANTICS[a])).replace(\n", "                \" \", \"\"\n", "            )\n", "            target_state = next(\n", "                iter(grid_model.get_states_with_label(target_tile_label))\n", "            )\n", "            state.add_choices([(grid_model.action(a), target_state)])\n", "show(grid_model)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.13" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": { "04ff1a661d294973bc31cacb97566bc4": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display":
null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "0d826514f7bb4d498c532a6e1356cde1": { "model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/output", "_model_module_version": "1.0.0", "_model_name": "OutputModel", "_view_count": null, "_view_module": "@jupyter-widgets/output", "_view_module_version": "1.0.0", "_view_name": "OutputView", "layout": "IPY_MODEL_b4ba66fb55b744c1aa0a2e6df2a38d7f", "msg_id": "", "outputs": [], "tabbable": null, "tooltip": null } }, "129ec252d244413d9b7d308aabf735ba": { "model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/output", "_model_module_version": "1.0.0", "_model_name": "OutputModel", "_view_count": null, "_view_module": "@jupyter-widgets/output", "_view_module_version": "1.0.0", "_view_name": "OutputView", "layout": "IPY_MODEL_da0863f14cf3458c9caec560807535c7", "msg_id": "", "outputs": [], "tabbable": null, "tooltip": null } }, "132a9d63fdd44e6e8204f3afe0118860": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", 
"_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "211573623e4e45169acad171a5f1bc25": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": 
null, "width": null } }, "4f18ecfd9a0a42d3b12cc6246a56e285": { "model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/output", "_model_module_version": "1.0.0", "_model_name": "OutputModel", "_view_count": null, "_view_module": "@jupyter-widgets/output", "_view_module_version": "1.0.0", "_view_name": "OutputView", "layout": "IPY_MODEL_132a9d63fdd44e6e8204f3afe0118860", "msg_id": "", "outputs": [], "tabbable": null, "tooltip": null } }, "719dc147081d485ea7083beff830ec6a": { "model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/output", "_model_module_version": "1.0.0", "_model_name": "OutputModel", "_view_count": null, "_view_module": "@jupyter-widgets/output", "_view_module_version": "1.0.0", "_view_name": "OutputView", "layout": "IPY_MODEL_e85c37fccd42482192a87752ca378c12", "msg_id": "", "outputs": [], "tabbable": null, "tooltip": null } }, "7a53bc39a38e4500ae9946e430970bd9": { "model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/output", "_model_module_version": "1.0.0", "_model_name": "OutputModel", "_view_count": null, "_view_module": "@jupyter-widgets/output", "_view_module_version": "1.0.0", "_view_name": "OutputView", "layout": "IPY_MODEL_211573623e4e45169acad171a5f1bc25", "msg_id": "", "outputs": [], "tabbable": null, "tooltip": null } }, "7d4b5e103be84fd2ba6ccc47b20571f9": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", 
"align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "b4ba66fb55b744c1aa0a2e6df2a38d7f": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "b5c0c30715554f8bb356cc80bebd938a": { 
"model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/output", "_model_module_version": "1.0.0", "_model_name": "OutputModel", "_view_count": null, "_view_module": "@jupyter-widgets/output", "_view_module_version": "1.0.0", "_view_name": "OutputView", "layout": "IPY_MODEL_04ff1a661d294973bc31cacb97566bc4", "msg_id": "", "outputs": [], "tabbable": null, "tooltip": null } }, "c0c2357b514e4992a97b6b692541a1dc": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "c7edc2ebbccd4b5e9ffa4775c3fc528f": { "model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/output", "_model_module_version": "1.0.0", "_model_name": "OutputModel", "_view_count": null, "_view_module": 
"@jupyter-widgets/output", "_view_module_version": "1.0.0", "_view_name": "OutputView", "layout": "IPY_MODEL_c0c2357b514e4992a97b6b692541a1dc", "msg_id": "", "outputs": [], "tabbable": null, "tooltip": null } }, "d12e0e3cb5fa4bafa73bc0619c550cd8": { "model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/output", "_model_module_version": "1.0.0", "_model_name": "OutputModel", "_view_count": null, "_view_module": "@jupyter-widgets/output", "_view_module_version": "1.0.0", "_view_name": "OutputView", "layout": "IPY_MODEL_7d4b5e103be84fd2ba6ccc47b20571f9", "msg_id": "", "outputs": [], "tabbable": null, "tooltip": null } }, "da0863f14cf3458c9caec560807535c7": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "e85c37fccd42482192a87752ca378c12": { "model_module": "@jupyter-widgets/base", 
"model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } } }, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }