{ "cells": [ { "cell_type": "markdown", "id": "1a0bd534", "metadata": {}, "source": [ "# Building MDPs\n", "In Stormvogel, a **Markov Decision Process (MDP)** consists of:\n", "* states $S$,\n", "* actions $A$,\n", "* an initial state $s_0$,\n", "* a mapping from states to sets of *enabled actions*,\n", "* a successor distribution $P(s,a)$ for every state $s$ and every enabled action $a$, i.e., a set of transitions from $s$ to successor states $s'$, each annotated with the action $a$ and a probability,\n", "* state labels $L(s)$.\n", "\n", "\n", "Here we show how to construct a simple example MDP using the bird API and the model builder API.\n", "The idea is that you can choose to study (you will likely pass the exam but you have less free time) or not to study (you will have more free time but risk failing the exam)." ] }, { "cell_type": "markdown", "id": "ac1f227e", "metadata": {}, "source": [ "## The study dilemma\n", "This little MDP is supposed to help you decide whether you should study or not." ] }, { "cell_type": "markdown", "id": "6dc132dc", "metadata": {}, "source": [ "### Bird API\n", "For MDPs, you specify the available actions of each state in `available_actions`. An action here is simply a string. You specify the successor distribution of a state-action pair in `delta`, as a list of `(probability, successor)` tuples." ] }, { "cell_type": "code", "execution_count": 1, "id": "1e654ae8", "metadata": { "execution": { "iopub.execute_input": "2026-04-16T05:27:04.288706Z", "iopub.status.busy": "2026-04-16T05:27:04.288528Z", "iopub.status.idle": "2026-04-16T05:27:04.723946Z", "shell.execute_reply": "2026-04-16T05:27:04.723338Z" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", " Network\n", " \n", " \n", " \n", " \n", " \n", "
\n", " \n", " \n", " \n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from stormvogel import *\n", "\n", "\n", "# Only the initial state offers a real choice; every other state has a single action with an empty name.\n", "def available_actions(s):\n", "    if s == \"init\":\n", "        return [\"study\", \"don't study\"]\n", "    else:\n", "        return [\"\"]\n", "\n", "\n", "# Returns the successor distribution as a list of (probability, successor) tuples.\n", "def delta(s, a):\n", "    if a == \"study\":\n", "        return [(1, \"did study\")]\n", "    elif a == \"don't study\":\n", "        return [(1, \"did not study\")]\n", "    elif s == \"did study\":\n", "        return [(9 / 10, \"pass test\"), (1 / 10, \"fail test\")]\n", "    elif s == \"did not study\":\n", "        return [(2 / 5, \"pass test\"), (3 / 5, \"fail test\")]\n", "    else:\n", "        return [(1, \"end\")]\n", "\n", "\n", "def labels(s):\n", "    return s\n", "\n", "\n", "# Rewards are given as a dict from reward model name to value; a dict with several keys defines several reward models.\n", "def rewards(s: bird.State):\n", "    if s == \"pass test\":\n", "        return {\"R\": 100}\n", "    elif s == \"did not study\":\n", "        return {\"R\": 15}\n", "    else:\n", "        return {\"R\": 0}\n", "\n", "\n", "bird_study = bird.build_bird(\n", "    delta=delta,\n", "    init=\"init\",\n", "    available_actions=available_actions,\n", "    labels=labels,\n", "    modeltype=ModelType.MDP,\n", "    rewards=rewards,\n", ")\n", "show(bird_study)" ] }, { "cell_type": "markdown", "id": "c94affa1", "metadata": {}, "source": [ "### Model API" ] }, { "cell_type": "code", "execution_count": 2, "id": "579dbd37", "metadata": { "execution": { "iopub.execute_input": "2026-04-16T05:27:04.734668Z", "iopub.status.busy": "2026-04-16T05:27:04.734311Z", "iopub.status.idle": "2026-04-16T05:27:04.767995Z", "shell.execute_reply": "2026-04-16T05:27:04.767400Z" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", " Network\n", " \n", " \n", " \n", " \n", " \n", "
\n", " \n", " \n", " \n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from stormvogel import *\n", "\n", "mdp = stormvogel.model.new_mdp()\n", "\n", "init = mdp.initial_state\n", "study = mdp.action(\"study\")\n", "not_study = mdp.action(\"don't study\")\n", "\n", "did_study = mdp.new_state(\"did study\")\n", "did_not_study = mdp.new_state(\"did not study\")\n", "\n", "pass_test = mdp.new_state(\"pass test\")\n", "fail_test = mdp.new_state(\"fail test\")\n", "end = mdp.new_state(\"end\")\n", "\n", "# The initial state has one choice per action; each maps to a list of (probability, successor) tuples.\n", "init.set_choices(\n", "    {\n", "        study: [(1, did_study)],\n", "        not_study: [(1, did_not_study)],\n", "    }\n", ")\n", "\n", "# States without a named action are given their successor distribution directly.\n", "did_study.set_choices([(9 / 10, pass_test), (1 / 10, fail_test)])\n", "did_not_study.set_choices([(4 / 10, pass_test), (6 / 10, fail_test)])\n", "\n", "pass_test.set_choices([(1, end)])\n", "fail_test.set_choices([(1, end)])\n", "\n", "mdp.add_self_loops()\n", "\n", "reward_model = mdp.new_reward_model(\"R\")\n", "reward_model.set_state_reward(pass_test, 100)\n", "reward_model.set_state_reward(fail_test, 0)\n", "reward_model.set_state_reward(did_not_study, 15)\n", "# Every state that has no reward yet gets reward 0.\n", "reward_model.set_unset_rewards(0)\n", "\n", "show(mdp)" ] }, { "cell_type": "markdown", "id": "f852533e", "metadata": {}, "source": [ "## Grid model\n", "An MDP whose states are the cells of a 3x3 grid. In each cell, an action (`l`, `r`, `u`, or `d`) chooses the direction to walk; only moves that stay inside the grid are available." 
] }, { "cell_type": "markdown", "id": "a03bdcbe", "metadata": {}, "source": [ "### Bird API" ] }, { "cell_type": "code", "execution_count": 3, "id": "40d54f63", "metadata": { "execution": { "iopub.execute_input": "2026-04-16T05:27:04.776063Z", "iopub.status.busy": "2026-04-16T05:27:04.775858Z", "iopub.status.idle": "2026-04-16T05:27:04.809372Z", "shell.execute_reply": "2026-04-16T05:27:04.808821Z" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", " Network\n", " \n", " \n", " \n", " \n", " \n", "
\n", " \n", " \n", " \n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "N = 3\n", "\n", "# Each action moves one cell: (dx, dy) offsets for left, right, up, and down.\n", "ACTION_SEMANTICS = {\"l\": (-1, 0), \"r\": (1, 0), \"u\": (0, -1), \"d\": (0, 1)}\n", "\n", "\n", "# Offer only the moves that stay inside the N x N grid.\n", "def available_actions(s):\n", "    res = []\n", "    if s[0] > 0:\n", "        res.append(\"l\")\n", "    if s[0] < N - 1:\n", "        res.append(\"r\")\n", "    if s[1] > 0:\n", "        res.append(\"u\")\n", "    if s[1] < N - 1:\n", "        res.append(\"d\")\n", "    return res\n", "\n", "\n", "def pairwise_plus(t1, t2):\n", "    return (t1[0] + t2[0], t1[1] + t2[1])\n", "\n", "\n", "# Every move is deterministic: the target cell is reached with probability 1.\n", "def delta(s, a):\n", "    return [(1, pairwise_plus(s, ACTION_SEMANTICS[a]))]\n", "\n", "\n", "def labels(s):\n", "    return [str(s)]\n", "\n", "\n", "m1 = bird.build_bird(\n", "    init=(0, 0), available_actions=available_actions, labels=labels, delta=delta\n", ")\n", "show(m1)" ] }, { "cell_type": "markdown", "id": "c1bca730", "metadata": {}, "source": [ "### Model API" ] }, { "cell_type": "code", "execution_count": 4, "id": "efcb3b22", "metadata": { "execution": { "iopub.execute_input": "2026-04-16T05:27:04.817908Z", "iopub.status.busy": "2026-04-16T05:27:04.817701Z", "iopub.status.idle": "2026-04-16T05:27:04.850876Z", "shell.execute_reply": "2026-04-16T05:27:04.850238Z" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", " Network\n", " \n", " \n", " \n", " \n", " \n", "
\n", " \n", " \n", " \n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grid_model = stormvogel.model.new_mdp(create_initial_state=False)\n", "\n", "# Create one state per grid cell, labeled with its coordinates; cell (0,0) is also labeled init.\n", "for x in range(N):\n", "    for y in range(N):\n", "        tile_labels = [f\"({x},{y})\"]\n", "        if x == 0 and y == 0:\n", "            tile_labels.append(\"init\")\n", "        grid_model.new_state(tile_labels)\n", "\n", "# Look up each cell by its coordinate label and add one choice per available action.\n", "for x in range(N):\n", "    for y in range(N):\n", "        state_tile_label = str((x, y)).replace(\" \", \"\")\n", "        state = next(iter(grid_model.get_states_with_label(state_tile_label)))\n", "        av = available_actions((x, y))\n", "        for a in av:\n", "            target_tile_label = str(pairwise_plus((x, y), ACTION_SEMANTICS[a])).replace(\n", "                \" \", \"\"\n", "            )\n", "            target_state = next(\n", "                iter(grid_model.get_states_with_label(target_tile_label))\n", "            )\n", "            state.add_choices([(grid_model.action(a), target_state)])\n", "show(grid_model)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.13" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": { "0c75d0112df449c19b09596e64c689fa": { "model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/output", "_model_module_version": "1.0.0", "_model_name": "OutputModel", "_view_count": null, "_view_module": "@jupyter-widgets/output", "_view_module_version": "1.0.0", "_view_name": "OutputView", "layout": "IPY_MODEL_960949f4649e459f85f8042d80282f1e", "msg_id": "", "outputs": [], "tabbable": null, "tooltip": null } }, 
"1887a4e1d9b94dd3a7c0fd3d391a1e10": { "model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/output", "_model_module_version": "1.0.0", "_model_name": "OutputModel", "_view_count": null, "_view_module": "@jupyter-widgets/output", "_view_module_version": "1.0.0", "_view_name": "OutputView", "layout": "IPY_MODEL_a7807d877ddc4cdbae933f6a8f3285b5", "msg_id": "", "outputs": [], "tabbable": null, "tooltip": null } }, "225940740b014e79aaed944b73e1c7da": { "model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/output", "_model_module_version": "1.0.0", "_model_name": "OutputModel", "_view_count": null, "_view_module": "@jupyter-widgets/output", "_view_module_version": "1.0.0", "_view_name": "OutputView", "layout": "IPY_MODEL_e4bfff3ed7b84ae7b050fc3cb0e8d46a", "msg_id": "", "outputs": [], "tabbable": null, "tooltip": null } }, "4655a31f79eb4a2bbff4bb74de76b22b": { "model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/output", "_model_module_version": "1.0.0", "_model_name": "OutputModel", "_view_count": null, "_view_module": "@jupyter-widgets/output", "_view_module_version": "1.0.0", "_view_name": "OutputView", "layout": "IPY_MODEL_59f6486d07a1466cbd3e7b0b33a382ae", "msg_id": "", "outputs": [], "tabbable": null, "tooltip": null } }, "4f0bb887184d4f6b8cadfe8b3662d4ad": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, 
"align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "591c3b4b80e24bb6966feefd31663bc0": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "59f6486d07a1466cbd3e7b0b33a382ae": { "model_module": 
"@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "7ee7141c77824911803b2b1d0c643e6a": { "model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/output", "_model_module_version": "1.0.0", "_model_name": "OutputModel", "_view_count": null, "_view_module": "@jupyter-widgets/output", "_view_module_version": "1.0.0", "_view_name": "OutputView", "layout": "IPY_MODEL_4f0bb887184d4f6b8cadfe8b3662d4ad", "msg_id": "", "outputs": [], "tabbable": null, "tooltip": null } }, "936ccfe03b0941c0815643ca74a26cac": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", 
"_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "960949f4649e459f85f8042d80282f1e": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, 
"a07679e8c7ec49a2857a0ce5d3bde6aa": { "model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/output", "_model_module_version": "1.0.0", "_model_name": "OutputModel", "_view_count": null, "_view_module": "@jupyter-widgets/output", "_view_module_version": "1.0.0", "_view_name": "OutputView", "layout": "IPY_MODEL_aea615bbd7fe4b828855ed1fbac1a528", "msg_id": "", "outputs": [], "tabbable": null, "tooltip": null } }, "a7807d877ddc4cdbae933f6a8f3285b5": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "aea615bbd7fe4b828855ed1fbac1a528": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": 
"@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "af61d941cff643939565620bfdb0433e": { "model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/output", "_model_module_version": "1.0.0", "_model_name": "OutputModel", "_view_count": null, "_view_module": "@jupyter-widgets/output", "_view_module_version": "1.0.0", "_view_name": "OutputView", "layout": "IPY_MODEL_936ccfe03b0941c0815643ca74a26cac", "msg_id": "", "outputs": [], "tabbable": null, "tooltip": null } }, "e4bfff3ed7b84ae7b050fc3cb0e8d46a": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": 
null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "e770eeb598664e1c87f34e92c6731428": { "model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/output", "_model_module_version": "1.0.0", "_model_name": "OutputModel", "_view_count": null, "_view_module": "@jupyter-widgets/output", "_view_module_version": "1.0.0", "_view_name": "OutputView", "layout": "IPY_MODEL_591c3b4b80e24bb6966feefd31663bc0", "msg_id": "", "outputs": [], "tabbable": null, "tooltip": null } } }, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }