{ "cells": [ { "cell_type": "markdown", "id": "74a94a47", "metadata": {}, "source": [ "# Gymnasium integration\n", "[Gymnasium](https://gymnasium.farama.org/) is an API standard for reinforcement learning with a diverse collection of reference environments. Stormvogel supports some integration with gymnasium. In particular, you can construct explicit models from the gymnasium environmnents under Gymnasium's [ToyText](https://gymnasium.farama.org/environments/toy_text/) (except Blackjack)." ] }, { "cell_type": "markdown", "id": "72cff06c", "metadata": {}, "source": [ "## FrozenLake\n", "Let us create one of these environments, called FrozenLake.\n", "Our agent wants to get to the present. Currently, it just chooses a random action." ] }, { "cell_type": "code", "execution_count": 1, "id": "22c4643c", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:48:07.790656Z", "iopub.status.busy": "2026-03-26T10:48:07.790420Z", "iopub.status.idle": "2026-03-26T10:48:08.214081Z", "shell.execute_reply": "2026-03-26T10:48:08.213495Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/__w/stormvogel/stormvogel/.venv/lib/python3.12/site-packages/pygame/pkgdata.py:25: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.\n", " from pkg_resources import resource_stream, resource_exists\n" ] }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import gymnasium as gym\n", "from stormvogel.extensions.gym_grid import *\n", "from stormvogel import *\n", "\n", "env = gym.make(\n", " \"FrozenLake-v1\", render_mode=\"rgb_array\", is_slippery=False\n", ") # Set `is_slippery=True` for stochastic behavior\n", "filename = gymnasium_render_model_gif(env, filename=\"ice1\")\n", "extensions.embed_gif(filename)" ] }, { "cell_type": "markdown", "id": "1319f4fb", "metadata": {}, "source": [ "We can convert it into an explicit MDP as follows. Each state has a label that relates to the coordinates of the tile." ] }, { "cell_type": "code", "execution_count": 2, "id": "35aa00f9", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:48:08.215956Z", "iopub.status.busy": "2026-03-26T10:48:08.215727Z", "iopub.status.idle": "2026-03-26T10:48:08.400455Z", "shell.execute_reply": "2026-03-26T10:48:08.399885Z" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", " Network\n", " \n", " \n", " \n", " \n", " \n", "
\n", " \n", " \n", " \n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sv_model = gymnasium_grid_to_stormvogel(env, GRID_ACTION_LABEL_MAP)\n", "# GRID_ACTION_LABEL_MAP = {0: \"←\", 1: \"↓\", 2: \"→\", 3: \"↑\", 4: \"pickup\", 5: \"dropoff\"} (map between gymnasium action ids and labels)\n", "vis = show(sv_model, layout=Layout(\"layouts/frozenlake.json\"))" ] }, { "cell_type": "markdown", "id": "27def322", "metadata": {}, "source": [ "Now, let's do some model checking to calculate a strategy to solve the puzzle. We will tell the model checker to maximize the probability of getting to the target state (the present)." ] }, { "cell_type": "code", "execution_count": 3, "id": "aca167ce", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:48:08.407750Z", "iopub.status.busy": "2026-03-26T10:48:08.407447Z", "iopub.status.idle": "2026-03-26T10:48:08.441734Z", "shell.execute_reply": "2026-03-26T10:48:08.441150Z" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", " Network\n", " \n", " \n", " \n", " \n", " \n", "
\n", " \n", " \n", " \n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "res = model_checking(sv_model, 'Pmax=? [F \"target\"]')\n", "vis2 = show(sv_model, result=res, layout=Layout(\"layouts/frozenlake.json\"))" ] }, { "cell_type": "markdown", "id": "81862769", "metadata": {}, "source": [ "Let's highlight the path to see what the scheduler is doing." ] }, { "cell_type": "code", "execution_count": 4, "id": "48809076", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:48:08.448556Z", "iopub.status.busy": "2026-03-26T10:48:08.448383Z", "iopub.status.idle": "2026-03-26T10:48:21.494583Z", "shell.execute_reply": "2026-03-26T10:48:21.493995Z" } }, "outputs": [], "source": [ "path = simulate_path(sv_model, scheduler=res.scheduler, steps=20)\n", "vis2.highlight_path(path, color=\"orange\")" ] }, { "cell_type": "markdown", "id": "b817a32d", "metadata": {}, "source": [ "Alternatively, we can show what our scheduler does in the frozen lake environment itself." ] }, { "cell_type": "code", "execution_count": 5, "id": "4f40ffaa", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:48:21.496419Z", "iopub.status.busy": "2026-03-26T10:48:21.496228Z", "iopub.status.idle": "2026-03-26T10:48:21.542383Z", "shell.execute_reply": "2026-03-26T10:48:21.541927Z" } }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from stormvogel.extensions.gym_grid import *\n", "\n", "gs = to_gymnasium_scheduler(sv_model, res.scheduler, GRID_ACTION_LABEL_MAP)\n", "filename = gymnasium_render_model_gif(env, gs, filename=\"ice2\")\n", "extensions.embed_gif(filename)" ] }, { "cell_type": "markdown", "id": "0fcb9021", "metadata": {}, "source": [ "We can also define a function to act as the scheduler on the model and convert it to a gymnasium scheduler. This one just keeps going in a loop..." ] }, { "cell_type": "code", "execution_count": 6, "id": "ada5c535", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:48:21.544099Z", "iopub.status.busy": "2026-03-26T10:48:21.543920Z", "iopub.status.idle": "2026-03-26T10:48:21.848022Z", "shell.execute_reply": "2026-03-26T10:48:21.847521Z" } }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from stormvogel.model import Action\n", "\n", "\n", "def my_scheduler(s: stormvogel.model.State):\n", " # \"←\" \"↓\" \"→\" \"↑\"\n", " if s.is_initial():\n", " return Action(\"→\")\n", " env_id = int(s.labels[0])\n", " x, y = to_coordinate(env_id, env)\n", " if x < 2 and y == 0:\n", " return Action(\"→\")\n", " elif x == 2 and y < 2:\n", " return Action(\"↓\")\n", " elif x > 0 and y == 2:\n", " return Action(\"←\")\n", " else:\n", " return Action(\"↑\")\n", "\n", "\n", "gs = to_gymnasium_scheduler(sv_model, my_scheduler, GRID_ACTION_LABEL_MAP)\n", "filename = gymnasium_render_model_gif(env, gs, filename=\"ice3\")\n", "extensions.embed_gif(filename)" ] }, { "cell_type": "markdown", "id": "84a92ebf", "metadata": {}, "source": [ "## CliffWalking\n", "CliffWalking is a slightly more boring version of FrozenLake. You can apply the same principles that we just applied to FrozenLake." ] }, { "cell_type": "code", "execution_count": 7, "id": "75433916", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:48:21.849740Z", "iopub.status.busy": "2026-03-26T10:48:21.849576Z", "iopub.status.idle": "2026-03-26T10:48:21.855230Z", "shell.execute_reply": "2026-03-26T10:48:21.854692Z" } }, "outputs": [], "source": [ "import gymnasium as gym\n", "from stormvogel.extensions.gym_grid import *\n", "\n", "env = gym.make(\"CliffWalking-v1\", render_mode=\"rgb_array\")" ] }, { "cell_type": "code", "execution_count": 8, "id": "869f43bb", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:48:21.857048Z", "iopub.status.busy": "2026-03-26T10:48:21.856878Z", "iopub.status.idle": "2026-03-26T10:48:21.890084Z", "shell.execute_reply": "2026-03-26T10:48:21.889502Z" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", " Network\n", " \n", " \n", " \n", " \n", " \n", "
\n", " \n", " \n", " \n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sv_model = gymnasium_grid_to_stormvogel(env, GRID_ACTION_LABEL_MAP)\n", "vis = show(sv_model, layout=Layout(\"layouts/cliffwalking.json\"))" ] }, { "cell_type": "code", "execution_count": 9, "id": "f8480818", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:48:21.897103Z", "iopub.status.busy": "2026-03-26T10:48:21.896877Z", "iopub.status.idle": "2026-03-26T10:48:22.087202Z", "shell.execute_reply": "2026-03-26T10:48:22.086629Z" } }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from stormvogel.stormpy_utils.model_checking import model_checking\n", "\n", "res = model_checking(sv_model, 'Pmax=? [F \"target\"]')\n", "gs = to_gymnasium_scheduler(sv_model, res.scheduler, GRID_ACTION_LABEL_MAP)\n", "filename = gymnasium_render_model_gif(env, gs, filename=\"cliff\")\n", "extensions.embed_gif(filename)" ] }, { "cell_type": "markdown", "id": "800050ae", "metadata": {}, "source": [ "## Taxi\n", "In the Taxi scenario, a taxi has to pick up passengers and transport them to the hotel. The position of the target, passenger and taxi are chosen at random." ] }, { "cell_type": "code", "execution_count": 10, "id": "e7199322", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:48:22.089092Z", "iopub.status.busy": "2026-03-26T10:48:22.088916Z", "iopub.status.idle": "2026-03-26T10:48:22.149079Z", "shell.execute_reply": "2026-03-26T10:48:22.148632Z" } }, "outputs": [ { "data": { "text/plain": [ "'ModelType.MDP model with 505 states, 7 actions, and 1005 distinct labels.'" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import gymnasium as gym\n", "from stormvogel.extensions.gym_grid import *\n", "\n", "env = gym.make(\n", " \"Taxi-v3\", render_mode=\"rgb_array\"\n", ") # Set `is_slippery=True` for stochastic behavior\n", "sv_model = gymnasium_grid_to_stormvogel(env)\n", "# This model is so big that it is better not to display it.\n", "sv_model.summary()" ] }, { "cell_type": "code", "execution_count": 11, "id": "030143e3", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:48:22.150677Z", "iopub.status.busy": "2026-03-26T10:48:22.150503Z", "iopub.status.idle": "2026-03-26T10:48:22.772124Z", "shell.execute_reply": "2026-03-26T10:48:22.771547Z" } }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "target = get_target_state(env)\n", "res = model_checking(sv_model, \"Rmax=? [S]\")\n", "gs = to_gymnasium_scheduler(sv_model, res.scheduler, GRID_ACTION_LABEL_MAP)\n", "filename = gymnasium_render_model_gif(env, gs, filename=\"taxi\")\n", "extensions.embed_gif(filename)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.13" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": { "3546aae48b2f4ae8aad90095c64d8254": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "4cac9c4c4f154f94812128edcdaed5c9": { "model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/output", "_model_module_version": "1.0.0", "_model_name": "OutputModel", "_view_count": null, "_view_module": "@jupyter-widgets/output", "_view_module_version": "1.0.0", "_view_name": "OutputView", "layout": "IPY_MODEL_87598f16b83541319a9902a6288d6542", "msg_id": "", "outputs": [], "tabbable": null, "tooltip": null } }, "4de3fda5a2374b3cad5a1c7a3777e542": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "69066fd54c6344938fd026e003e03fa3": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "86eeed5a91924af9896dac365c6ad24a": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "87598f16b83541319a9902a6288d6542": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "95e8e0ca02414abe96373bd43fb7d08b": { "model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/output", "_model_module_version": "1.0.0", "_model_name": "OutputModel", "_view_count": null, "_view_module": "@jupyter-widgets/output", "_view_module_version": "1.0.0", "_view_name": "OutputView", "layout": "IPY_MODEL_4de3fda5a2374b3cad5a1c7a3777e542", "msg_id": "", "outputs": [], "tabbable": null, "tooltip": null } }, "9fb8d5e600fb4b4cbab29140b5b6f217": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "b7fc753dd7a04cd6961e113aeeb348b2": { "model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/output", "_model_module_version": "1.0.0", "_model_name": "OutputModel", "_view_count": null, "_view_module": "@jupyter-widgets/output", "_view_module_version": "1.0.0", "_view_name": "OutputView", "layout": "IPY_MODEL_69066fd54c6344938fd026e003e03fa3", "msg_id": "", "outputs": [], "tabbable": null, "tooltip": null } }, "b91fad11be354c8db67eb0678741af32": { "model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/output", "_model_module_version": "1.0.0", "_model_name": "OutputModel", "_view_count": null, "_view_module": "@jupyter-widgets/output", "_view_module_version": "1.0.0", "_view_name": "OutputView", "layout": "IPY_MODEL_9fb8d5e600fb4b4cbab29140b5b6f217", "msg_id": "", "outputs": [], "tabbable": null, "tooltip": null } }, "f0d9fa3a894b40deaae0e3600b24d2a4": { "model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/output", "_model_module_version": "1.0.0", "_model_name": "OutputModel", "_view_count": null, "_view_module": "@jupyter-widgets/output", "_view_module_version": "1.0.0", "_view_name": "OutputView", "layout": "IPY_MODEL_3546aae48b2f4ae8aad90095c64d8254", "msg_id": "", "outputs": [], "tabbable": null, "tooltip": null } }, "f46d220525704e4fa6ecc548fdf7a176": { "model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/output", "_model_module_version": "1.0.0", "_model_name": "OutputModel", "_view_count": null, "_view_module": "@jupyter-widgets/output", "_view_module_version": "1.0.0", "_view_name": "OutputView", "layout": "IPY_MODEL_86eeed5a91924af9896dac365c6ad24a", "msg_id": "", "outputs": [], "tabbable": null, "tooltip": null } } }, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }