{ "cells": [ { "cell_type": "markdown", "id": "6d003566", "metadata": {}, "source": [ "# Step Four: MDPs with More Uncertainty\n", "While MDPs are the standard model to describe processes that are subject to\n", "both nondeterminism and probabilistic uncertainty, not every process can adequately be represented as an MDP. We briefly show how Storm can go beyond the verification of MDPs by means of **imprecise probabilities** and **partial observations**, while noting that support for such extensions is still under active development." ] }, { "cell_type": "markdown", "id": "65d7aa24", "metadata": {}, "source": [ "## Preparation\n", "We load the general functions of the Orchard Stormvogel model. These were defined in *Step One*." ] }, { "cell_type": "code", "execution_count": 1, "id": "acc42834", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:40:54.719625Z", "iopub.status.busy": "2026-03-26T10:40:54.719460Z", "iopub.status.idle": "2026-03-26T10:40:54.966259Z", "shell.execute_reply": "2026-03-26T10:40:54.965744Z" }, "lines_to_next_cell": 2 }, "outputs": [], "source": [ "# Import the Orchard model functions from the previous step\n", "import stormvogel\n", "import stormpy\n", "\n", "from copy import deepcopy\n", "from orchard.orchard_game_stormvogel import (\n", "    Orchard,\n", "    available_actions,\n", "    delta,\n", "    labels,\n", "    rewards,\n", "    Fruit,\n", "    GameState,\n", "    DiceOutcome,\n", ")\n", "from orchard.orchard_builder import build_simple" ] }, { "cell_type": "markdown", "id": "92f0bc4a", "metadata": {}, "source": [ "## Imprecise Probabilities\n", "Transition probabilities in MDPs are often an abstraction of more complicated\n", "processes, such as dice rolls in the Orchard game. Instead of precise point estimates for probabilities, Storm supports interval estimates in two flavors: **interval MDPs** and **parametric MDPs**."
] }, { "cell_type": "markdown", "id": "60af8d9f", "metadata": {}, "source": [ "### Interval MDP\n", "Interval MDPs (iMDPs) replace the precise probabilities in MDPs with intervals of possible probabilities.\n", "For the sake of this tutorial, one can consider an interval MDP as a set of MDPs, covering all contained point\n", "estimates that yield proper successor distributions." ] }, { "cell_type": "markdown", "id": "36dbc08c", "metadata": { "lines_to_next_cell": 2 }, "source": [ "#### Adapted delta function\n", "In order to model an interval MDP, we change the dice roll probabilities from the point estimate $\\frac{1}{6}$ to the interval $[\\frac{5}{36}, \\frac{7}{36}]$.\n", "This can be done by replacing all occurrences of `fair_dice_prob` in the `delta` function with the interval `Interval(fair_dice_prob-(1/36), fair_dice_prob+(1/36))`. We indicate the change by the comment `NEW` in the following.\n", "No further change is needed, and the rest of the delta function remains unmodified." ] }, { "cell_type": "code", "execution_count": 2, "id": "9ed06a46", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:40:54.968689Z", "iopub.status.busy": "2026-03-26T10:40:54.968418Z", "iopub.status.idle": "2026-03-26T10:40:54.974284Z", "shell.execute_reply": "2026-03-26T10:40:54.973819Z" } }, "outputs": [], "source": [ "# The transition function\n", "def delta(state, action):\n", "    if state.game_state() != GameState.NOT_ENDED:\n", "        # Game has ended -> self loop\n", "        return [(1, state)]\n", "\n", "    if state.dice is None:\n", "        # Player throws dice and considers outcomes\n", "        outcomes = []\n", "        # Probability of fair dice throw over\n", "        # each fruit type + 1 basket + 1 raven\n", "        fair_dice_prob = 1 / (len(state.trees.keys()) + 2)\n", "        # NEW: adapted probability\n", "        fair_dice_prob = stormvogel.model.Interval(\n", "            fair_dice_prob - (1 / 36), fair_dice_prob + (1 / 36)\n", "        )\n", "\n", "        # 1. 
Dice shows fruit\n", " for fruit in state.trees.keys():\n", " next_state = deepcopy(state)\n", " next_state.dice = DiceOutcome.FRUIT, fruit\n", " outcomes.append((fair_dice_prob, next_state))\n", "\n", " # 2. Dice shows basket\n", " next_state = deepcopy(state)\n", " next_state.dice = DiceOutcome.BASKET, None\n", " outcomes.append((fair_dice_prob, next_state))\n", "\n", " # 3. Dice shows raven\n", " next_state = deepcopy(state)\n", " next_state.dice = DiceOutcome.RAVEN, None\n", " outcomes.append((fair_dice_prob, next_state))\n", " return outcomes\n", "\n", " elif state.dice[0] == DiceOutcome.FRUIT:\n", " # Player picks specified fruit\n", " fruit = state.dice[1]\n", " next_state = deepcopy(state)\n", " next_state.pick_fruit(fruit)\n", " next_state.next_round()\n", " return [(1, next_state)]\n", "\n", " elif state.dice[0] == DiceOutcome.BASKET:\n", " assert action.startswith(\"choose\")\n", " # Player chooses fruit specified by action\n", " fruit = Fruit[action.removeprefix(\"choose\")]\n", " next_state = deepcopy(state)\n", " next_state.pick_fruit(fruit)\n", " next_state.next_round()\n", " return [(1, next_state)]\n", "\n", " elif state.dice[0] == DiceOutcome.RAVEN:\n", " next_state = deepcopy(state)\n", " next_state.move_raven()\n", " next_state.next_round()\n", " return [(1, next_state)]\n", "\n", " assert False" ] }, { "cell_type": "markdown", "id": "77083d1e", "metadata": {}, "source": [ "We can now build the model as before and obtain the Storm representation." 
] }, { "cell_type": "code", "execution_count": 3, "id": "12023483", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:40:54.976070Z", "iopub.status.busy": "2026-03-26T10:40:54.975893Z", "iopub.status.idle": "2026-03-26T10:40:57.254350Z", "shell.execute_reply": "2026-03-26T10:40:57.253766Z" } }, "outputs": [], "source": [ "init_game = Orchard(\n", " [Fruit.APPLE, Fruit.CHERRY, Fruit.PEAR, Fruit.PLUM], num_fruits=4, raven_distance=5\n", ")\n", "\n", "\n", "# For the full model, we only set the relevant labels for the winning conditions\n", "# and do not expose the internal state information\n", "def labels_full(state):\n", " labels = []\n", " if state.game_state() == GameState.PLAYERS_WON:\n", " labels.append(\"PlayersWon\")\n", " elif state.game_state() == GameState.RAVEN_WON:\n", " labels.append(\"RavenWon\")\n", " return labels\n", "\n", "\n", "orchard = stormvogel.bird.build_bird(\n", " modeltype=stormvogel.ModelType.MDP,\n", " init=init_game,\n", " available_actions=available_actions,\n", " delta=delta,\n", " labels=labels_full,\n", " max_size=100000,\n", ")\n", "\n", "# Convert to stormpy model\n", "orchard_storm = stormvogel.mapping.stormvogel_to_stormpy(orchard)" ] }, { "cell_type": "markdown", "id": "aa54287e", "metadata": {}, "source": [ "Printing the model reveals that it has type IMDP now." 
] }, { "cell_type": "code", "execution_count": 4, "id": "41db183e", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:40:57.256454Z", "iopub.status.busy": "2026-03-26T10:40:57.256236Z", "iopub.status.idle": "2026-03-26T10:40:57.259502Z", "shell.execute_reply": "2026-03-26T10:40:57.258890Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-------------------------------------------------------------- \n", "Model type: \tIMDP (sparse)\n", "States: \t22469\n", "Transitions: \t44949\n", "Choices: \t29349\n", "Reward Models: none\n", "State Labels: \t3 labels\n", " * PlayersWon -> 5 item(s)\n", " * RavenWon -> 624 item(s)\n", " * init -> 1 item(s)\n", "Choice Labels: \t11 labels\n", " * gameEnded -> 629 item(s)\n", " * chooseCHERRY -> 2500 item(s)\n", " * pickCHERRY -> 3120 item(s)\n", " * choosePEAR -> 2500 item(s)\n", " * nextRound -> 3120 item(s)\n", " * pickAPPLE -> 3120 item(s)\n", " * choosePLUM -> 2500 item(s)\n", " * moveRaven -> 3120 item(s)\n", " * chooseAPPLE -> 2500 item(s)\n", " * pickPEAR -> 3120 item(s)\n", " * pickPLUM -> 3120 item(s)\n", "-------------------------------------------------------------- \n", "\n" ] } ], "source": [ "print(orchard_storm)" ] }, { "cell_type": "markdown", "id": "942ce128", "metadata": {}, "source": [ "#### Analysis\n", "Uncertainty in interval MDPs can be resolved in two ways.\n", "\n", "First, in the **cooperative** or angelic interpretation, the uncertainty is in favor of the policy. 
Therefore, if the policy maximizes the probability to win Orchard, the dice probability will also be chosen in order to maximize the probability to win.\n", "\n", "Second, in the **robust** or demonic interpretation, the uncertainty is resolved against the policy.\n", "If the policy maximizes the probability to win, the dice probability will be chosen\n", "in order to minimize the policy’s probability to win.\n", "\n", "We show how to check an interval MDP in the following and start with the cooperative setting, followed by the robust setting." ] }, { "cell_type": "code", "execution_count": 5, "id": "628cdfd6", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:40:57.261229Z", "iopub.status.busy": "2026-03-26T10:40:57.261063Z", "iopub.status.idle": "2026-03-26T10:40:57.339879Z", "shell.execute_reply": "2026-03-26T10:40:57.339319Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.7960798954476785\n" ] } ], "source": [ "# Parse properties\n", "properties = stormpy.parse_properties('Pmax=? 
[F \"PlayersWon\"]')\n", "task = stormpy.CheckTask(properties[0].raw_formula)\n", "# Set cooperative resolution mode, alternatively: ROBUST\n", "task.set_uncertainty_resolution_mode(stormpy.UncertaintyResolutionMode.COOPERATIVE)\n", "\n", "# Prepare model checking\n", "env = stormpy.Environment()\n", "# Set VI algorithm to plain VI\n", "env.solver_environment.minmax_solver_environment.method = (\n", "    stormpy.MinMaxMethod.value_iteration\n", ")\n", "# Check model\n", "stormpy_result = stormpy.check_interval_mdp(orchard_storm, task, env)\n", "print(stormpy_result.at(orchard_storm.initial_states[0]))" ] }, { "cell_type": "code", "execution_count": 6, "id": "6b1773a0", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:40:57.341693Z", "iopub.status.busy": "2026-03-26T10:40:57.341496Z", "iopub.status.idle": "2026-03-26T10:40:57.424161Z", "shell.execute_reply": "2026-03-26T10:40:57.423640Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.4314731874392106\n" ] } ], "source": [ "# Set robust resolution mode\n", "task.set_uncertainty_resolution_mode(stormpy.UncertaintyResolutionMode.ROBUST)\n", "# Check model\n", "stormpy_result = stormpy.check_interval_mdp(orchard_storm, task, env)\n", "print(stormpy_result.at(orchard_storm.initial_states[0]))" ] }, { "cell_type": "markdown", "id": "e1148473", "metadata": {}, "source": [ "In the cooperative setting, the player can achieve a maximal winning probability of $0.7961$, while in\n", "the robust setting, the player can only achieve a maximal winning probability of\n", "$0.4315$.\n", "It turns out that in Orchard, modifying the dice probabilities by $\\frac{1}{36}$ has a much larger influence on the winning probability than the player’s strategy."
] }, { "cell_type": "markdown", "id": "7cc554ac", "metadata": {}, "source": [ "### Parametric MDP\n", "The interval MDP for Orchard bounds the dice probabilities within an interval at each state.\n", "Different values may be optimal at different states, and thus, the dice probabilities may differ per state. If we want to have uncertain but fixed dice probabilities, we use a parametric MDP (pMDP).\n", "\n", "We take the simplified two-fruit variant `orchard_simple` and parametrize it as\n", "follows:\n", "the parameter `p` describes the probability to roll 🍏 or 🍐, the parameter `q` describes the probability to roll 🧺, and the probability to roll 🐦‍⬛ is described by `1−2p−q`.\n", "Similar to the iMDP, we can again replace the transition probabilities in the `delta` function with these functions over the parameters.\n" ] }, { "cell_type": "code", "execution_count": 7, "id": "4d2670a3", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:40:57.426014Z", "iopub.status.busy": "2026-03-26T10:40:57.425840Z", "iopub.status.idle": "2026-03-26T10:40:57.429928Z", "shell.execute_reply": "2026-03-26T10:40:57.429303Z" } }, "outputs": [], "source": [ "# Create the correct polynomials (a parser is planned)\n", "# The constant polynomial 1\n", "one = stormvogel.parametric.Polynomial([])\n", "one.add_term((0,), 1)\n", "\n", "# Two parameters: p0 corresponds to p, p1 corresponds to q\n", "params = [f\"p{i}\" for i in range(2)]\n", "\n", "# The polynomials p0 and p1 themselves\n", "parameters = [stormvogel.parametric.Polynomial([p]) for p in params]\n", "for i in range(2):\n", "    parameters[i].add_term((1,), 1)\n", "\n", "# The polynomial 1 - 2*p0 - p1 for the raven outcome\n", "parameters.append(stormvogel.parametric.Polynomial(params))\n", "parameters[-1].add_term((0, 1), -1)\n", "parameters[-1].add_term((1, 0), -2)\n", "parameters[-1].add_term((0, 0), 1)" ] }, { "cell_type": "code", "execution_count": 8, "id": "99641dc4", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:40:57.431637Z", "iopub.status.busy": "2026-03-26T10:40:57.431450Z", "iopub.status.idle": "2026-03-26T10:40:57.437015Z", "shell.execute_reply": "2026-03-26T10:40:57.436543Z" } }, 
"outputs": [], "source": [ "# Change delta function\n", "def delta_pmc(state, action):\n", " if state.game_state() != GameState.NOT_ENDED:\n", " # Game has ended -> self loop\n", " return [(1, state)]\n", "\n", " if state.dice is None:\n", " # Player throws dice and considers outcomes\n", " outcomes = []\n", " # Total outcomes is number of fruits + 1 for basket + 1 for raven\n", " total_outcomes = len(state.trees.keys()) + 2\n", " assert total_outcomes == 4\n", "\n", " i = 0\n", " # 1. Dice shows fruit\n", " for fruit in state.trees.keys():\n", " next_state = deepcopy(state)\n", " next_state.dice = DiceOutcome.FRUIT, fruit\n", " # NEW: parametric probabilities\n", " outcomes.append((parameters[i], next_state))\n", " i += 1\n", "\n", " # 2. Dice shows basket\n", " next_state = deepcopy(state)\n", " next_state.dice = DiceOutcome.BASKET, 0\n", " # NEW: parametric probabilities\n", " outcomes.append((parameters[i], next_state))\n", "\n", " # 3. Dice shows raven\n", " next_state = deepcopy(state)\n", " next_state.dice = DiceOutcome.RAVEN, None\n", " # NEW: parametric probabilities\n", " outcomes.append((parameters[i + 1], next_state))\n", " assert len(outcomes) == total_outcomes\n", " return outcomes\n", "\n", " elif state.dice[0] == DiceOutcome.FRUIT:\n", " # Player picks specified fruit\n", " fruit = state.dice[1]\n", " next_state = deepcopy(state)\n", " next_state.pick_fruit(fruit)\n", " next_state.next_round()\n", " return [(one, next_state)]\n", "\n", " elif state.dice[0] == DiceOutcome.BASKET:\n", " assert action.startswith(\"choose\")\n", " # Player chooses fruit specified by action\n", " fruit = Fruit[action.removeprefix(\"choose\")]\n", " next_state = deepcopy(state)\n", " next_state.pick_fruit(fruit)\n", " next_state.next_round()\n", " return [(one, next_state)]\n", "\n", " elif state.dice[0] == DiceOutcome.RAVEN:\n", " next_state = deepcopy(state)\n", " next_state.move_raven()\n", " next_state.next_round()\n", " return [(one, next_state)]\n", "\n", " assert 
False" ] }, { "cell_type": "markdown", "id": "6eb42532", "metadata": {}, "source": [ "We can again build the simple Orchard model which is now parametric." ] }, { "cell_type": "code", "execution_count": 9, "id": "e77a1def", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:40:57.438657Z", "iopub.status.busy": "2026-03-26T10:40:57.438474Z", "iopub.status.idle": "2026-03-26T10:40:57.454619Z", "shell.execute_reply": "2026-03-26T10:40:57.453938Z" } }, "outputs": [], "source": [ "init_small = Orchard([Fruit.APPLE, Fruit.CHERRY], num_fruits=2, raven_distance=2)\n", "orchard_pmc = stormvogel.bird.build_bird(\n", " modeltype=stormvogel.ModelType.MDP,\n", " init=init_small,\n", " available_actions=available_actions,\n", " delta=delta_pmc,\n", " labels=labels_full,\n", " max_size=100000,\n", ")\n", "\n", "# Convert to stormpy model\n", "orchard_stormpy_pmdp = stormvogel.mapping.stormvogel_to_stormpy(orchard_pmc)" ] }, { "cell_type": "markdown", "id": "a9b06731", "metadata": {}, "source": [ "After building the model, we apply the optimal winning policy from the non-parameterized game on the pMDP and obtain the induced parametric Markov chain (pMC)." ] }, { "cell_type": "code", "execution_count": 10, "id": "9d7eddc1", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:40:57.456417Z", "iopub.status.busy": "2026-03-26T10:40:57.456197Z", "iopub.status.idle": "2026-03-26T10:40:57.468547Z", "shell.execute_reply": "2026-03-26T10:40:57.467928Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/__w/stormvogel/stormvogel/stormvogel/bird.py:126: UserWarning: State-action rewards are not supported in this version of stormvogel. 
Will assign None to the second parameter of your reward function.\n", "    warnings.warn(\n" ] } ], "source": [ "# Get scheduler from MDP\n", "orchard_mdp_stormpy = stormvogel.mapping.stormvogel_to_stormpy(build_simple())\n", "result = stormpy.model_checking(\n", "    orchard_mdp_stormpy, properties[0].raw_formula, extract_scheduler=True\n", ")\n", "\n", "# Apply scheduler on pMDP\n", "scheduler = result.scheduler.cast_to_parametric_datatype()\n", "orchard_stormpy_pmc = orchard_stormpy_pmdp.apply_scheduler(scheduler)" ] }, { "cell_type": "markdown", "id": "1b36f050", "metadata": {}, "source": [ "Finally, we compute the winning probability on the pMC. This yields a rational function over parameters `p` and `q`." ] }, { "cell_type": "code", "execution_count": 11, "id": "56c850ca", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:40:57.470324Z", "iopub.status.busy": "2026-03-26T10:40:57.470138Z", "iopub.status.idle": "2026-03-26T10:40:57.482876Z", "shell.execute_reply": "2026-03-26T10:40:57.482414Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(3*_r_1^2*_r_2^5+(-7)*_r_1*_r_2^5+24*_r_1^3*_r_2^4+45*_r_1*_r_2^4+24*_r_1^7+333*_r_1^3*_r_2^2+283*_r_1^4*_r_2+(-85)*_r_1^3*_r_2+90*_r_1^5+(-30)*_r_1^4+(-85)*_r_1^2*_r_2^2+181*_r_1^2*_r_2^3+84*_r_1^6*_r_2+114*_r_1^5*_r_2^2+(-211)*_r_1^3*_r_2^3+(-344)*_r_1^4*_r_2^2+(-268)*_r_1^5*_r_2+(-80)*_r_1^6+75*_r_1^4*_r_2^3+(-35)*_r_1*_r_2^3+(-62)*_r_1^2*_r_2^4+4*_r_2^5+(-5)*_r_2^4)/((_r_1+(-1))^3)\n" ] } ], "source": [ "# Perform model checking\n", "properties2 = stormpy.parse_properties('P=? [F \"PlayersWon\"]')\n", "stormpy_result = stormpy.model_checking(orchard_stormpy_pmc, properties2[0].raw_formula)\n", "solution_function = stormpy_result.at(0)\n", "print(solution_function)" ] }, { "cell_type": "markdown", "id": "eb7f7605", "metadata": {}, "source": [ "This rational function corresponds to the one in Fig. 5 in the tutorial paper. Parameter `r_2` is `p` and parameter `r_1` is `q`."
] }, { "cell_type": "markdown", "id": "532b7a21", "metadata": {}, "source": [ "## Partially Observable MDPs\n", "In MDP verification, we consider maximizing over all policies. In particular, the policy has\n", "perfect information about the current state, which influences the choice. In many\n", "(cyberphysical or distributed) systems, the policy should only depend on states\n", "that are known by the agent.\n", "\n", "Similarly, in the Orchard game, the state of the\n", "trees may not be visible to the players; however, we do assume that the players know\n", "which types of fruit can still be picked. To model this setting faithfully, the policy\n", "should not depend on the state, but instead on a set of observations, leading to\n", "a **partially observable MDP (POMDP)**." ] }, { "cell_type": "markdown", "id": "1eeb5cdd", "metadata": {}, "source": [ "We adapted the Prism specification of the Orchard game and introduced observations.\n", "The resulting specification can be found in `orchard/orchard_pomdp.pm`.\n", "\n", "In the following, we load the POMDP."
] }, { "cell_type": "code", "execution_count": 12, "id": "8a74b412", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:40:57.484734Z", "iopub.status.busy": "2026-03-26T10:40:57.484544Z", "iopub.status.idle": "2026-03-26T10:40:57.605049Z", "shell.execute_reply": "2026-03-26T10:40:57.604485Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-------------------------------------------------------------- \n", "Model type: \tPOMDP (sparse)\n", "States: \t22469\n", "Transitions: \t44954\n", "Choices: \t29354\n", "Observations: \t546\n", "Reward Models: none\n", "State Labels: \t4 labels\n", " * deadlock -> 0 item(s)\n", " * init -> 1 item(s)\n", " * RavenWon -> 624 item(s)\n", " * PlayersWon -> 5 item(s)\n", "Choice Labels: \t11 labels\n", " * nextRound -> 3749 item(s)\n", " * pickAPPLE -> 3120 item(s)\n", " * pickPEAR -> 3120 item(s)\n", " * chooseAPPLE -> 2500 item(s)\n", " * moveRaven -> 3120 item(s)\n", " * pickCHERRY -> 3120 item(s)\n", " * pickPLUM -> 3120 item(s)\n", " * choosePEAR -> 2500 item(s)\n", " * chooseCHERRY -> 2500 item(s)\n", " * choosePLUM -> 2500 item(s)\n", " * noChoice -> 5 item(s)\n", "-------------------------------------------------------------- \n", "\n" ] } ], "source": [ "import stormpy\n", "import stormpy.pomdp\n", "\n", "prism_program = stormpy.parse_prism_program(\"orchard/orchard_pomdp.pm\")\n", "formula_str = 'Pmax=? 
[!\"RavenWon\" U \"PlayersWon\"]'\n", "properties = stormpy.parse_properties_for_prism_program(formula_str, prism_program)\n", "prism_program, properties = stormpy.preprocess_symbolic_input(\n", "    prism_program, properties, \"\"\n", ")\n", "prism_program = prism_program.as_prism_program()\n", "options = stormpy.BuilderOptions([p.raw_formula for p in properties])\n", "options.set_build_state_valuations()\n", "options.set_build_choice_labels()\n", "pomdp = stormpy.build_model(prism_program, properties)\n", "pomdp = stormpy.pomdp.make_canonic(pomdp)\n", "print(pomdp)" ] }, { "cell_type": "markdown", "id": "881c0f26", "metadata": {}, "source": [ "We see that the POMDP now contains information about the number of\n", "different observations ($546$ in this case), as well as the action names for the\n", "choices in the model (as they matter semantically).\n", "The semantics of a POMDP can be given in terms of a **belief MDP**, which we can partially explore and verify."
] }, { "cell_type": "code", "execution_count": 13, "id": "fdaf0ac3", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:40:57.606827Z", "iopub.status.busy": "2026-03-26T10:40:57.606650Z", "iopub.status.idle": "2026-03-26T10:40:59.476529Z", "shell.execute_reply": "2026-03-26T10:40:59.475853Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Result in: [0.631357281978964, 0.6313566764945008]\n", "WARN (BeliefExplorationPomdpModelChecker.cpp:40): Upper bound '0.631357' is smaller than lower bound '0.631357': Difference is -6.05484e-07.\n" ] } ], "source": [ "belexpl_options = stormpy.pomdp.BeliefExplorationModelCheckerOptionsDouble(True, True)\n", "belexpl_options.use_clipping = False\n", "belexpl_options.refine = True\n", "\n", "belmc = stormpy.pomdp.BeliefExplorationModelCheckerDouble(pomdp, belexpl_options)\n", "result = belmc.check(properties[0].raw_formula, [])\n", "print(f\"Result in: [{result.lower_bound}, {result.upper_bound}]\")" ] }, { "cell_type": "markdown", "id": "4849f106", "metadata": {}, "source": [ "First, note that the result approximates the true optimum from below and above.\n", "The result here is tight, as the underlying belief MDP is sufficiently simple. The remaining imprecision stems from the overall precision threshold, which is set to $10^{-6}$ by default.\n", "\n", "Second, note that the result coincides with the fully observable MDP case! That\n", "is correct, as a policy can track how much fruit of every type is still available." ] }, { "cell_type": "markdown", "id": "83df12f2", "metadata": {}, "source": [ "We can consider a modified Orchard game, where another player randomly steals fruit before the game starts.\n", "This is modeled in `orchard/orchard_pomdp_steal.pm`."
] }, { "cell_type": "code", "execution_count": 14, "id": "511e8ce4", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:40:59.478446Z", "iopub.status.busy": "2026-03-26T10:40:59.478242Z", "iopub.status.idle": "2026-03-26T10:40:59.591841Z", "shell.execute_reply": "2026-03-26T10:40:59.591240Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-------------------------------------------------------------- \n", "Model type: \tPOMDP (sparse)\n", "States: \t22434\n", "Transitions: \t44882\n", "Choices: \t29304\n", "Observations: \t547\n", "Reward Models: none\n", "State Labels: \t4 labels\n", " * deadlock -> 0 item(s)\n", " * init -> 1 item(s)\n", " * RavenWon -> 623 item(s)\n", " * PlayersWon -> 5 item(s)\n", "Choice Labels: \t12 labels\n", " * steal -> 1 item(s)\n", " * nextRound -> 3743 item(s)\n", " * pickAPPLE -> 3115 item(s)\n", " * pickPEAR -> 3115 item(s)\n", " * chooseAPPLE -> 2495 item(s)\n", " * moveRaven -> 3115 item(s)\n", " * pickCHERRY -> 3115 item(s)\n", " * pickPLUM -> 3115 item(s)\n", " * choosePEAR -> 2495 item(s)\n", " * chooseCHERRY -> 2495 item(s)\n", " * choosePLUM -> 2495 item(s)\n", " * noChoice -> 5 item(s)\n", "-------------------------------------------------------------- \n", "\n" ] } ], "source": [ "# Load model with stealing\n", "prism_program = stormpy.parse_prism_program(\"orchard/orchard_pomdp_steal.pm\")\n", "formula_str = 'Pmax=? 
[!\"RavenWon\" U \"PlayersWon\"]'\n", "properties = stormpy.parse_properties_for_prism_program(formula_str, prism_program)\n", "prism_program, properties = stormpy.preprocess_symbolic_input(\n", " prism_program, properties, \"\"\n", ")\n", "prism_program = prism_program.as_prism_program()\n", "options = stormpy.BuilderOptions([p.raw_formula for p in properties])\n", "options.set_build_state_valuations()\n", "options.set_build_choice_labels()\n", "pomdp_steal = stormpy.build_model(prism_program, properties)\n", "pomdp_steal = stormpy.pomdp.make_canonic(pomdp_steal)\n", "print(pomdp_steal)" ] }, { "cell_type": "markdown", "id": "f2b567ee", "metadata": {}, "source": [ "Afterwards, we can analyze the fully observable model and the partially observable model." ] }, { "cell_type": "code", "execution_count": 15, "id": "48927058", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:40:59.593807Z", "iopub.status.busy": "2026-03-26T10:40:59.593608Z", "iopub.status.idle": "2026-03-26T10:40:59.635911Z", "shell.execute_reply": "2026-03-26T10:40:59.635391Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.6657493559061428\n" ] } ], "source": [ "# Fully observable\n", "mdp_res = stormpy.model_checking(\n", " pomdp_steal, properties[0], force_fully_observable=True\n", ")\n", "print(mdp_res.at(pomdp_steal.initial_states[0]))" ] }, { "cell_type": "code", "execution_count": 16, "id": "836a5af3", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:40:59.637649Z", "iopub.status.busy": "2026-03-26T10:40:59.637476Z", "iopub.status.idle": "2026-03-26T10:41:03.335060Z", "shell.execute_reply": "2026-03-26T10:41:03.334572Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARN (BeliefExplorationPomdpModelChecker.cpp:40): Upper bound '0.659868' is smaller than lower bound '0.659869': Difference is -5.87945e-07.\n", "Result in: [0.6598687427896369, 0.6598681548441424]\n" ] } ], "source": [ "belexpl_options = 
stormpy.pomdp.BeliefExplorationModelCheckerOptionsDouble(True, True)\n", "belexpl_options.use_clipping = False\n", "belexpl_options.refine = True\n", "\n", "belmc = stormpy.pomdp.BeliefExplorationModelCheckerDouble(pomdp_steal, belexpl_options)\n", "result = belmc.check(properties[0].raw_formula, [])\n", "print(f\"Result in: [{result.lower_bound}, {result.upper_bound}]\")" ] }, { "cell_type": "markdown", "id": "d25c192b", "metadata": {}, "source": [ "We see that stealing fruit makes the game easier to win, but now an observation-based policy performs worse than a state-based policy." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.13" } }, "nbformat": 4, "nbformat_minor": 5 }