{ "cells": [ { "cell_type": "markdown", "id": "4ea42f96", "metadata": {}, "source": [ "# Step Three: Working with Policies\n", "The use of modern probabilistic model checking increasingly asks for diagnostic information, for example the policies that optimize the objectives that are being checked.\n", "In the previous *Step Two*, we already saw that Stormvogel can visualize the model checking result and also the chosen actions in each state. However, this approach is only feasible for smaller state spaces.\n", "In this notebook, we consider a more general approach of extracting policies, the analysis of such policies with respect to alternative objectives, and the extraction of policies that can be concisely represented." ] }, { "cell_type": "markdown", "id": "660538c0", "metadata": {}, "source": [ "## Preparation\n", "As in the previous notebook, we start by loading the model.\n", "In this notebook, we will only use the Prism specification `orchard_prism` of the full Orchard game.\n", "While variable `orchard_prism` stores the MDP model, the variable `prism_program` stores the Prism specification of the Orchard game." 
] }, { "cell_type": "code", "execution_count": 1, "id": "0b3f0710", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:39:52.586689Z", "iopub.status.busy": "2026-03-26T10:39:52.586523Z", "iopub.status.idle": "2026-03-26T10:39:52.869892Z", "shell.execute_reply": "2026-03-26T10:39:52.869282Z" } }, "outputs": [], "source": [ "# Import prism model from the previous step\n", "import stormvogel\n", "import stormpy\n", "\n", "from orchard.orchard_builder import build_prism" ] }, { "cell_type": "code", "execution_count": 2, "id": "5625ef66", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:39:52.872138Z", "iopub.status.busy": "2026-03-26T10:39:52.871842Z", "iopub.status.idle": "2026-03-26T10:39:52.989216Z", "shell.execute_reply": "2026-03-26T10:39:52.988587Z" }, "lines_to_next_cell": 2 }, "outputs": [], "source": [ "# We use the prism encoding in this part of the tutorial.\n", "orchard_prism, prism_program = build_prism()\n", "# We are primarily interested in the probability of winning.\n", "formula = stormpy.parse_properties('Pmax=? [F \"PlayersWon\"]')[0]" ] }, { "cell_type": "markdown", "id": "d3478cbf", "metadata": { "lines_to_next_cell": 2 }, "source": [ "Similarly to before, we use the helper function `model_check`." 
] }, { "cell_type": "code", "execution_count": 3, "id": "97775e96", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:39:52.991249Z", "iopub.status.busy": "2026-03-26T10:39:52.991029Z", "iopub.status.idle": "2026-03-26T10:39:52.994500Z", "shell.execute_reply": "2026-03-26T10:39:52.993873Z" } }, "outputs": [], "source": [ "# Helper function\n", "def model_check(model, prop):\n", " formula = stormpy.parse_properties(prop)[0]\n", " result = stormpy.model_checking(model, formula, only_initial_states=True)\n", " return result.at(model.initial_states[0])" ] }, { "cell_type": "markdown", "id": "a0943a59", "metadata": {}, "source": [ "## Computing Policies\n", "We are interested in the optimal of winning the Orchard game.\n", "First, we inform Storm that we want to compute an optimal policy as a by-product of the model checking call, by adding the argument `extract_schedulers=True`.\n", "Computing policies can induce overhead due to the additional bookkeeping: we also note that this bookkeeping is not implemented for all types of properties." ] }, { "cell_type": "code", "execution_count": 4, "id": "344f6811", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:39:52.996346Z", "iopub.status.busy": "2026-03-26T10:39:52.996159Z", "iopub.status.idle": "2026-03-26T10:39:53.045852Z", "shell.execute_reply": "2026-03-26T10:39:53.045132Z" } }, "outputs": [], "source": [ "result = stormpy.model_checking(orchard_prism, formula, extract_scheduler=True)" ] }, { "cell_type": "markdown", "id": "5eb18a59", "metadata": {}, "source": [ "The output of the model checking call is a quantitative `result`, which now also holds a policy (also called scheduler)." 
] }, { "cell_type": "code", "execution_count": 5, "id": "7c4e5bbd", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:39:53.047975Z", "iopub.status.busy": "2026-03-26T10:39:53.047747Z", "iopub.status.idle": "2026-03-26T10:39:53.053883Z", "shell.execute_reply": "2026-03-26T10:39:53.053213Z" } }, "outputs": [], "source": [ "string_scheduler = str(result.scheduler)" ] }, { "cell_type": "markdown", "id": "f2f109e0", "metadata": {}, "source": [ "Policies for maximal reachability probabilities as in the `formula` are memoryless and\n", "deterministic: a mapping from states to actions.\n", "The Storm representation mildly deviates: it represents for every state $s$ (index) the local index $x$ of the choice. These local indices encode the $x$’th choice for state $s$, using some internal ordering.\n", "Note that two choices in different states can yield the same action.\n", "\n", "Note that the resulting policy representation has more than $22k$ lines; one line for each of the over $22k$ states of the model.\n", "We therefore only print the first 20 lines as given in `max_lines`." 
] }, { "cell_type": "code", "execution_count": 6, "id": "70dc0c51", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:39:53.055976Z", "iopub.status.busy": "2026-03-26T10:39:53.055756Z", "iopub.status.idle": "2026-03-26T10:39:53.061014Z", "shell.execute_reply": "2026-03-26T10:39:53.060425Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "___________________________________________________________________\n", "Fully defined memoryless deterministic scheduler:\n", "model state: choice(s)\n", " 0 0\n", " 1 0\n", " 2 0\n", " 3 0\n", " 4 0\n", " 5 0\n", " 6 0\n", " 7 0\n", " 8 0\n", " 9 0\n", " 10 0\n", " 11 0\n", " 12 0\n", " 13 0\n", " 14 0\n", " 15 0\n", " 16 2\n" ] } ], "source": [ "import os\n", "\n", "max_lines = 20\n", "print(os.linesep.join(string_scheduler.split(os.linesep)[:max_lines]))" ] }, { "cell_type": "markdown", "id": "a1366fce", "metadata": {}, "source": [ "The policy can also be queried to understand for every state which action is selected.\n", "We use the high-level state information present in `orchard_prism`, and obtain a more informative representation of the policy.\n", "\n", "Again, we stop the output after `max_lines`." 
] }, { "cell_type": "code", "execution_count": 7, "id": "b54a7a0f", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:39:53.063197Z", "iopub.status.busy": "2026-03-26T10:39:53.062993Z", "iopub.status.idle": "2026-03-26T10:39:53.085771Z", "shell.execute_reply": "2026-03-26T10:39:53.085018Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "In state [s=0\t& d=1\t& apple=4\t& pear=4\t& cherry=4\t& plum=4\t& raven=5] choose action {'nextRound'}\n", "In state [s=1\t& d=1\t& apple=4\t& pear=4\t& cherry=4\t& plum=4\t& raven=5] choose action {'pickAPPLE'}\n", "In state [s=1\t& d=2\t& apple=4\t& pear=4\t& cherry=4\t& plum=4\t& raven=5] choose action {'pickPEAR'}\n", "In state [s=1\t& d=3\t& apple=4\t& pear=4\t& cherry=4\t& plum=4\t& raven=5] choose action {'pickCHERRY'}\n", "In state [s=1\t& d=4\t& apple=4\t& pear=4\t& cherry=4\t& plum=4\t& raven=5] choose action {'pickPLUM'}\n", "In state [s=1\t& d=5\t& apple=4\t& pear=4\t& cherry=4\t& plum=4\t& raven=5] choose action {'chooseAPPLE'}\n", "In state [s=1\t& d=6\t& apple=4\t& pear=4\t& cherry=4\t& plum=4\t& raven=5] choose action {'moveRaven'}\n", "In state [s=0\t& d=0\t& apple=3\t& pear=4\t& cherry=4\t& plum=4\t& raven=5] choose action {'nextRound'}\n", "In state [s=0\t& d=0\t& apple=4\t& pear=3\t& cherry=4\t& plum=4\t& raven=5] choose action {'nextRound'}\n", "In state [s=0\t& d=0\t& apple=4\t& pear=4\t& cherry=3\t& plum=4\t& raven=5] choose action {'nextRound'}\n", "In state [s=0\t& d=0\t& apple=4\t& pear=4\t& cherry=4\t& plum=3\t& raven=5] choose action {'nextRound'}\n", "In state [s=0\t& d=0\t& apple=4\t& pear=4\t& cherry=4\t& plum=4\t& raven=4] choose action {'nextRound'}\n", "In state [s=1\t& d=1\t& apple=3\t& pear=4\t& cherry=4\t& plum=4\t& raven=5] choose action {'pickAPPLE'}\n", "In state [s=1\t& d=2\t& apple=3\t& pear=4\t& cherry=4\t& plum=4\t& raven=5] choose action {'pickPEAR'}\n", "In state [s=1\t& d=3\t& apple=3\t& pear=4\t& cherry=4\t& plum=4\t& raven=5] choose action 
{'pickCHERRY'}\n", "In state [s=1\t& d=4\t& apple=3\t& pear=4\t& cherry=4\t& plum=4\t& raven=5] choose action {'pickPLUM'}\n", "In state [s=1\t& d=5\t& apple=3\t& pear=4\t& cherry=4\t& plum=4\t& raven=5] choose action {'chooseCHERRY'}\n", "In state [s=1\t& d=6\t& apple=3\t& pear=4\t& cherry=4\t& plum=4\t& raven=5] choose action {'moveRaven'}\n", "In state [s=1\t& d=1\t& apple=4\t& pear=3\t& cherry=4\t& plum=4\t& raven=5] choose action {'pickAPPLE'}\n", "In state [s=1\t& d=2\t& apple=4\t& pear=3\t& cherry=4\t& plum=4\t& raven=5] choose action {'pickPEAR'}\n" ] } ], "source": [ "i = 0\n", "for state in orchard_prism.states:\n", " choice = result.scheduler.get_choice(state)\n", " action_index = choice.get_deterministic_choice()\n", " action = state.actions[action_index]\n", " print(f\"In state {state.valuations} choose action {action.labels}\")\n", " i += 1\n", " if i >= max_lines:\n", " break" ] }, { "cell_type": "markdown", "id": "80b7ce1c", "metadata": {}, "source": [ "We are mostly interested in the `chooseF` actions, as these reflect the player's decisions; in all other states, only one action is available anyway.\n", "We therefore restrict the output to these actions in the following."
] }, { "cell_type": "code", "execution_count": 8, "id": "36daa845", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:39:53.087887Z", "iopub.status.busy": "2026-03-26T10:39:53.087673Z", "iopub.status.idle": "2026-03-26T10:39:53.212473Z", "shell.execute_reply": "2026-03-26T10:39:53.211785Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "In state [s=1\t& d=5\t& apple=4\t& pear=4\t& cherry=4\t& plum=4\t& raven=5] choose action chooseAPPLE\n", "In state [s=1\t& d=5\t& apple=3\t& pear=4\t& cherry=4\t& plum=4\t& raven=5] choose action chooseCHERRY\n", "In state [s=1\t& d=5\t& apple=4\t& pear=3\t& cherry=4\t& plum=4\t& raven=5] choose action chooseCHERRY\n", "In state [s=1\t& d=5\t& apple=4\t& pear=4\t& cherry=3\t& plum=4\t& raven=5] choose action chooseAPPLE\n", "In state [s=1\t& d=5\t& apple=4\t& pear=4\t& cherry=4\t& plum=3\t& raven=5] choose action chooseAPPLE\n", "In state [s=1\t& d=5\t& apple=4\t& pear=4\t& cherry=4\t& plum=4\t& raven=4] choose action chooseAPPLE\n", "In state [s=1\t& d=5\t& apple=2\t& pear=4\t& cherry=4\t& plum=4\t& raven=5] choose action chooseCHERRY\n", "In state [s=1\t& d=5\t& apple=3\t& pear=3\t& cherry=4\t& plum=4\t& raven=5] choose action chooseCHERRY\n", "In state [s=1\t& d=5\t& apple=3\t& pear=4\t& cherry=3\t& plum=4\t& raven=5] choose action choosePEAR\n", "In state [s=1\t& d=5\t& apple=3\t& pear=4\t& cherry=4\t& plum=3\t& raven=5] choose action choosePEAR\n", "In state [s=1\t& d=5\t& apple=3\t& pear=4\t& cherry=4\t& plum=4\t& raven=4] choose action choosePEAR\n", "In state [s=1\t& d=5\t& apple=4\t& pear=2\t& cherry=4\t& plum=4\t& raven=5] choose action chooseCHERRY\n", "In state [s=1\t& d=5\t& apple=4\t& pear=3\t& cherry=3\t& plum=4\t& raven=5] choose action chooseAPPLE\n", "In state [s=1\t& d=5\t& apple=4\t& pear=3\t& cherry=4\t& plum=3\t& raven=5] choose action chooseAPPLE\n", "In state [s=1\t& d=5\t& apple=4\t& pear=3\t& cherry=4\t& plum=4\t& raven=4] choose action chooseAPPLE\n", "In state 
[s=1\t& d=5\t& apple=4\t& pear=4\t& cherry=2\t& plum=4\t& raven=5] choose action chooseAPPLE\n", "In state [s=1\t& d=5\t& apple=4\t& pear=4\t& cherry=3\t& plum=3\t& raven=5] choose action chooseAPPLE\n", "In state [s=1\t& d=5\t& apple=4\t& pear=4\t& cherry=3\t& plum=4\t& raven=4] choose action chooseAPPLE\n", "In state [s=1\t& d=5\t& apple=4\t& pear=4\t& cherry=4\t& plum=2\t& raven=5] choose action chooseAPPLE\n", "In state [s=1\t& d=5\t& apple=4\t& pear=4\t& cherry=4\t& plum=3\t& raven=4] choose action chooseAPPLE\n" ] } ], "source": [ "i = 0\n", "for state in orchard_prism.states:\n", " choice = result.scheduler.get_choice(state)\n", " action_index = choice.get_deterministic_choice()\n", " action = state.actions[action_index]\n", " action_name = next(iter(action.labels))\n", " if action_name.startswith(\"choose\"):\n", " print(f\"In state {state.valuations} choose action {action_name}\")\n", " i += 1\n", " if i >= max_lines:\n", " break" ] }, { "cell_type": "markdown", "id": "c7e1d5e5", "metadata": {}, "source": [ "We see that initially, when all trees are full (`apple=4`, `pear=4`, etc.), the player chooses 🍏 for the dice outcome 🧺.\n", "If for one apple was already picked (`apple=3`), the optimal strategy chooses 🍐 instead.\n", "\n", "Overall, we can see that the optimal policy chooses the type of fruit for which the most fruits are still available in the trees." 
] }, { "cell_type": "markdown", "id": "68dfcb0b", "metadata": {}, "source": [ "## Analyzing the Induced Submodel\n", "After obtaining a policy, a standard question is to analyze the MDP and policy with respect to additional properties.\n", "\n", "For example, we may want to calculate the probability to collect all πŸ’ before the raven arrives using the current policy and contrast this probability with a policy that optimizes collecting πŸ’.\n", "To support analyzing policies, we can create the induced Markov chain $\\mathcal{M}^\\sigma$ by applying policy $\\sigma$ on MDP $\\mathcal{M}$.\n", "\n", "In our example, we apply the policy we previously obtained.\n", "Afterwards, we compute the maximal probability of picking all cherries: once on the induced model (which was optimized for winning) and once on the global model." ] }, { "cell_type": "code", "execution_count": 9, "id": "ebeb1f90", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:39:53.214601Z", "iopub.status.busy": "2026-03-26T10:39:53.214358Z", "iopub.status.idle": "2026-03-26T10:39:53.286143Z", "shell.execute_reply": "2026-03-26T10:39:53.285494Z" }, "lines_to_next_cell": 2 }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Prob for fixed: 0.7762457619444039\n", "Prob for optimal: 0.9120560407842779\n" ] } ], "source": [ "# Get induced model\n", "induced = orchard_prism.apply_scheduler(result.scheduler, True)\n", "# Analysis\n", "all_cherries = 'Pmax=? [F \"AllCherriesPicked\"]'\n", "print(f\"Prob for fixed: {model_check(induced, all_cherries)}\")\n", "print(f\"Prob for optimal: {model_check(orchard_prism, all_cherries)}\")" ] }, { "cell_type": "markdown", "id": "df34728b", "metadata": {}, "source": [ "We see that the policy optimized for collecting πŸ’ yields a higher probability of picking all πŸ’ than the submodel induced by the overall winning strategy." 
] }, { "cell_type": "markdown", "id": "e1309a13", "metadata": {}, "source": [ "## Compact Policies\n", "The policy in the previous part was represented as an explicit list.\n", "There are various ways to compress a given policy: Via decision tree learning (available via the [dtControl tool](https://dtcontrol.model.in.tum.de/) one can create heuristically small trees for a policy, whereas\n", "dtMap (via the [PAYNT tool](https://github.com/randriu/synthesis)) allows to find a minimal decision tree for\n", "a given policy. Both tools allow interfacing with Storm.\n", "In the following, we showcase the use of dtMap." ] }, { "cell_type": "markdown", "id": "b1045918", "metadata": { "lines_to_next_cell": 2 }, "source": [ "Before we can create a compact policy, let us note that the policy above is not compactly representatable as a decision tree over linear predicates using the variables in the prism program (including the amount of fruit left). In this tutorial, we manually set the predicates we want to use.\n", "Here, we use `most_F` to indicate that fruit type `F` has the most fruits still left on the trees." 
] }, { "cell_type": "code", "execution_count": 10, "id": "dfb0f667", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:39:53.288550Z", "iopub.status.busy": "2026-03-26T10:39:53.288324Z", "iopub.status.idle": "2026-03-26T10:39:53.295524Z", "shell.execute_reply": "2026-03-26T10:39:53.294901Z" } }, "outputs": [], "source": [ "def _declare_extra_orchard_variables(manager):\n", " apple_variable: stormpy.Variable = manager.get_variable(\"apple\")\n", " plum_variable: stormpy.Variable = manager.get_variable(\"plum\")\n", " pear_variable: stormpy.Variable = manager.get_variable(\"pear\")\n", " cherry_variable: stormpy.Variable = manager.get_variable(\"cherry\")\n", " # The else cases below are only relevant for rerunning this function multiple times.\n", " if not manager.has_variable(\"most_apples\"):\n", " most_apples_variable = manager.create_boolean_variable(\"most_apples\")\n", " else:\n", " most_apples_variable = manager.get_variable(\"most_apples\")\n", " if not manager.has_variable(\"most_pears\"):\n", " most_pears_variable = manager.create_boolean_variable(\"most_pears\")\n", " else:\n", " most_pears_variable = manager.get_variable(\"most_pears\")\n", " if not manager.has_variable(\"most_plums\"):\n", " most_plums_variable = manager.create_boolean_variable(\"most_plums\")\n", " else:\n", " most_plums_variable = manager.get_variable(\"most_plums\")\n", " if not manager.has_variable(\"most_cherries\"):\n", " most_cherries_variable = manager.create_boolean_variable(\"most_cherries\")\n", " else:\n", " most_cherries_variable = manager.get_variable(\"most_cherries\")\n", "\n", " apple_variable_expression = apple_variable.get_expression()\n", " plum_variable_expression = plum_variable.get_expression()\n", " pear_variable_expression = pear_variable.get_expression()\n", " cherry_variable_expression = cherry_variable.get_expression()\n", " # Add the extra definitions\n", " extra_definitions = []\n", " extra_definitions.append(\n", " (\n", " 
most_apples_variable,\n", " stormpy.Expression.Conjunction(\n", " [\n", " stormpy.Expression.Geq(\n", " apple_variable_expression, plum_variable_expression\n", " ),\n", " stormpy.Expression.Geq(\n", " apple_variable_expression, cherry_variable_expression\n", " ),\n", " stormpy.Expression.Geq(\n", " apple_variable_expression, pear_variable_expression\n", " ),\n", " ]\n", " ),\n", " )\n", " )\n", " extra_definitions.append(\n", " (\n", " most_pears_variable,\n", " stormpy.Expression.Conjunction(\n", " [\n", " stormpy.Expression.Geq(\n", " pear_variable_expression, plum_variable_expression\n", " ),\n", " stormpy.Expression.Geq(\n", " pear_variable_expression, cherry_variable_expression\n", " ),\n", " stormpy.Expression.Geq(\n", " pear_variable_expression, apple_variable_expression\n", " ),\n", " ]\n", " ),\n", " )\n", " )\n", " extra_definitions.append(\n", " (\n", " most_plums_variable,\n", " stormpy.Expression.Conjunction(\n", " [\n", " stormpy.Expression.Geq(\n", " plum_variable_expression, pear_variable_expression\n", " ),\n", " stormpy.Expression.Geq(\n", " plum_variable_expression, cherry_variable_expression\n", " ),\n", " stormpy.Expression.Geq(\n", " plum_variable_expression, apple_variable_expression\n", " ),\n", " ]\n", " ),\n", " )\n", " )\n", " extra_definitions.append(\n", " (\n", " most_cherries_variable,\n", " stormpy.Expression.Conjunction(\n", " [\n", " stormpy.Expression.Geq(\n", " cherry_variable_expression, pear_variable_expression\n", " ),\n", " stormpy.Expression.Geq(\n", " cherry_variable_expression, plum_variable_expression\n", " ),\n", " stormpy.Expression.Geq(\n", " cherry_variable_expression, apple_variable_expression\n", " ),\n", " ]\n", " ),\n", " )\n", " )\n", " return extra_definitions" ] }, { "cell_type": "markdown", "id": "e092a805", "metadata": {}, "source": [ "Specifically, we build new state valuations that assign variable assignments to every state. The following code achieves this. 
While this can be modified, we suggest to not change the settings here for the tutorial." ] }, { "cell_type": "code", "execution_count": 11, "id": "5c360c9a", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:39:53.297682Z", "iopub.status.busy": "2026-03-26T10:39:53.297497Z", "iopub.status.idle": "2026-03-26T10:39:53.335340Z", "shell.execute_reply": "2026-03-26T10:39:53.334659Z" } }, "outputs": [], "source": [ "# We allow a few options here.\n", "add_additional_definitions = True\n", "\"\"\"add_additional_definitions adds expressions saying that one fruit is among the types of fruit that we have most.\"\"\"\n", "maintain_old_valuations = False\n", "\"\"\"maintain_old_valuations preserves the old variables and their assignments\"\"\"\n", "copy_fruit_amounts = False\n", "\"\"\"copy_fruit_amounts preserves the amount of fruit of every type, useful if we are not maintaining the old valuations.\"\"\"\n", "assert not maintain_old_valuations or not copy_fruit_amounts\n", "\n", "# Create the transformer.\n", "svt = stormpy.StateValuationTransformer(orchard_prism.state_valuations)\n", "# Add extra Boolean definitions:\n", "if add_additional_definitions:\n", " extra_definitions = _declare_extra_orchard_variables(\n", " prism_program.expression_manager\n", " )\n", " for v, definition in extra_definitions:\n", " svt.add_boolean_expression(v, definition)\n", "# Add integer definitions. 
Here, we consider copies of the number of items, only useful we do not maintain the old variables.\n", "if copy_fruit_amounts:\n", " manager = prism_program.expression_manager\n", " svt.add_integer_expression(\n", " manager.create_integer_variable(\"nr_apples\"),\n", " manager.get_variable(\"apple\").get_expression(),\n", " )\n", " svt.add_integer_expression(\n", " manager.create_integer_variable(\"nr_plums\"),\n", " manager.get_variable(\"plum\").get_expression(),\n", " )\n", " svt.add_integer_expression(\n", " manager.create_integer_variable(\"nr_cherries\"),\n", " manager.get_variable(\"cherry\").get_expression(),\n", " )\n", " svt.add_integer_expression(\n", " manager.create_integer_variable(\"nr_pears\"),\n", " manager.get_variable(\"pear\").get_expression(),\n", " )\n", "# Create a new MDP.\n", "new_mdp = stormpy.set_state_valuations(\n", " orchard_prism, svt.build(maintain_old_valuations)\n", ")" ] }, { "cell_type": "markdown", "id": "69bf8105", "metadata": {}, "source": [ "We are now ready to run synthesis of decision tree policies. We use the tool PAYNT." ] }, { "cell_type": "code", "execution_count": 12, "id": "c3685397", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:39:53.337407Z", "iopub.status.busy": "2026-03-26T10:39:53.337161Z", "iopub.status.idle": "2026-03-26T10:39:53.478402Z", "shell.execute_reply": "2026-03-26T10:39:53.477800Z" } }, "outputs": [], "source": [ "import paynt" ] }, { "cell_type": "markdown", "id": "8d785ada", "metadata": {}, "source": [ "We distinguish two versions of decision tree synthesis:\n", " - matching/mapping a given policy into a concise format.\n", " - finding a decision tree policy that achieves the goal.\n", "\n", "We define the synthesis task with these two hyperparameters." 
] }, { "cell_type": "code", "execution_count": 13, "id": "d3da06ae", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:39:53.480690Z", "iopub.status.busy": "2026-03-26T10:39:53.480492Z", "iopub.status.idle": "2026-03-26T10:39:53.483341Z", "shell.execute_reply": "2026-03-26T10:39:53.482861Z" } }, "outputs": [], "source": [ "tree_depth = 3\n", "\"\"\"The maximal depth of the policy\"\"\"\n", "match_fixed_policy = False\n", "\"\"\"Whether we should match a given policy (an optimal one found by storm)\"\"\"\n", "pass" ] }, { "cell_type": "markdown", "id": "d9c19820", "metadata": {}, "source": [ "We are now ready to run the code. Note that this may take a while. For `tree_depth` 3 and the `match_fixed_policy` set to `False`, the code should run in under two minutes, assuming that we have added additional definitions, that we did not maintain the old valuations, and that we did not copy the fruit amounts.\n", "If you want to play with settings, we suggest to use a smaller version of the orchard model by setting the amount of fruit and the distance of the raven to smaller constants." 
] }, { "cell_type": "code", "execution_count": 14, "id": "80afa318", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:39:53.485115Z", "iopub.status.busy": "2026-03-26T10:39:53.484944Z", "iopub.status.idle": "2026-03-26T10:40:53.220174Z", "shell.execute_reply": "2026-03-26T10:40:53.219577Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overall optimal result: 0.6313566764945008\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "> progress 0.0%, elapsed 3 s, estimated 88122259 s (2 years), iters = {MDP: 16}, opt = 0.6039\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "> progress 0.0%, elapsed 6 s, estimated 6610716 s (76 days), iters = {MDP: 55}, opt = 0.62\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "> progress 0.0%, elapsed 9 s, estimated 5234702 s (60 days), iters = {MDP: 89}, opt = 0.62\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "> progress 0.0%, elapsed 12 s, estimated 3154236 s (36 days), iters = {MDP: 128}, opt = 0.62\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "> progress 0.0%, elapsed 15 s, estimated 3009790 s (34 days), iters = {MDP: 164}, opt = 0.62\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "> progress 0.0%, elapsed 18 s, estimated 3034164 s (35 days), iters = {MDP: 205}, opt = 0.62\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "> progress 0.001%, elapsed 21 s, estimated 1573456 s (18 days), iters = {MDP: 240}, opt = 0.62\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "> progress 0.002%, elapsed 24 s, estimated 1202768 s (13 days), iters = {MDP: 272}, opt = 0.62\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "> progress 0.002%, elapsed 27 s, estimated 1324602 s (15 days), iters = {MDP: 315}, opt = 0.62\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "> progress 0.004%, elapsed 30 s, estimated 720660 s (8 days), iters = {MDP: 346}, opt = 
0.62\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "> progress 0.006%, elapsed 33 s, estimated 552593 s (6 days), iters = {MDP: 385}, opt = 0.62\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "> progress 0.006%, elapsed 36 s, estimated 573693 s (6 days), iters = {MDP: 424}, opt = 0.62\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "> progress 0.006%, elapsed 40 s, estimated 616898 s (7 days), iters = {MDP: 463}, opt = 0.62\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "> progress 0.006%, elapsed 43 s, estimated 643782 s (7 days), iters = {MDP: 499}, opt = 0.62\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "> progress 0.008%, elapsed 46 s, estimated 567775 s (6 days), iters = {MDP: 534}, opt = 0.6314\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "> progress 0.138%, elapsed 49 s, estimated 35618 s (9 hours), iters = {MDP: 573}, opt = 0.6314\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "> progress 1.041%, elapsed 52 s, estimated 5030 s, iters = {MDP: 606}, opt = 0.6314\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "> progress 62.5%, elapsed 55 s, estimated 89 s, iters = {MDP: 635}, opt = 0.6314\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "> progress 91.145%, elapsed 58 s, estimated 64 s, iters = {MDP: 664}, opt = 0.6314\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "--------------------\n", "Synthesis summary:\n", "optimality objective: Pmax=? 
[F \"PlayersWon\"] \n", "\n", "method: AR (decision tree), synthesis time: 59.14 s\n", "number of holes: 43, family size: 1e12, quotient: 22469 states / 32394 actions\n", "explored: 100 %\n", "MDP stats: avg MDP size: 22469, iterations: 668\n", "\n", "optimum: 0.631357\n", "--------------------\n", "\n", "success: True\n" ] } ], "source": [ "# We run model checking once more, and optionally extract a scheduler.\n", "mc_result = stormpy.model_checking(\n", " new_mdp, formula, extract_scheduler=match_fixed_policy\n", ")\n", "print(f\"Overall optimal result: {mc_result.at(new_mdp.initial_states[0])}\")\n", "# Paynt considers synthesis via annotated MDPs called coloured MDPs.\n", "# For synthesis of decision trees, we use paynt.dt.\n", "colored_mdp_factory = paynt.dt.DtColoredMdpFactory(new_mdp)\n", "# We now set the task as specified in the hyperparameters above.\n", "task = paynt.dt.DtTask([formula], tree_depth)\n", "if match_fixed_policy:\n", " task.set_scheduler_to_map(mc_result.scheduler)\n", "# With the synthesis task specified, we run the synthesis routine.\n", "result = paynt.dt.synthesize(colored_mdp_factory, task)\n", "print(\"success:\", result.success)" ] }, { "cell_type": "code", "execution_count": 15, "id": "0d05abc8", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:40:53.221930Z", "iopub.status.busy": "2026-03-26T10:40:53.221738Z", "iopub.status.idle": "2026-03-26T10:40:53.225268Z", "shell.execute_reply": "2026-03-26T10:40:53.224657Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "value: 0.6313573066006369\n", "decision tree:\n", " if most_pears<=0:\n", " if most_apples<=0:\n", " if most_plums<=0:\n", " chooseCHERRY\n", " else:\n", " choosePLUM\n", " else:\n", " if most_plums<=0:\n", " chooseAPPLE\n", " else:\n", " choosePLUM\n", "else:\n", " if most_plums<=0:\n", " if most_plums<=0:\n", " choosePEAR\n", " else:\n", " __random__\n", " else:\n", " if most_cherries<=0:\n", " choosePLUM\n", " else:\n", " 
choosePLUM\n", "\n" ] } ], "source": [ "# If successful, we display the obtained result.\n", "if result.success:\n", " tree = result.tree\n", " print(\"value:\", result.value)\n", " print(\"decision tree:\\n\", result.tree.to_string())" ] }, { "cell_type": "markdown", "id": "b9892df0", "metadata": {}, "source": [ "The decision tree represents the optimal policy (which achieves the winning probability of $0.6313$.\n", "The decision tree is presented as simple Python code.\n", "Condition `most_plums<=0` represents that `most_plums` is `False` whereas the else case represents that `most_plums` is `True`.\n", "Following the first condition, we see that if neither `most_plums`, `most_pears` nor `most_apples` then we should choose πŸ’. If however, neither `most_plums` nor `most_pears` but `most_apples`, then we should choose 🍏." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.13" } }, "nbformat": 4, "nbformat_minor": 5 }