{ "cells": [ { "cell_type": "markdown", "id": "c475ec70", "metadata": {}, "source": [ "# Building POMDPs\n", "In Stormvogel, a **Partially Observable Markov Decision Process (POMDP)** consists of\n", "* states $S$, actions $A$, an initial state $s_0$, a mapping of *enabled actions*, a successor distribution $P(s,a)$, and a labelling function $L$, as for MDPs,\n", "* a set of observations $Z$,\n", "* and a deterministic state-observation function $O\\colon S \\rightarrow Z$.\n", "\n", "The key idea is that the observations encode what information an agent sees.\n", "An agent has to make its decisions based not on the current state, but on the history of observations it has seen.\n", "Note that usually when we refer to MDPs we actually mean *fully observable* MDPs, which are POMDPs with $Z = S$ and $O(s) = s$." ] }, { "cell_type": "markdown", "id": "8d17fe50", "metadata": {}, "source": [ "We introduce a simple example to illustrate the difference between MDPs and POMDPs.\n", "The idea is that a coin is flipped while the agent is not looking, and then the agent has to guess whether it landed heads or tails.\n", "We first construct an MDP." ] }, { "cell_type": "code", "execution_count": 1, "id": "89799fd0", "metadata": { "execution": { "iopub.execute_input": "2026-03-26T10:41:31.441429Z", "iopub.status.busy": "2026-03-26T10:41:31.441164Z", "iopub.status.idle": "2026-03-26T10:41:31.884640Z", "shell.execute_reply": "2026-03-26T10:41:31.884052Z" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "