stormvogel.teaching.lovejoy¶
Lovejoy grid MDP: finite upper-bounding MDP for POMDPs.
Constructs the Lovejoy grid MDP for POMDPs where every observation class has at most three states.
Two-state class — the belief simplex is a line segment
[0, 1]; a uniform grid of step1/kcovers it, and off-grid successors are replaced by linear interpolation over the two adjacent grid points.Three-state class — the belief simplex is a triangle (2-simplex); a uniform triangulation at resolution
kcovers it with(k+1)(k+2)//2grid points, and off-grid successors are replaced by barycentric interpolation over the three vertices of the enclosing sub-triangle.
In both cases the interpolated value is an over-approximation because V*
is convex, so the result is a stormvogel MDP whose maximal reachability
probability is an upper bound on the true POMDP value at the initial belief.
Functions¶
|
Build the Lovejoy grid MDP for a POMDP with at most 2 states per observation. |
Module Contents¶
- stormvogel.teaching.lovejoy.lovejoy_grid_mdp(pomdp: stormvogel.model.model.Model, initial_belief: Mapping[State, Fraction | int], k: int) stormvogel.model.model.Model¶
Build the Lovejoy grid MDP for a POMDP with at most 2 states per observation.
Every observation class must contain at most two POMDP states. For each such class the belief simplex is 1-dimensional; a uniform grid of
k + 1points{0, 1/k, 2/k, …, 1}covers it. Successor beliefs that fall between two grid points are replaced by their linear convex interpolation, which gives an over-approximation becauseV*is convex.The returned MDP has exactly the reachable grid beliefs as states. Its maximal reachability probability (under label target) is an upper bound on
V*(initial_belief).- Parameters:
pomdp – A POMDP where every observation class has at most 3 states.
initial_belief – Distribution over POMDP states summing to 1. Its support must lie within a single observation class and every probability must be an exact multiple of
1/k.k – Grid resolution. Grid points for each 2-state observation class are at
{0, 1/k, 2/k, …, 1}; for each 3-state class they form a uniform triangulation of the 2-simplex at step1/k.
- Returns:
A stormvogel MDP whose optimal reachability value upper-bounds
V*(initial_belief).- Raises:
ValueError – If pomdp is not a POMDP, any observation class has more than 3 states, initial_belief does not sum to 1, its support spans more than one observation class, or any probability is not an exact multiple of
1/k.