stormvogel.teaching.lovejoy

Lovejoy grid MDP: finite upper-bounding MDP for POMDPs.

Constructs the Lovejoy grid MDP for POMDPs where every observation class has at most three states.

  • Two-state class — the belief simplex is a line segment [0, 1]; a uniform grid of step 1/k covers it, and off-grid successors are replaced by linear interpolation over the two adjacent grid points.

  • Three-state class — the belief simplex is a triangle (2-simplex); a uniform triangulation at resolution k covers it with (k+1)(k+2)//2 grid points, and off-grid successors are replaced by barycentric interpolation over the three vertices of the enclosing sub-triangle.

In both cases the interpolated value is an over-approximation because V* is convex, so the result is a stormvogel MDP whose maximal reachability probability is an upper bound on the true POMDP value at the initial belief.

Functions

lovejoy_grid_mdp(→ stormvogel.model.model.Model)

Build the Lovejoy grid MDP for a POMDP with at most 2 states per observation.

Module Contents

stormvogel.teaching.lovejoy.lovejoy_grid_mdp(pomdp: stormvogel.model.model.Model, initial_belief: Mapping[State, Fraction | int], k: int) stormvogel.model.model.Model

Build the Lovejoy grid MDP for a POMDP with at most 2 states per observation.

Every observation class must contain at most two POMDP states. For each such class the belief simplex is 1-dimensional; a uniform grid of k + 1 points {0, 1/k, 2/k, …, 1} covers it. Successor beliefs that fall between two grid points are replaced by their linear convex interpolation, which gives an over-approximation because V* is convex.

The returned MDP has exactly the reachable grid beliefs as states. Its maximal reachability probability (under label target) is an upper bound on V*(initial_belief).

Parameters:
  • pomdp – A POMDP where every observation class has at most 3 states.

  • initial_belief – Distribution over POMDP states summing to 1. Its support must lie within a single observation class and every probability must be an exact multiple of 1/k.

  • k – Grid resolution. Grid points for each 2-state observation class are at {0, 1/k, 2/k, …, 1}; for each 3-state class they form a uniform triangulation of the 2-simplex at step 1/k.

Returns:

A stormvogel MDP whose optimal reachability value upper-bounds V*(initial_belief).

Raises:

ValueError – If pomdp is not a POMDP, any observation class has more than 3 states, initial_belief does not sum to 1, its support spans more than one observation class, or any probability is not an exact multiple of 1/k.