stormvogel.teaching.belief_mdp

Bounded belief-MDP exploration for POMDPs.

Attributes

Classes

FrontierBelief

A belief that was cut off at the exploration boundary.

_Terminal

Shared absorbing terminal state ("target" or "sink").

Functions

belief_mdp(pomdp, initial_belief[, cutoff, max_states])

Explore the belief MDP of a POMDP up to max_states distinct beliefs.

Module Contents

class stormvogel.teaching.belief_mdp.FrontierBelief(dist: dict[State, Fraction])

Bases: stormvogel.teaching.belief.Belief

A belief that was cut off at the exploration boundary.

In the belief MDP a frontier belief has a single "cut" action. The probability of reaching the target is the dot product Σ_s c(s) · b(s) of the per-state cutoff function c with the frontier belief; the remainder goes to the fresh absorbing sink. Frontier beliefs receive the label "frontier".

__hash__() → int

__eq__(other: object) → bool

__repr__() → str

class stormvogel.teaching.belief_mdp._Terminal(label: str)

Shared absorbing terminal state ("target" or "sink").

_label
__repr__() → str

stormvogel.teaching.belief_mdp._TARGET
stormvogel.teaching.belief_mdp._SINK
stormvogel.teaching.belief_mdp.belief_mdp(pomdp: stormvogel.model.model.Model, initial_belief: Mapping[State, Fraction | int], cutoff: Mapping[State, Fraction | float] | Fraction | int | float = Fraction(0), max_states: int = 1000) → stormvogel.model.model.Model

Explore the belief MDP of a POMDP up to max_states distinct beliefs.

Starting from initial_belief, successor beliefs are computed with the standard Bayesian belief update and explored in breadth-first order via the bird API. Once max_states distinct Belief nodes have been committed, every newly discovered successor belief becomes a FrontierBelief instead.
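As a rough illustration of the Bayesian belief update, the sketch below works on plain dictionaries rather than stormvogel models. The names belief_update, trans, and obs and the data layout are hypothetical and not part of the stormvogel API. Each state is assumed to have exactly one observation, matching the requirement on pomdp.

```python
from fractions import Fraction


def belief_update(belief, trans, obs, action):
    """Return {observation: (probability, successor belief)} for one action.

    belief: dict state -> Fraction
    trans:  dict (state, action) -> dict successor -> Fraction (hypothetical layout)
    obs:    dict state -> observation (deterministic observations assumed)
    """
    # Unnormalised probability mass of each successor, grouped by observation.
    mass = {}
    for s, p in belief.items():
        for s2, q in trans[(s, action)].items():
            by_state = mass.setdefault(obs[s2], {})
            by_state[s2] = by_state.get(s2, Fraction(0)) + p * q

    result = {}
    for o, by_state in mass.items():
        total = sum(by_state.values(), Fraction(0))
        # Condition on observation o by normalising with its total mass.
        result[o] = (total, {s2: m / total for s2, m in by_state.items()})
    return result


# Toy model: four states, one action "a", two observations "y" and "z".
trans = {
    ("s0", "a"): {"s1": Fraction(1, 2), "s3": Fraction(1, 2)},
    ("s1", "a"): {"s1": Fraction(1, 2), "s2": Fraction(1, 2)},
}
obs = {"s0": "x", "s1": "y", "s2": "y", "s3": "z"}
b0 = {"s0": Fraction(1, 2), "s1": Fraction(1, 2)}

successors = belief_update(b0, trans, obs, "a")
# Observation "y" occurs with probability 3/4 and yields {s1: 2/3, s2: 1/3};
# observation "z" occurs with probability 1/4 and yields {s3: 1}.
```

Each entry of the result corresponds to one outgoing branch of the chosen action in the belief MDP: the branch probability and the belief it leads to.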

Frontier transitions: each FrontierBelief has a single "cut" action. The probability of reaching the shared target state is the dot product \(c \cdot b_f = \sum_s c(s)\, b_f(s)\) of the cutoff function with the frontier belief; the remainder goes to the shared sink. Both terminal states are absorbing.

cutoff may be:

  • A scalar in [0, 1]: uniform per-state value. 0 (default) gives a pessimistic lower bound; 1 gives an optimistic upper bound.

  • A Mapping[State, Fraction | float]: per-state cutoff function \(c \colon S \to [0,1]\). Absent states default to 0. Passing the MDP value function (e.g. from mdp_bound_alpha()) yields the MDP-value upper bound.
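To make the two forms of cutoff concrete, here is a minimal sketch of how the "cut" action of a frontier belief could split its probability between target and sink. The helper cut_probabilities is hypothetical and works on plain dictionaries; it only mirrors the dot product described above and is not the library's implementation.

```python
from collections.abc import Mapping
from fractions import Fraction


def cut_probabilities(frontier_belief, cutoff):
    """Return (P(target), P(sink)) for the "cut" action of a frontier belief.

    cutoff is either a scalar in [0, 1] or a mapping state -> value;
    states missing from the mapping count as 0.
    """
    if isinstance(cutoff, Mapping):
        values = {s: Fraction(cutoff.get(s, 0)) for s in frontier_belief}
    else:
        values = {s: Fraction(cutoff) for s in frontier_belief}
    # Dot product of the cutoff function with the frontier belief.
    to_target = sum(
        (values[s] * p for s, p in frontier_belief.items()), Fraction(0)
    )
    return to_target, 1 - to_target


b_f = {"s1": Fraction(2, 3), "s2": Fraction(1, 3)}

pessimistic = cut_probabilities(b_f, 0)   # all mass goes to the sink
optimistic = cut_probabilities(b_f, 1)    # all mass goes to the target
per_state = cut_probabilities(b_f, {"s1": Fraction(3, 4)})  # s2 defaults to 0
```

With the per-state mapping, the target probability is 3/4 · 2/3 + 0 · 1/3 = 1/2, and the remaining 1/2 goes to the sink.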

Rewards: propagated as the expected POMDP reward Σ_s b(s) · r(s). Frontier, target, and sink states carry reward 0 in every reward model.
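The expected reward of a belief can be worked out by hand. The following sketch does so for a toy belief; the function expected_reward and the dictionary layout are assumptions made for illustration, not stormvogel API.

```python
from fractions import Fraction


def expected_reward(belief, reward):
    """Weight each state's reward by its probability under the belief."""
    return sum((p * reward[s] for s, p in belief.items()), Fraction(0))


belief = {"s1": Fraction(2, 3), "s2": Fraction(1, 3)}
reward = {"s1": 3, "s2": 6}

value = expected_reward(belief, reward)  # 2/3 * 3 + 1/3 * 6 = 4
```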

Warnings: emitted when the states in a belief's support disagree on their available actions or on the presence of a label.

Parameters:
  • pomdp – A POMDP with deterministic (non-stochastic) state observations.

  • initial_belief – Mapping from POMDP states to non-negative probabilities summing to 1.

  • cutoff – Per-state cutoff function \(c\colon S\to[0,1]\), or a scalar applied uniformly. Defaults to 0 (pessimistic).

  • max_states – Maximum number of distinct Belief nodes to expand fully (including the initial belief).

Returns:

The explored belief MDP as a stormvogel MDP model.

Raises:

ValueError – If pomdp is not a POMDP, any state has a stochastic observation, initial_belief does not sum to 1, or a scalar cutoff is outside [0, 1].
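The two conditions on initial_belief and a scalar cutoff can be pictured as the checks below. This is only a sketch under the assumption that the checks are exact comparisons; check_inputs is a hypothetical helper, and the model-level checks (POMDP type, deterministic observations) are left out because they depend on the stormvogel model API.

```python
from collections.abc import Mapping
from fractions import Fraction


def check_inputs(initial_belief, cutoff=Fraction(0)):
    """Raise ValueError for the belief and cutoff conditions listed above."""
    if sum(initial_belief.values()) != 1:
        raise ValueError("initial_belief must sum to 1")
    # Only scalar cutoffs are range-checked in this sketch.
    if not isinstance(cutoff, Mapping) and not 0 <= cutoff <= 1:
        raise ValueError("scalar cutoff must lie in [0, 1]")


# Accepted: the probabilities sum to exactly 1 and the cutoff is in range.
check_inputs({"s0": Fraction(1, 3), "s1": Fraction(2, 3)}, cutoff=1)
```

Using Fraction values for the belief keeps the sum-to-one comparison exact, which floating-point probabilities such as 0.1 + 0.2 + 0.7 would not guarantee.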