stormvogel.teaching.belief_mdp

Bounded belief-MDP exploration for POMDPs.

Attributes

`_TARGET`
`_SINK`

Classes

`FrontierBelief`	A belief that was cut off at the exploration boundary.
`_Terminal`	Shared absorbing terminal state (`"target"` or `"sink"`).

Functions

`belief_mdp`(pomdp, initial_belief, cutoff, max_states)	Explore the belief MDP of a POMDP up to max_states distinct beliefs.

Module Contents

class stormvogel.teaching.belief_mdp.FrontierBelief(dist: dict[State, Fraction])

Bases: stormvogel.teaching.belief.Belief

A belief that was cut off at the exploration boundary.

In the belief MDP, a frontier belief has a single "cut" action. That action reaches the shared target state with probability \(\sum_s c(s)\, b(s)\), the dot product of the per-state cutoff function \(c\) with the frontier belief \(b\); the remaining probability goes to the shared absorbing sink. Frontier beliefs receive the label "frontier".

__hash__() → int

__eq__(other: object) → bool

__repr__() → str

class stormvogel.teaching.belief_mdp._Terminal(label: str)

Shared absorbing terminal state ("target" or "sink").

_label

__repr__() → str

stormvogel.teaching.belief_mdp._TARGET

stormvogel.teaching.belief_mdp._SINK

stormvogel.teaching.belief_mdp.belief_mdp(pomdp: stormvogel.model.model.Model, initial_belief: Mapping[State, Fraction | int], cutoff: Mapping[State, Fraction | float] | Fraction | int | float = Fraction(0), max_states: int = 1000) → stormvogel.model.model.Model

Explore the belief MDP of a POMDP up to max_states distinct beliefs.

Starting from initial_belief, successor beliefs are computed by the standard Bayesian belief update and explored breadth-first via the bird API. Once max_states distinct Belief nodes have been committed, every newly discovered successor belief becomes a FrontierBelief instead of being expanded.
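The Bayesian belief update mentioned above can be illustrated with a minimal sketch in plain Python. It is not stormvogel's implementation: the helper name `belief_successors`, the nested `transitions` dictionary, and the `observation` lookup are hypothetical stand-ins, with strings in place of State objects. It assumes deterministic state observations, as the pomdp parameter requires.

```python
from collections import defaultdict
from fractions import Fraction


def belief_successors(belief, action, transitions, observation):
    """Return {observation: (probability, successor_belief)} for one action.

    Hypothetical data layout, not stormvogel's:
    transitions[s][action] maps successor states to probabilities,
    observation[s] is the single observation emitted by state s.
    """
    # Unnormalised mass per observation and successor state.
    mass = defaultdict(lambda: defaultdict(Fraction))
    for s, p in belief.items():
        for s_next, q in transitions[s][action].items():
            mass[observation[s_next]][s_next] += p * q

    result = {}
    for obs, unnormalised in mass.items():
        total = sum(unnormalised.values(), Fraction(0))
        # Conditioning on the observation renormalises the mass to 1.
        result[obs] = (total, {s: m / total for s, m in unnormalised.items()})
    return result


transitions = {
    "a": {"go": {"a": Fraction(1, 2), "b": Fraction(1, 2)}},
    "b": {"go": {"c": Fraction(1)}},
}
observation = {"a": "dark", "b": "light", "c": "light"}
belief = {"a": Fraction(1, 2), "b": Fraction(1, 2)}

successors = belief_successors(belief, "go", transitions, observation)
```

Each entry of `successors` corresponds to one outgoing branch of the action in the belief MDP: the branch probability is the chance of seeing that observation, and the successor belief is the posterior given it.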

Frontier transitions: each FrontierBelief has a single "cut" action. The probability of reaching the shared target state is the dot product \(c \cdot b_f = \sum_s c(s)\, b_f(s)\) of the cutoff function with the frontier belief; the remainder goes to the shared sink. Both terminal states are absorbing.
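As a worked instance of the frontier formula, the sketch below evaluates \(c \cdot b_f\) with exact fractions. The helper `cut_probability` and the string state names are hypothetical and only illustrate the arithmetic; states missing from the cutoff mapping are treated as 0.

```python
from fractions import Fraction


def cut_probability(frontier_belief, cutoff):
    """Return sum_s c(s) * b_f(s); states absent from cutoff count as 0."""
    return sum(
        (Fraction(cutoff.get(s, 0)) * p for s, p in frontier_belief.items()),
        Fraction(0),
    )


frontier_belief = {"s0": Fraction(1, 4), "s1": Fraction(3, 4)}
cutoff = {"s0": Fraction(1), "s1": Fraction(1, 3)}

p_target = cut_probability(frontier_belief, cutoff)  # 1/4 * 1 + 3/4 * 1/3
p_sink = 1 - p_target                                # remainder goes to the sink
```

Here both `p_target` and `p_sink` equal 1/2, so the "cut" action splits evenly between the target and the sink.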

cutoff may be:

A scalar in [0, 1]: uniform per-state value. 0 (default) gives a pessimistic lower bound; 1 gives an optimistic upper bound.
A Mapping[State, Fraction | float]: per-state cutoff function \(c \colon S \to [0,1]\). Absent states default to 0. Passing the MDP value function (e.g. from mdp_bound_alpha()) yields the MDP-value upper bound.
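The two accepted forms of cutoff can be reduced to a single per-state function. The following is a sketch of one way to do that, assuming a hypothetical helper `resolve_cutoff` and string state names; the library's internal handling may differ.

```python
from collections.abc import Mapping
from fractions import Fraction


def resolve_cutoff(cutoff, states):
    """Turn a scalar or mapping cutoff into an explicit per-state dict."""
    if isinstance(cutoff, Mapping):
        # Per-state function: states absent from the mapping default to 0.
        return {s: Fraction(cutoff.get(s, 0)) for s in states}
    value = Fraction(cutoff)
    if not 0 <= value <= 1:
        raise ValueError("scalar cutoff must lie in [0, 1]")
    # Scalar: the same value for every state.
    return {s: value for s in states}


states = ["s0", "s1"]
pessimistic = resolve_cutoff(0, states)          # lower bound
optimistic = resolve_cutoff(1, states)           # upper bound
per_state = resolve_cutoff({"s0": 0.5}, states)  # "s1" defaults to 0
```

With the scalar 0 every frontier belief sends all of its probability to the sink, and with 1 all of it goes to the target, which is why these two settings bracket the true value from below and above.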

Rewards: propagated as the expected POMDP reward \(\sum_s b(s)\, r(s)\). Frontier, target, and sink states carry reward 0 in every reward model.
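The expected reward of a belief can be computed per reward model as in the sketch below. The helper `expected_rewards`, the reward-model names, and the choice to treat states without a reward entry as 0 are assumptions made for illustration.

```python
from fractions import Fraction


def expected_rewards(belief, reward_models):
    """Return {model name: sum_s b(s) * r(s)} for every reward model."""
    return {
        name: sum(
            (p * Fraction(rewards.get(s, 0)) for s, p in belief.items()),
            Fraction(0),
        )
        for name, rewards in reward_models.items()
    }


belief = {"s0": Fraction(1, 4), "s1": Fraction(3, 4)}
reward_models = {
    "cost": {"s0": 2, "s1": 6},
    "hits": {"s1": 1},  # assumption: "s0" has no entry and counts as 0
}

belief_rewards = expected_rewards(belief, reward_models)
```

For this belief the "cost" model yields 1/4 · 2 + 3/4 · 6 = 5 and the "hits" model yields 3/4.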

Warnings: emitted when belief-support states disagree on their available actions or on the presence of a label.
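The kind of consistency check that leads to such a warning might look like the sketch below. It covers only the available-actions case; the helper name, the message text, and the `available_actions` lookup are hypothetical, and how the library resolves a disagreement is not shown.

```python
import warnings
from fractions import Fraction


def support_agrees_on_actions(belief, available_actions):
    """Warn and return False if support states offer different action sets."""
    support = [s for s, p in belief.items() if p > 0]
    action_sets = {frozenset(available_actions[s]) for s in support}
    if len(action_sets) > 1:
        warnings.warn(
            "belief-support states disagree on their available actions",
            stacklevel=2,
        )
        return False
    return True


belief = {"s0": Fraction(1, 2), "s1": Fraction(1, 2), "s2": Fraction(0)}
available_actions = {"s0": ["left", "right"], "s1": ["left"], "s2": ["jump"]}

# Capture the warning so that the example runs quietly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    agrees = support_agrees_on_actions(belief, available_actions)
```

Only states with positive probability are compared, so "s2" does not contribute to the disagreement here; "s0" and "s1" do.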

Parameters:

pomdp – A POMDP with deterministic (non-stochastic) state observations.
initial_belief – Mapping from POMDP states to non-negative probabilities summing to 1.
cutoff – Per-state cutoff function \(c\colon S\to[0,1]\), or a scalar applied uniformly. Defaults to 0 (pessimistic).
max_states – Maximum number of distinct Belief nodes to expand fully (including the initial belief).
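To make the role of max_states concrete, here is a sketch of a bounded breadth-first exploration over abstract, hashable nodes. It is an illustration under assumptions rather than the library's code: `bounded_bfs` and the `successors` callback are hypothetical, and integers stand in for beliefs.

```python
from collections import deque


def bounded_bfs(initial, successors, max_states):
    """Expand at most max_states nodes; later discoveries become frontier."""
    expanded = [initial]      # nodes committed for full expansion
    frontier = []             # nodes cut off at the exploration boundary
    seen = {initial}
    queue = deque([initial])
    while queue:
        node = queue.popleft()
        for nxt in successors(node):
            if nxt in seen:
                continue
            seen.add(nxt)
            if len(expanded) < max_states:
                expanded.append(nxt)
                queue.append(nxt)
            else:
                frontier.append(nxt)  # not expanded any further
    return expanded, frontier


# Toy successor function: node n leads to 2n and 2n + 1.
expanded, frontier = bounded_bfs(1, lambda n: [2 * n, 2 * n + 1], max_states=3)
```

With max_states=3 the initial node and its two successors are expanded, and the four nodes discovered afterwards end up in the frontier. The initial node counts toward the budget, matching the parameter description.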

Returns:

The explored belief MDP as a stormvogel MDP model.

Raises:

ValueError – If pomdp is not a POMDP, any state has a stochastic observation, initial_belief does not sum to 1, or a scalar cutoff is outside [0, 1].
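The check on initial_belief can be pictured with the following sketch. The helper `checked_initial_belief` is hypothetical, and it covers only the sum condition listed above; it relies on exact Fraction arithmetic, so no floating-point tolerance is needed.

```python
from fractions import Fraction


def checked_initial_belief(initial_belief):
    """Convert entries to Fraction and require that they sum to exactly 1."""
    belief = {s: Fraction(p) for s, p in initial_belief.items()}
    if sum(belief.values(), Fraction(0)) != 1:
        raise ValueError("initial_belief must sum to 1")
    return belief


# A point belief given as an int, and a proper distribution.
point = checked_initial_belief({"s0": 1})
mixed = checked_initial_belief({"s0": Fraction(1, 3), "s1": Fraction(2, 3)})
```

A mapping such as `{"s0": Fraction(1, 3)}` would raise ValueError under this sketch because its entries sum to 1/3.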