stormvogel.teaching.belief_mdp
Bounded belief-MDP exploration for POMDPs.
Attributes
- _TARGET
- _SINK
Classes
- FrontierBelief: A belief that was cut off at the exploration boundary.
- _Terminal: Shared absorbing terminal state ("target" or "sink").
Functions
- belief_mdp: Explore the belief MDP of a POMDP up to max_states distinct beliefs.
Module Contents
- class stormvogel.teaching.belief_mdp.FrontierBelief(dist: dict[State, Fraction])
Bases: stormvogel.teaching.belief.Belief
A belief that was cut off at the exploration boundary.
In the belief MDP, a frontier belief has a single "cut" action. The probability of reaching the target is the dot product Σ_s c(s) · b(s) of the per-state cutoff function c with the frontier belief b; the remainder goes to the fresh absorbing sink. Frontier beliefs receive the label "frontier".
- __hash__() → int
- __eq__(other: object) → bool
- __repr__() → str
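The "cut" probabilities described above can be illustrated with a minimal sketch. It uses plain dictionaries keyed by state names rather than stormvogel objects, and the helper name cut_distribution is hypothetical (it is not part of the module). States missing from the cutoff mapping are treated as 0, following the convention documented for belief_mdp.

```python
from fractions import Fraction


def cut_distribution(belief, cutoff):
    """Return (p_target, p_sink) for the single "cut" action.

    Hypothetical helper for illustration only; not a stormvogel function.
    """
    # Dot product of the cutoff function c with the belief b.
    # States absent from the cutoff mapping contribute 0.
    p_target = sum(
        (Fraction(cutoff.get(s, 0)) * p for s, p in belief.items()),
        Fraction(0),
    )
    # The remaining probability mass goes to the sink.
    return p_target, 1 - p_target


belief = {"s0": Fraction(1, 4), "s1": Fraction(3, 4)}
cutoff = {"s0": Fraction(1), "s1": Fraction(1, 3)}
p_target, p_sink = cut_distribution(belief, cutoff)
print(p_target, p_sink)  # 1/2 1/2
```

Here the target receives 1/4 · 1 + 3/4 · 1/3 = 1/2, and the sink receives the other 1/2.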
- class stormvogel.teaching.belief_mdp._Terminal(label: str)
Shared absorbing terminal state ("target" or "sink").
- _label
- __repr__() → str
- stormvogel.teaching.belief_mdp._TARGET
- stormvogel.teaching.belief_mdp._SINK
- stormvogel.teaching.belief_mdp.belief_mdp(pomdp: stormvogel.model.model.Model, initial_belief: Mapping[State, Fraction | int], cutoff: Mapping[State, Fraction | float] | Fraction | int | float = Fraction(0), max_states: int = 1000) → stormvogel.model.model.Model
Explore the belief MDP of a POMDP up to max_states distinct beliefs.
Starting from initial_belief, successor beliefs are computed by the standard Bayesian belief update and explored BFS-style via the bird API. Once max_states distinct Belief nodes have been committed, any new successor belief becomes a FrontierBelief instead.
Frontier transitions: each FrontierBelief has a single "cut" action. The probability of reaching the shared target state is the dot product \(c \cdot b_f = \sum_s c(s)\, b_f(s)\) of the cutoff function with the frontier belief; the remainder goes to the shared sink. Both terminal states are absorbing.
cutoff may be:
- A scalar in [0, 1]: uniform per-state value. 0 (default) gives a pessimistic lower bound; 1 gives an optimistic upper bound.
- A Mapping[State, Fraction | float]: per-state cutoff function \(c \colon S \to [0,1]\). Absent states default to 0. Passing the MDP value function (e.g. from mdp_bound_alpha()) yields the MDP-value upper bound.
Rewards: propagated as expected POMDP reward Σ_s b[s] · r(s). Frontier, target, and sink states carry reward 0 in every reward model.
Warnings: emitted when belief-support states disagree on their available actions or on the presence of a label.
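As a rough illustration of the Bayesian belief update mentioned above, the following sketch computes the successor beliefs of one action for a POMDP with deterministic state observations. It is a textbook-style version over plain dictionaries, not the module's actual implementation; the function belief_successors and the trans and obs structures are assumptions made for this example.

```python
from collections import defaultdict
from fractions import Fraction


def belief_successors(belief, action, trans, obs):
    """Return {observation: (probability, successor_belief)}.

    Illustrative only. trans[s][action] maps successor states to
    probabilities; obs[s] is the (deterministic) observation of s.
    """
    # Step 1: push the belief through the transition function.
    joint = defaultdict(Fraction)
    for s, p in belief.items():
        for s_next, q in trans[s][action].items():
            joint[s_next] += p * q

    # Step 2: group successor states by their observation.
    grouped = defaultdict(dict)
    for s_next, p in joint.items():
        grouped[obs[s_next]][s_next] = p

    # Step 3: normalise each group to obtain one belief per observation.
    result = {}
    for o, weights in grouped.items():
        total = sum(weights.values(), Fraction(0))
        result[o] = (total, {s: w / total for s, w in weights.items()})
    return result


half = Fraction(1, 2)
trans = {
    "s0": {"a": {"s0": half, "s1": half}},
    "s1": {"a": {"s2": Fraction(1)}},
}
obs = {"s0": "x", "s1": "y", "s2": "y"}
succ = belief_successors({"s0": half, "s1": half}, "a", trans, obs)
print(succ["y"])  # (Fraction(3, 4), {'s1': Fraction(1, 3), 's2': Fraction(2, 3)})
```

Each observation yields one successor belief, reached with the total probability of that observation. Exact Fraction arithmetic keeps equal beliefs comparable, which matters when counting distinct beliefs against a bound such as max_states.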
- Parameters:
pomdp – A POMDP with deterministic (non-stochastic) state observations.
initial_belief – Mapping from POMDP states to non-negative probabilities summing to 1.
cutoff – Per-state cutoff function \(c\colon S\to[0,1]\), or a scalar applied uniformly. Defaults to 0 (pessimistic).
max_states – Maximum number of distinct Belief nodes to expand fully (including the initial belief).
- Returns:
The explored belief MDP as a stormvogel MDP model.
- Raises:
ValueError – If pomdp is not a POMDP, any state has a stochastic observation, initial_belief does not sum to 1, or a scalar cutoff is outside [0, 1].
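The input checks on initial_belief and cutoff listed above could look roughly like the sketch below. This is an assumption about the general shape of such validation, not the library's code: the helper check_inputs and its error messages are hypothetical, and the POMDP-specific checks (model type, stochastic observations) are omitted because they depend on the stormvogel model API.

```python
from collections.abc import Mapping
from fractions import Fraction


def check_inputs(initial_belief, cutoff=Fraction(0)):
    """Validate the inputs and return the cutoff as a per-state function.

    Hypothetical helper for illustration only; not a stormvogel function.
    """
    # The initial belief must be a probability distribution.
    if sum(Fraction(p) for p in initial_belief.values()) != 1:
        raise ValueError("initial_belief does not sum to 1")

    # A mapping is used per state; absent states default to 0.
    if isinstance(cutoff, Mapping):
        return lambda s: Fraction(cutoff.get(s, 0))

    # A scalar must lie in [0, 1] and is applied uniformly.
    if not 0 <= cutoff <= 1:
        raise ValueError("scalar cutoff is outside [0, 1]")
    uniform = Fraction(cutoff)
    return lambda s: uniform


belief = {"s0": Fraction(1, 3), "s1": Fraction(2, 3)}
optimistic = check_inputs(belief, 1)
per_state = check_inputs(belief, {"s0": Fraction(1, 2)})
print(optimistic("s0"), per_state("s0"), per_state("s1"))  # 1 1/2 0
```

Returning a function in both cases lets the rest of the exploration treat scalar and per-state cutoffs uniformly.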