stormvogel.examples.two_state_commitment_pomdp¶
Two-state commitment POMDP.
This POMDP illustrates the strict gap V_POMDP < V_QMDP < V_MDP:
V_POMDP(b_0) = 1/2, V_QMDP(b_0) = 4/5, V_MDP(b_0) = 1.
States s1 and s2 share observation z; the agent can never
distinguish them. Action a1 succeeds (reaches goal g) iff the
hidden state is s1; action a2 succeeds iff it is s2. Action
w keeps the current state with probability 4/5 but risks failure with
1/5, and is never informative.
Because w is strictly risky and reveals no information, the optimal
POMDP policy commits immediately with a1 (or a2) at the 50/50
initial belief, giving value 1/2. The QMDP heuristic over-estimates at
4/5 by assuming the state becomes known after one step. The MDP oracle
always picks the right action and achieves value 1.
Functions¶
Return the two-state commitment POMDP. |
Module Contents¶
- stormvogel.examples.two_state_commitment_pomdp.create_two_state_commitment_pomdp(p: Fraction | float = Fraction(1, 2)) stormvogel.model.Model¶
Return the two-state commitment POMDP.
The initial distribution puts weight p on
s1and1 − pons2.Transitions:
s1 --a1--> g (1) s1 --a2--> f (1) s1 --w --> s1 (4/5), f (1/5) s2 --a1--> f (1) s2 --a2--> g (1) s2 --w --> s2 (4/5), f (1/5)
Analytical values at the uniform initial belief
b_0 = {s1:1/2, s2:1/2}:V_MDP(b_0) = 1 (oracle always picks the correct action) V_QMDP(b_0) = 4/5 (w-alpha dominates; assumes revelation after one step) V_POMDP(b_0) = 1/2 (commit immediately; waiting is never informative)
- Parameters:
p – Initial probability of being in
s1(must be in (0, 1)).- Returns:
A stormvogel POMDP model.
- Raises:
ValueError – If p is not in (0, 1).