Simulator

Once we have a transition system, we can run a simulation on it. In this case, we have an MDP that models a hungry lion. Depending on its current state, the lion must decide whether to ‘rawr’ or ‘hunt’ in order to avoid reaching the state ‘dead’.

[1]:

from stormvogel import *
import stormvogel

lion = examples.create_lion_mdp()
show(lion)

[1]:

<stormvogel.visualization.JSVisualization at 0x7f8b5736c1a0>

Now, let’s run a simulation of the lion! If we do not provide a scheduling function, then the simulator just does a random walk, taking a random choice each time.

[2]:

path = stormvogel.simulator.simulate_path(lion, steps=5, seed=1234)

We could also provide a scheduling function to choose the actions ourselves. This is somewhat similar to the bird API.

[3]:

def scheduler(s: State) -> Action:
    return Action("rawr")


path2 = stormvogel.simulator.simulate_path(
    lion, steps=5, seed=1234, scheduler=scheduler
)
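
Both modes described above, the random walk and the scheduler-driven walk, can be sketched in plain Python without stormvogel. This is only an illustration of the idea: the toy model `TOY_MDP`, its probabilities, and the names `walk` and `always_rawr` are assumptions made for this sketch, and stormvogel's own simulator may be implemented differently.

```python
import random

# Hypothetical toy MDP (not the lion model from stormvogel):
# state -> action -> list of (probability, next_state).
TOY_MDP = {
    "full": {"hunt": [(1.0, "full")], "rawr": [(1.0, "hungry")]},
    "hungry": {"hunt": [(0.5, "full"), (0.5, "starving")], "rawr": [(1.0, "starving")]},
    "starving": {"hunt": [(0.5, "full"), (0.5, "dead")], "rawr": [(1.0, "dead")]},
    "dead": {},  # absorbing state without actions
}


def walk(mdp, start, steps, scheduler=None, seed=None):
    """Return the list of visited states, starting with `start`."""
    rng = random.Random(seed)  # private generator, so a seed gives a repeatable run
    state, path = start, [start]
    for _ in range(steps):
        available = mdp[state]
        if not available:  # no action to take: stop early
            break
        if scheduler is None:
            action = rng.choice(sorted(available))  # random walk
        else:
            action = scheduler(state)  # the scheduler decides
        probabilities, successors = zip(*available[action])
        state = rng.choices(successors, weights=probabilities, k=1)[0]
        path.append(state)
    return path


def always_rawr(state):
    """Scheduler that ignores the state and always picks 'rawr'."""
    return "rawr"


random_path = walk(TOY_MDP, "full", steps=5, seed=1234)
rawr_path = walk(TOY_MDP, "full", steps=5, scheduler=always_rawr, seed=1234)
print(random_path)
print(rawr_path)
```

With `always_rawr`, every transition in this toy model is deterministic, so the walk reaches "dead" after three steps and stops there. Without a scheduler, the result depends on the seed, but the same seed always gives the same path.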

We can also pass the scheduler to simulate to create a partial model. This model contains all the states that have been discovered by the simulation.

[4]:

partial_model = stormvogel.simulator.simulate(
    lion, steps=5, scheduler=scheduler, seed=1234
)
show(partial_model)

[4]:

<stormvogel.visualization.JSVisualization at 0x7f8b56ab3c50>
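
The idea of a partial model, keeping only what a simulation has actually visited, can be sketched in plain Python. The toy chain `TOY_CHAIN` and the function `simulate_partial` are hypothetical names for this illustration. For brevity the sketch uses a chain without actions, and it returns plain sets instead of a stormvogel model.

```python
import random

# Hypothetical toy chain: state -> list of (probability, next_state).
TOY_CHAIN = {
    "a": [(0.5, "b"), (0.5, "c")],
    "b": [(1.0, "d")],
    "c": [(1.0, "d")],
    "d": [(1.0, "d")],
}


def simulate_partial(chain, start, steps, seed=None):
    """Collect the states and transitions visited by one simulated walk."""
    rng = random.Random(seed)
    states = {start}
    transitions = set()
    state = start
    for _ in range(steps):
        probabilities, successors = zip(*chain[state])
        successor = rng.choices(successors, weights=probabilities, k=1)[0]
        states.add(successor)
        transitions.add((state, successor))
        state = successor
    return states, transitions


states, transitions = simulate_partial(TOY_CHAIN, "a", steps=3, seed=1234)
print(sorted(states))
print(sorted(transitions))
```

A single walk of three steps visits "a", then either "b" or "c", then "d". The collected sets therefore cover three of the four states, which is what makes the result partial.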

Gymnasium-Compliant Environment

Stormvogel models can be wrapped as a Gymnasium environment via ModelEnv. This lets you use standard reinforcement-learning libraries directly on a stormvogel MDP or DTMC without any manual glue code.

ModelEnv supports both MDP and DTMC models:

For an MDP the action space is Discrete(n_actions), one index per named action.
For a DTMC there is no choice, so the action space is Discrete(1) — always pass 0.

The observation space has two modes, selected by obs_type:

"index" (default): a plain integer — the index of the current state.
"valuations": a Dict space built from variables that have a declared domain (IntDomain, BoolDomain, or CategoricalDomain), one Discrete component per variable.

[5]:

from stormvogel.gym_env import ModelEnv, ActionUnavailableError

env = ModelEnv(lion)
print("Observation space:", env.observation_space)
print("Action space:     ", env.action_space)
print("Actions:          ", env._index_to_action)

Observation space: Discrete(5)
Action space:      Discrete(3)
Actions:           [Action('hunt >:D'), Action('rawr'), Action(None)]

The standard Gymnasium loop works as-is. reset() returns the initial observation and an info dict; step(action) returns the next observation, reward, terminated flag, truncated flag, and an info dict. The info dict always contains the raw stormvogel.model.State under the key "state".

[6]:

obs, info = env.reset(seed=42)
print("Initial state index:", obs, "— state:", info["state"])

hunt_idx = next(i for i, a in enumerate(env._index_to_action) if "hunt" in str(a))
obs, reward, terminated, truncated, info = env.step(hunt_idx)
print("After 'hunt': obs =", obs, "| reward =", reward, "| terminated =", terminated)

Initial state index: 0 — state: State(id=be2539c1-fb7a-4e3f-89e1-71a1dcfdaac6, labels=['init'])
After 'hunt': obs = 0 | reward = 0.0 | terminated = False

Passing an action that is not available in the current state raises ActionUnavailableError rather than silently producing incorrect behaviour.
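
The return shapes of reset() and step(), and the choice to raise on an unavailable action, can be illustrated with a small stand-in class that needs neither Gymnasium nor stormvogel. `ToyEnv` and `ToyActionUnavailableError` are hypothetical names for this sketch; they mimic the behaviour described above and are not the ModelEnv implementation.

```python
class ToyActionUnavailableError(Exception):
    """Hypothetical stand-in for an 'action not available' error."""


class ToyEnv:
    """Tiny environment that mimics the reset/step return shapes."""

    # state index -> {action index: next state index}; state 2 has no actions.
    TRANSITIONS = {0: {0: 1, 1: 0}, 1: {0: 2}, 2: {}}

    def reset(self, seed=None):
        # The seed is accepted for signature compatibility; this toy is deterministic.
        self.state = 0
        return self.state, {"state": self.state}

    def step(self, action):
        available = self.TRANSITIONS[self.state]
        if action not in available:
            # Fail loudly instead of silently staying put or picking another action.
            raise ToyActionUnavailableError(
                f"action {action} is not available in state {self.state}"
            )
        self.state = available[action]
        terminated = not self.TRANSITIONS[self.state]  # no actions left
        reward = 1.0 if terminated else 0.0
        return self.state, reward, terminated, False, {"state": self.state}


toy_env = ToyEnv()
obs, info = toy_env.reset(seed=42)
obs, reward, terminated, truncated, info = toy_env.step(0)  # state 0 -> 1

try:
    toy_env.step(1)  # action 1 does not exist in state 1
    rejected = False
except ToyActionUnavailableError as error:
    rejected = True
    print("rejected:", error)

obs, reward, terminated, truncated, info = toy_env.step(0)  # state 1 -> 2
print(obs, reward, terminated, truncated)
```

Because the check happens before the state is updated, a rejected action leaves the environment unchanged, so the loop can continue with a valid action afterwards.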

Variable-domain observations

When all variables in the model carry a declared domain, obs_type="valuations" gives a Dict observation whose keys are variable names and whose values are non-negative integers (domain encoding). This is more informative than a raw state index and compatible with structured RL policies.

[7]:

from stormvogel.examples.monty_hall import create_monty_hall_mdp

mh = create_monty_hall_mdp()
mh_env = ModelEnv(mh, obs_type="valuations")
print("Observation space:", mh_env.observation_space)
obs, info = mh_env.reset()
print("Initial obs (all variables at sentinel -1):", obs)

Observation space: Dict('car_pos': Discrete(4), 'chosen_pos': Discrete(4), 'reveal_pos': Discrete(4))
Initial obs (all variables at sentinel -1): {'car_pos': 0, 'chosen_pos': 0, 'reveal_pos': 0}
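
The output above reports the sentinel value -1 as 0 inside a Discrete(4) component. One encoding that is consistent with this, assumed here and not confirmed against stormvogel's source, is to shift each integer value by the lower bound of its domain. The function `encode_valuations` and the domain bounds below are hypothetical and serve only to illustrate that assumption.

```python
def encode_valuations(valuations, int_domains):
    """Map each integer variable to a non-negative index within its domain.

    `int_domains` maps a variable name to an inclusive (low, high) pair.
    """
    encoded = {}
    for name, value in valuations.items():
        low, high = int_domains[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        encoded[name] = value - low  # shift so the smallest value becomes 0
    return encoded


# Assumed domains: door positions 0..2 plus a sentinel -1 for "not set yet".
domains = {"car_pos": (-1, 2), "chosen_pos": (-1, 2), "reveal_pos": (-1, 2)}
sizes = {name: high - low + 1 for name, (low, high) in domains.items()}
initial = encode_valuations(
    {"car_pos": -1, "chosen_pos": -1, "reveal_pos": -1}, domains
)
print(sizes)
print(initial)
```

Under this assumption each variable has four possible values, and the all-sentinel valuation maps to index 0 for every variable.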