stormvogel.teaching.multiobjective

Goal unfolding and weighted multi-target reachability.

Functions

goal_unfolding(…)

Build the goal-unfolding product of an MDP.

weighted_multi_target_reachability(...)

Build a model for weighted multi-target reachability via goal unfolding.

compute_weighted_reachability_policy(...)

Compute an optimal policy for weighted multi-target reachability.

evaluate_policy_reachability(→ list[float])

Evaluate a policy's reachability probability for each individual target.

_compute_stuck_states(→ frozenset)

Return states from which no state with strictly more bits set is reachable.

lp_dual_multireachprob(→ stormvogel.teaching.lp.LP)

Occupancy-measure LP for multiobjective reachability on a goal unfolding.

Module Contents

stormvogel.teaching.multiobjective.goal_unfolding(mdp: stormvogel.model.Model, goal_labels: list[str], return_state_bits: Literal[True]) tuple[stormvogel.model.Model, dict]
stormvogel.teaching.multiobjective.goal_unfolding(mdp: stormvogel.model.Model, goal_labels: list[str], return_state_bits: Literal[False] = ...) stormvogel.model.Model

Build the goal-unfolding product of an MDP.

The product state (s, b) tracks which goals have been visited on the path to reach it. b[i] = True iff a state labelled goal_labels[i] has been visited (including s itself). Bits never flip back to False.

Transition semantics: (s, b) –α–> (t, delta(b, L(t))) where delta(b, L(t))[i] = b[i] OR (goal_labels[i] in L(t)).

Product state labels: L(s).

Parameters:

return_state_bits – When True, return a pair (model, bits_map) where bits_map maps each unfolded State to its bit tuple.

stormvogel.teaching.multiobjective.weighted_multi_target_reachability(mdp: stormvogel.model.Model, target_labels: list[str], weights: list[float]) stormvogel.model.Model

Build a model for weighted multi-target reachability via goal unfolding.

Each target T_i is identified by label target_labels[i] with weight weights[i]. The returned model has a single state reward model “weighted_reach” whose expected total reward equals sum_i weights[i] * P(reach T_i).

Uses goal_unfolding to track first visits and transition rewards (eliminated to state rewards via auxiliary entry states) to assign weights.

Parameters:
  • mdp – The MDP.

  • target_labels – Labels identifying each target set T_i.

  • weights – Non-negative weight w_i for each T_i.

Returns:

Transformed MDP with state reward model “weighted_reach”.

stormvogel.teaching.multiobjective.compute_weighted_reachability_policy(mdp: stormvogel.model.Model, target_labels: list[str], weights: list[float]) stormvogel.result.Result

Compute an optimal policy for weighted multi-target reachability.

Reduces the problem to single-objective total reward maximization via weighted_multi_target_reachability(), then calls stormpy to solve R{"weighted_reach"}max=? [C] and extract the scheduler.

Parameters:
  • mdp – The input MDP.

  • target_labels – Labels identifying each target set T_i.

  • weights – Non-negative weight w_i for each T_i.

Returns:

Model-checking result on the transformed MDP, including the optimal scheduler in result.scheduler.

Raises:
  • ImportError – If stormpy is not installed.

  • RuntimeError – If model checking returns no result.

stormvogel.teaching.multiobjective.evaluate_policy_reachability(result: stormvogel.result.Result, target_labels: list[str]) list[float]

Evaluate a policy’s reachability probability for each individual target.

Induces a DTMC from the scheduler in result and runs one reachability query P=? [F "label"] per entry in target_labels.

Parameters:
Returns:

List of length len(target_labels) where entry i is P_policy(reach T_i) at the initial state.

Raises:
  • ImportError – If stormpy is not installed.

  • RuntimeError – If the induced DTMC cannot be constructed or model checking fails for any label.

stormvogel.teaching.multiobjective._compute_stuck_states(unfolded: stormvogel.model.Model, bits_map: dict, m: int) frozenset

Return states from which no state with strictly more bits set is reachable.

A state is stuck if the bit vector can never improve from it — either because all bits are already True, or because the state lies in a part of the model from which no new goal label can ever be reached.

stormvogel.teaching.multiobjective.lp_dual_multireachprob(unfolded: stormvogel.model.Model, bits_map: dict, goal_labels: list[str], weights: list[float] | None = None, threshold: list[float] | None = None) stormvogel.teaching.lp.LP

Occupancy-measure LP for multiobjective reachability on a goal unfolding.

unfolded and bits_map must be the pair returned by goal_unfolding(mdp, goal_labels, return_state_bits=True). Passing any other model raises ValueError.

Variables y_{s,a} represent expected visit counts for each free state-action pair. A state is free if progress (a new goal bit flipping) is still reachable from it; all other states are treated as absorbing and receive no flow variable.

The reach expression for goal i counts only transitions (s, a, t) where bits_map[s][i] is False and goal_labels[i] in t.labels, ensuring each first-visit is counted exactly once.

Parameters:
  • unfolded – Goal-unfolded MDP (output of goal_unfolding()).

  • bits_map – Mapping from each state in unfolded to its bit tuple (output of goal_unfolding() with return_state_bits=True).

  • goal_labels – Labels identifying each goal, in the same order used when calling goal_unfolding().

  • weights – Objective weights λ_k; objective is zero if None.

  • threshold – Optional lower-bound vector of length K. Adds P(reach goal_i) threshold[i] constraints.

Raises:

ValueError – If bits_map is empty or its tuple length does not match goal_labels.