Constraint Monitor

Usage Guide

ToDo

Constraint Monitor API Documentation

class gym_electric_motor.core.ConstraintMonitor(limit_constraints=(), additional_constraints=(), merge_violations='max')

The ConstraintMonitor is used within the ElectricMotorEnvironment to monitor the states for illegal or undesired values (e.g., overcurrents).

It consists of a list of multiple independent constraints. Each constraint takes the current observation of the environment as input and returns a violation degree within [0.0, 1.0]. The single violation degrees are merged, and the ConstraintMonitor returns the resulting total violation degree.
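To illustrate the concept, the following minimal sketch mimics the monitoring loop in plain Python: every constraint is evaluated on the same state, and the single degrees are merged by taking the maximum. MiniConstraintMonitor and merge_max are hypothetical names introduced here for illustration; this is not the library's implementation.

```python
from typing import Callable, List, Sequence

# Hypothetical alias: a constraint maps a state to a violation degree in [0.0, 1.0].
Constraint = Callable[[Sequence[float]], float]


def merge_max(*violation_degrees: float) -> float:
    # Take the most severe single violation; 0.0 if there are no constraints.
    return max(violation_degrees, default=0.0)


class MiniConstraintMonitor:
    """Hypothetical, simplified stand-in for the ConstraintMonitor concept."""

    def __init__(self, constraints: Sequence[Constraint],
                 merge: Callable[..., float] = merge_max) -> None:
        self._constraints: List[Constraint] = list(constraints)
        self._merge = merge

    def check_constraints(self, state: Sequence[float]) -> float:
        # Evaluate every independent constraint on the same state ...
        degrees = [constraint(state) for constraint in self._constraints]
        # ... and merge the single degrees into one total violation degree.
        return self._merge(*degrees)


monitor = MiniConstraintMonitor([
    lambda s: 0.0,                               # a constraint that is never violated
    lambda s: 1.0 if abs(s[0]) > 1.0 else 0.0,   # hard limit on the first state
])
print(monitor.check_constraints([0.5, 0.2]))  # 0.0
print(monitor.check_constraints([1.5, 0.2]))  # 1.0
```

Passing the merge function as a parameter keeps the individual constraints independent of how their results are combined.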

Soft Constraints:

For greater flexibility, the constraints return a violation degree (float) instead of a simple violation flag (bool). This allows the reward function to take the violation degree into account even before a limit is actually violated. A violation degree of 0.0 means that no state is in a dangerous region. For values between 0.0 and 1.0, the reward is decreased gradually, so that the agent learns to avoid these state regions. If the violation degree reaches 1.0, the episode is terminated.
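A soft constraint can be sketched as a function with a linear "danger" region between a warning level and the limit. The function name soft_limit_constraint, the linear ramp, and the default values warn_level=0.8 and limit=1.0 are illustrative assumptions, not values taken from the library.

```python
def soft_limit_constraint(state, index=0, warn_level=0.8, limit=1.0):
    """Hypothetical soft constraint on state[index] with a linear danger region."""
    value = abs(state[index])
    if value <= warn_level:
        return 0.0  # safe region: no violation
    if value >= limit:
        return 1.0  # limit reached: the episode would be terminated
    # Danger region: the degree rises linearly from 0.0 at warn_level to 1.0 at limit.
    return (value - warn_level) / (limit - warn_level)


for x in (0.5, 0.9, 1.2):
    print(x, round(soft_limit_constraint([x]), 3))
```

A state magnitude of 0.9 lies halfway through the danger region and yields a violation degree of about 0.5, which a reward function could use to penalize the agent before the limit is reached.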

Hard Constraints:

This concept also covers hard constraints that terminate an episode directly, without any “danger” region. For such a constraint, the violation degree jumps directly from 0.0 to 1.0 when a violation occurs.
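A hard constraint, in contrast, has no intermediate values. The sketch below uses a hypothetical function hard_limit_constraint and an assumed limit of 1.0 for illustration.

```python
def hard_limit_constraint(state, index=0, limit=1.0):
    """Hypothetical hard constraint: no danger region, the degree jumps 0.0 -> 1.0."""
    return 1.0 if abs(state[index]) > limit else 0.0


print(hard_limit_constraint([0.99]))   # 0.0
print(hard_limit_constraint([-1.01]))  # 1.0
```

Since the additional_constraints parameter also accepts plain functions that return a float within [0.0, 1.0], a function of this shape could be passed there.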

Parameters:
  • limit_constraints (list(str)/'all_states') –

    Shortcut parameter to specify the states whose limits shall be observed.

    • list(str): Pass a list of state names; each of these states is observed to stay within its limits.

    • 'all_states': Shortcut to observe all states to stay within their limits.

  • additional_constraints (list(Constraint/callable)) – Further constraints that shall be monitored. These have to be initialized first and passed to the ConstraintMonitor. Alternatively, constraints can be defined as a function that takes the current state and returns a float within [0.0, 1.0].

  • merge_violations ('max'/'product'/callable(*violation_degrees) -> float) –

    Function to merge all single violation degrees into a total violation degree.

    • 'max': Take the maximal violation degree as the total violation degree.

    • 'product': Calculate the total violation degree as one minus the product of one minus each single violation degree, i.e. 1 - (1 - v_1)(1 - v_2)...(1 - v_n).

    • callable(*violation_degrees) -> float: User-defined function to calculate the total violation degree.
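The two built-in merge strategies described above can be reproduced in a few lines to compare their behavior. The functions merge_max, merge_product, and merge_mean are hypothetical re-implementations of the described formulas; merge_mean is only an example of what a user-defined callable might look like.

```python
import math


def merge_max(*violation_degrees: float) -> float:
    # 'max': the most severe single violation determines the total.
    return max(violation_degrees, default=0.0)


def merge_product(*violation_degrees: float) -> float:
    # 'product': 1 - prod(1 - v_i); every partial violation raises the total.
    return 1.0 - math.prod(1.0 - v for v in violation_degrees)


def merge_mean(*violation_degrees: float) -> float:
    # Hypothetical user-defined merge function (not part of the library).
    return sum(violation_degrees) / len(violation_degrees) if violation_degrees else 0.0


degrees = (0.5, 0.5)
print(merge_max(*degrees))      # 0.5
print(merge_product(*degrees))  # 0.75
print(merge_mean(*degrees))     # 0.5
```

With two constraints each at 0.5, 'max' reports 0.5 while 'product' reports 0.75, so 'product' penalizes several simultaneous near-violations more strongly. Both return 1.0 as soon as any single degree is 1.0; an averaging function like merge_mean does not, so a custom callable should be chosen with the termination condition in mind.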

check_constraints(state: ndarray)

Function to check and merge all constraints.

Parameters:

state (ndarray(float)) – The current state of the environment.

Returns:

The total violation degree in [0.0, 1.0].

Return type:

float

property constraints

Returns the list of all constraints the ConstraintMonitor observes.

set_modules(ps: PhysicalSystem)

Passes the PhysicalSystem of the environment to the ConstraintMonitor, which stores important parameters such as the indices of the states.

Parameters:

ps (PhysicalSystem) – The PhysicalSystem of the environment.
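Because set_modules gives the monitor access to information such as state indices, a limit constraint can resolve state names to indices once and then index the state array on every check. The sketch below assumes this design. SketchLimitConstraint, its set_modules(state_names) signature (a plain list of names instead of a PhysicalSystem), the example state names, and the normalization of states by their limits are all illustrative assumptions; the library's own limit constraints may work differently.

```python
from typing import List, Sequence, Union


class SketchLimitConstraint:
    """Hypothetical limit constraint; names and structure are illustrative only."""

    def __init__(self, observed: Union[Sequence[str], str] = "all_states") -> None:
        # Keep the 'all_states' shortcut as a string; otherwise copy the name list.
        self._observed = observed if observed == "all_states" else list(observed)
        self._observed_idx: List[int] = []

    def set_modules(self, state_names: Sequence[str]) -> None:
        # Stand-in for receiving the PhysicalSystem: resolve state names to indices once.
        names = list(state_names)
        if self._observed == "all_states":
            self._observed_idx = list(range(len(names)))
        else:
            self._observed_idx = [names.index(name) for name in self._observed]

    def __call__(self, state: Sequence[float]) -> float:
        # Assumption: states are normalized by their limits, so |x| > 1.0 exceeds a limit.
        exceeded = any(abs(state[i]) > 1.0 for i in self._observed_idx)
        return 1.0 if exceeded else 0.0


state_names = ["omega", "torque", "i", "u", "u_sup"]  # illustrative state names
constraint = SketchLimitConstraint(["omega", "i"])
constraint.set_modules(state_names)
print(constraint([0.5, 0.2, 0.9, 0.1, 1.0]))  # 0.0
print(constraint([0.5, 0.2, 1.1, 0.1, 1.0]))  # 1.0
```

Resolving the indices in set_modules rather than on every call keeps the per-step check to a simple array lookup.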