Reward Functions
Reward Function Base Class
- class gym_electric_motor.core.RewardFunction[source]
The abstract base class for reward functions in gym electric motor environments.
The reward function is called once per step and returns the reward for the current time step.
- reward_range
Defines the lowest and highest possible reward.
- Type:
Tuple(float, float)
- reset(initial_state=None, initial_reference=None)[source]
This function is called by the environment when it is reset.
Inner states of the reward function can be reset here, if necessary.
- Parameters:
initial_state (ndarray(float)) – Initial state array of the environment.
initial_reference (ndarray(float)) – Initial reference array of the environment.
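Because reward() may also receive the previously taken action, a reward function can carry inner state between steps, which is exactly what reset() is meant to clear. The sketch below is a hypothetical example (not part of the library) that penalizes changes between consecutive actions and therefore forgets its stored action on reset; it assumes the base class constructor takes no arguments.

```python
import numpy as np

from gym_electric_motor.core import RewardFunction


class ActionSmoothnessReward(RewardFunction):
    """Hypothetical reward function with inner state: penalizes large changes
    between consecutive actions, so the stored action must be cleared on reset."""

    def __init__(self):
        super().__init__()
        self._previous_action = None

    def reset(self, initial_state=None, initial_reference=None):
        # Forget the last action of the previous episode.
        self._previous_action = None

    def reward(self, state, reference, k=None, action=None, violation_degree=0.0):
        if action is None or self._previous_action is None:
            penalty = 0.0
        else:
            # Penalize the difference between the current and the previous action.
            penalty = float(np.sum(np.abs(
                np.asarray(action, dtype=float) - np.asarray(self._previous_action, dtype=float)
            )))
        self._previous_action = action
        return -penalty
```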
- reward(state, reference, k=None, action=None, violation_degree=0.0)[source]
Reward calculation. If limits have been violated, the reward is calculated with a separate function.
- Parameters:
state (ndarray(float)) – The environment's state array.
reference (ndarray(float)) – The environment's reference array.
k (int) – The system's current time step.
action (element of the action space) – The previously taken action.
violation_degree (float in [0.0, 1.0]) – Degree of violation of the constraints. 0.0 indicates that all constraints are satisfied; 1.0 indicates that the constraints have been violated so severely that a reset is necessary.
- Returns:
The reward for this (state, reference, action) tuple.
- Return type:
float
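As an illustration of this signature, the minimal sketch below returns the negative mean absolute tracking error and maps constraint violations onto a penalty that scales with violation_degree. The class, its reward bounds, and the penalty scheme are assumptions for demonstration, not one of the library's own reward functions; it also assumes that state and reference are equal-length arrays, as provided by the environment.

```python
import numpy as np

from gym_electric_motor.core import RewardFunction


class NegativeAbsoluteErrorReward(RewardFunction):
    """Illustrative reward: negative mean absolute tracking error,
    with a violation penalty scaled by the violation degree."""

    reward_range = (-10.0, 0.0)  # assumed bounds for this particular reward design

    def reward(self, state, reference, k=None, action=None, violation_degree=0.0):
        if violation_degree > 0.0:
            # Scale the assumed worst-case penalty with the degree of the violation.
            return violation_degree * self.reward_range[0]
        # Negative mean absolute error between state and reference arrays.
        return -float(np.mean(np.abs(np.asarray(state) - np.asarray(reference))))
```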
- reward_range = (-inf, inf)
The lowest and highest possible reward.
- Type:
Tuple(float, float)
- set_modules(physical_system, reference_generator, constraint_monitor)[source]
Sets the physical system, reference generator, and constraint monitor, so that the reward function can set up state arrays that fit the environment's states.
- Parameters:
physical_system (PhysicalSystem) – The physical system of the environment.
reference_generator (ReferenceGenerator) – The reference generator of the environment.
constraint_monitor (ConstraintMonitor) – The constraint monitor of the environment.
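The sketch below shows one possible use of set_modules(): resolving the index of a single state variable once, so that reward() can evaluate its tracking error cheaply at every step. The class name, the state name 'i', and the fixed penalty are illustrative assumptions; the example also assumes the physical system exposes a state_names list, as the library's physical systems do.

```python
from gym_electric_motor.core import RewardFunction


class CurrentTrackingReward(RewardFunction):
    """Illustrative reward that tracks a single state variable resolved in set_modules()."""

    def set_modules(self, physical_system, reference_generator, constraint_monitor):
        super().set_modules(physical_system, reference_generator, constraint_monitor)
        # 'i' is used here as an example state name for a current; the actual name
        # depends on the motor type (e.g. 'i_a', 'i_sd', ...).
        self._current_idx = physical_system.state_names.index('i')

    def reward(self, state, reference, k=None, action=None, violation_degree=0.0):
        if violation_degree > 0.0:
            return -1.0  # fixed penalty for any constraint violation (assumed design choice)
        return -abs(state[self._current_idx] - reference[self._current_idx])
```

In a full setup, such a custom reward function can usually be passed to the environment at construction time (for instance via the reward_function keyword argument of gym_electric_motor.make()); the environment then calls set_modules() and reset() internally.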