Reward Functions
Reward Function Base Class
- class gym_electric_motor.core.RewardFunction[source]
The abstract base class for reward functions in gym electric motor environments.
The reward function is called once per step and returns the reward for the current time step.
- reward_range
Defines the lowest and highest possible reward.
- Type:
Tuple(float, float)
- reset(initial_state=None, initial_reference=None)[source]
This function is called by the environment when it is reset.
Inner states of the reward function can be reset here, if necessary.
- Parameters:
initial_state (ndarray(float)) – Initial state array of the environment.
initial_reference (ndarray(float)) – Initial reference array of the environment.
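Because reward() may also receive the previously taken action, a reward function can carry inner state between steps, which is exactly what reset() is meant to clear. The sketch below is a hypothetical example (not part of the library) that penalizes changes between consecutive actions and therefore forgets its stored action on reset; it assumes the base class constructor takes no arguments.

```python
import numpy as np

from gym_electric_motor.core import RewardFunction


class ActionSmoothnessReward(RewardFunction):
    """Hypothetical reward function with inner state: penalizes large changes
    between consecutive actions, so the stored action must be cleared on reset."""

    def __init__(self):
        super().__init__()
        self._previous_action = None

    def reset(self, initial_state=None, initial_reference=None):
        # Forget the last action of the previous episode.
        self._previous_action = None

    def reward(self, state, reference, k=None, action=None, violation_degree=0.0):
        if action is None or self._previous_action is None:
            penalty = 0.0
        else:
            # Penalize the difference between the current and the previous action.
            penalty = float(np.sum(np.abs(
                np.asarray(action, dtype=float) - np.asarray(self._previous_action, dtype=float)
            )))
        self._previous_action = action
        return -penalty
```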
- reward(state, reference, k=None, action=None, violation_degree=0.0)[source]
Reward calculation. If limits have been violated, the reward is calculated with a separate function.
- Parameters:
state (ndarray(float)) – The environment's state array.
reference (ndarray(float)) – The environment's reference array.
k (int) – The system's current time step.
action (element of the action space) – The previously taken action.
violation_degree (float in [0.0, 1.0]) – Degree of violation of the constraints. 0.0 indicates that all constraints are satisfied; 1.0 indicates that the constraints have been violated so severely that a reset is necessary.
- Returns:
The reward for this (state, reference, action) tuple.
- Return type:
float
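As an illustration of this signature, the minimal sketch below returns the negative mean absolute tracking error and maps constraint violations onto a penalty that scales with violation_degree. The class, its reward bounds, and the penalty scheme are assumptions for demonstration, not one of the library's own reward functions; it also assumes that state and reference are equal-length arrays, as provided by the environment.

```python
import numpy as np

from gym_electric_motor.core import RewardFunction


class NegativeAbsoluteErrorReward(RewardFunction):
    """Illustrative reward: negative mean absolute tracking error,
    with a violation penalty scaled by the violation degree."""

    reward_range = (-10.0, 0.0)  # assumed bounds for this particular reward design

    def reward(self, state, reference, k=None, action=None, violation_degree=0.0):
        if violation_degree > 0.0:
            # Scale the assumed worst-case penalty with the degree of the violation.
            return violation_degree * self.reward_range[0]
        # Negative mean absolute error between state and reference arrays.
        return -float(np.mean(np.abs(np.asarray(state) - np.asarray(reference))))
```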
- reward_range = (-inf, inf)
The lowest and highest possible reward.
- Type:
Tuple(float, float)
- set_modules(physical_system, reference_generator, constraint_monitor)[source]
Sets the physical system, reference generator, and constraint monitor, so that the reward function can set up state arrays that fit the environment's states.
- Parameters:
physical_system (PhysicalSystem) – The physical system of the environment.
reference_generator (ReferenceGenerator) – The reference generator of the environment.
constraint_monitor (ConstraintMonitor) – The constraint monitor of the environment.
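The sketch below shows one possible use of set_modules(): resolving the index of a single state variable once, so that reward() can evaluate its tracking error cheaply at every step. The class name, the state name 'i', and the fixed penalty are illustrative assumptions; the example also assumes the physical system exposes a state_names list, as the library's physical systems do.

```python
from gym_electric_motor.core import RewardFunction


class CurrentTrackingReward(RewardFunction):
    """Illustrative reward that tracks a single state variable resolved in set_modules()."""

    def set_modules(self, physical_system, reference_generator, constraint_monitor):
        super().set_modules(physical_system, reference_generator, constraint_monitor)
        # 'i' is used here as an example state name for a current; the actual name
        # depends on the motor type (e.g. 'i_a', 'i_sd', ...).
        self._current_idx = physical_system.state_names.index('i')

    def reward(self, state, reference, k=None, action=None, violation_degree=0.0):
        if violation_degree > 0.0:
            return -1.0  # fixed penalty for any constraint violation (assumed design choice)
        return -abs(state[self._current_idx] - reference[self._current_idx])
```

In a full setup, such a custom reward function can usually be passed to the environment at construction time (for instance via the reward_function keyword argument of gym_electric_motor.make()); the environment then calls set_modules() and reset() internally.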