Create training README
alexhernandezgarcia commented
Batch size:
- forward: number of forward trajectories to include in the training batch. These are on-policy trajectories, possibly with random actions (if `random_action_prob > 0`) or with a tempered policy (if `temperature < 1.0`).
- train: number of backward trajectories to include in the training batch, sampled (backwards) from data points in a "training set".
- replay: number of backward trajectories to include in the training batch, sampled (backwards) from data points in the replay buffer.
The total number of trajectories in the training batch is the sum of the above.
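
To make the arithmetic concrete, here is a minimal Python sketch of how the three components could add up. The `BatchSize` class and its `total` property are hypothetical names used only for illustration; they are not taken from the repository, and the actual configuration may be structured differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchSize:
    """Hypothetical container for the three batch-size components."""

    forward: int = 0  # on-policy forward trajectories
    train: int = 0    # backward trajectories sampled from the training set
    replay: int = 0   # backward trajectories sampled from the replay buffer

    def __post_init__(self):
        # Reject negative counts, since each field is a number of trajectories.
        for name in ("forward", "train", "replay"):
            if getattr(self, name) < 0:
                raise ValueError(f"{name} must be non-negative")

    @property
    def total(self) -> int:
        # Total number of trajectories in the training batch.
        return self.forward + self.train + self.replay


batch_size = BatchSize(forward=10, train=4, replay=2)
print(batch_size.total)  # 16
```

With these example values, the training batch would contain 10 forward trajectories and 6 backward trajectories, for 16 in total.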