Create training README

Question

Batch size:

forward: number of forward trajectories to include in the training batch. These are on-policy trajectories, possibly with random actions (if random_action_prob > 0) or sampled from a tempered policy (if temperature < 1.0).
train: number of backward trajectories to include in the training batch, sampled (backwards) from data points in a "training set".
replay: number of backward trajectories to include in the training batch, sampled (backwards) from data points in the replay buffer.

The total number of trajectories in the training batch is the sum of the above.
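
As a rough illustration of how the three counts combine, here is a minimal Python sketch. The `BatchSizes` class and its `total` method are hypothetical names introduced for this example and are not taken from the codebase; only the three field names (forward, train, replay) come from the description above.

```python
from dataclasses import dataclass


@dataclass
class BatchSizes:
    """Hypothetical container for the three batch-size settings."""

    forward: int = 0  # on-policy forward trajectories
    train: int = 0    # backward trajectories from the training set
    replay: int = 0   # backward trajectories from the replay buffer

    def total(self) -> int:
        # The training batch holds all three kinds of trajectories.
        return self.forward + self.train + self.replay


sizes = BatchSizes(forward=16, train=8, replay=8)
print(sizes.total())  # 32
```

Setting any of the three values to 0 simply drops that source from the batch; for example, `BatchSizes(forward=16)` describes a purely on-policy batch of 16 trajectories.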