alasdairtran/transform-and-tell

Why is t_total 437600?

reroze opened this issue · 1 comment

In config.yaml, the train split's instances_per_epoch is 65536 and batch_size is 16, so after 100 epochs it seems that only 65536 / 16 × 100 = 409600 batches are used during training. Shouldn't t_total be 409600?
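For reference, here is the arithmetic behind the question as a quick Python check (the numbers are the ones quoted from config.yaml; the variable names are just for illustration):

```python
instances_per_epoch = 65536  # from config.yaml
batch_size = 16              # nominal batch size from config.yaml
epochs = 100

# If every batch were exactly full, this would be the total step count.
batches_per_epoch = instances_per_epoch // batch_size  # 4096
print(batches_per_epoch * epochs)                      # 409600
```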

The batch size is actually not fixed. There is another parameter, maximum_samples_per_batch, which ensures that a batch never contains more than 16384 tokens (to avoid OOM errors). As a result, some batches have fewer than 16 samples; a sketch of the idea is below.
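To make the effect concrete, here is a minimal sketch of how a token cap like this can interact with a nominal batch size. It is not the repo's actual implementation (that comes from the data iterator of the training library, whose exact semantics may differ); the function name token_capped_batches and the fake length distribution are made up for illustration only.

```python
import random
from typing import Iterable, Iterator, List


def token_capped_batches(
    lengths: Iterable[int],
    batch_size: int = 16,
    max_tokens_per_batch: int = 16384,
) -> Iterator[List[int]]:
    """Group sequence lengths into batches, starting a new batch whenever
    adding the next instance would exceed either the instance cap
    (batch_size) or the padded-token cap (max_tokens_per_batch).
    This is an illustrative sketch, not the library's algorithm."""
    batch: List[int] = []
    longest = 0  # longest sequence in the current batch (padding target)
    for length in lengths:
        # Padded token count if this instance joined the current batch.
        padded = (len(batch) + 1) * max(longest, length)
        if batch and (len(batch) >= batch_size or padded > max_tokens_per_batch):
            yield batch
            batch, longest = [], 0
        batch.append(length)
        longest = max(longest, length)
    if batch:
        yield batch


random.seed(0)
# Hypothetical per-instance token counts; real counts depend on the dataset.
lengths = [random.randint(100, 1500) for _ in range(65536)]
n_batches = sum(1 for _ in token_capped_batches(lengths))
print(n_batches)  # more than 4096, since long instances force smaller batches
```

With long instances in the mix, the token cap triggers before the batch reaches 16 samples, so an epoch of 65536 instances takes more than 4096 batches.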

On average, there are about 4376 batches per epoch (this number was observed manually during training), which gives t_total = 4376 × 100 = 437600.