Error in FLOPS Calculation
passaglia opened this issue · 0 comments
passaglia commented
There's a bug in the GPT-NeoX FLOPS calculation here (line 104 as of commit a2b2020):
The term proportional to the vocab_size, flops_calc2, should share all the same prefactors as flops_calc_1; see page 12 of arXiv:2104.04473.
Since this term scales inversely with both hidden_dim and num_layers, it matters most for small models: for Pythia-70m the relative error is roughly $50257/(16 \times 6 \times 512) \approx 102\%$, while for GPT-NeoX-20B it is only $50257/(16 \times 44 \times 6144) \approx 1.2\%$.
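To make the size of the discrepancy concrete, here is a small sketch of the check above. The function name and signature are illustrative, not from the repo; the ratio follows the FLOPs formula in arXiv:2104.04473, where the vocab (embedding/logit) term contributes a fraction $V/(16Lh)$ on top of the main transformer term:

```python
def vocab_term_fraction(vocab_size: int, num_layers: int, hidden_dim: int) -> float:
    """Relative size of the vocab term vs. the main transformer FLOPs term,
    per arXiv:2104.04473: F ~ 96*s*L*h^2 * (1 + s/(6h) + V/(16*L*h))."""
    return vocab_size / (16 * num_layers * hidden_dim)

# Pythia-70m: 6 layers, hidden dim 512 -> dropping the prefactors is a ~102% error
print(f"{vocab_term_fraction(50257, 6, 512):.0%}")
# GPT-NeoX-20B: 44 layers, hidden dim 6144 -> only ~1.2%
print(f"{vocab_term_fraction(50257, 44, 6144):.1%}")
```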
This bug seems to have been introduced only ~3 months ago in #1044, so it may not have affected, e.g., any measurements made while training Pythia.