Approx training duration?
franz101 opened this issue · 0 comments
franz101 commented
First of all, amazing paper. It was one of the most exciting things I've read this year.
I was just wondering, since the paper states v3-8 TPUs: what was the approximate training duration on long / short text? Thanks for sharing the paper and code. Truly amazing.