How many tokens are generally trained in total?
lengyueyang opened this issue · 1 comment
lengyueyang commented
Thanks for such great work. The article mentions continued pre-training on 6T of data. Approximately how many tokens does the loaded checkpoint correspond to?
guoday commented
We have already described this in Section 3.3 of the paper; please refer to Table 2.