deepseek-ai/DeepSeek-Coder-V2

How many tokens were trained in total?

lengyueyang opened this issue · 1 comment

Thanks for the great work. The paper mentions continued pre-training on 6T tokens of data. Approximately how many tokens in total has the released checkpoint been trained on?

We have already described this in Section 3.3 of the paper; please refer to Table 2.