Released data?
yucc-leon opened this issue · 2 comments
yucc-leon commented
The tech report says: "Our models, codes, and data are released at https://github.com/InternLM/InternLM-Math." What data have you released?
BTW, the Plus series improves substantially on benchmarks. Does this gain come from different data processing methods or just from more tokens?
objecti0n commented
The improvement comes from the new pre-training corpus and expert-iterated SFT dataset.
yucc-leon commented
Thank you all the same.