The paper does not seem to state clearly which dataset the SFT model was trained on

Question

Closed this issue 8 months ago · 2 comments

The SFT dataset encompasses many routine tasks and can be substituted with an open-source instruction finetuning dataset.

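My understanding is that the SFT model is simply an instruction-tuned model (ChatGLM3-32B), and that no supervised fine-tuning was done on math datasets. Is that correct?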

Answer 1 · 2024-05-15T02:47:02.000Z

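Also, for the later RFT and DPO stages, were the instructions drawn from the same dataset, or was the dataset split between the two stages?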

Answer 2 · 2024-05-22T07:24:31.000Z

Thank you for pointing out where the paper is unclear. We will fill in these details in the next revision.