Open source dataset

Question


yuanzhiyong1999 opened this issue 6 months ago · 2 comments

yuanzhiyong1999 commented 6 months ago

The paper mentions that the SFT and DPO datasets built with AUTOIF would be open-sourced together with Qwen2-72B. Where can they be found?

Answer 1 · 2024-07-15T10:23:07.000Z

Does each line in seed_instruction.txt correspond to a line in augment_instructions.txt? In other words, is each entry in augment_instructions.txt an augmented version of the corresponding line in seed_instruction.txt? @dongguanting

Answer 2 · 2024-07-15T10:33:32.000Z


Thank you for your interest. As shown in the prompt example in our RFT.py, we have the supervision model generate 50 augmented instructions at a time directly from the seed data, so there is no one-to-one correspondence between the lines of the two files.
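To illustrate why the two files do not line up, here is a minimal Python sketch. It assumes that a batch of seed instructions goes into a single prompt and that the model returns one new instruction per line. The template text, `build_augment_prompt`, `augment`, and the `generate` callable are hypothetical stand-ins, not the actual prompt or code in RFT.py.

```python
from typing import Callable, List

# Hypothetical prompt template; the real prompt in RFT.py is not reproduced here.
PROMPT_TEMPLATE = (
    "Here are some example instructions:\n{seeds}\n\n"
    "Write {n} new, diverse instructions, one per line."
)


def build_augment_prompt(seed_instructions: List[str], n: int = 50) -> str:
    """Pack the whole batch of seeds into one prompt (an assumption of this sketch)."""
    seeds = "\n".join(f"- {s}" for s in seed_instructions)
    return PROMPT_TEMPLATE.format(seeds=seeds, n=n)


def augment(
    seed_instructions: List[str],
    generate: Callable[[str], str],
    n: int = 50,
) -> List[str]:
    """Ask the model for n instructions in one call and split its reply into lines."""
    raw = generate(build_augment_prompt(seed_instructions, n))
    # Strip whitespace and leading list dashes, then drop empty lines.
    lines = [line.strip().lstrip("-").strip() for line in raw.splitlines()]
    return [line for line in lines if line][:n]


def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for the supervision model: always returns 50 lines."""
    return "\n".join(f"- new instruction {i}" for i in range(1, 51))


seeds = ["Answer in exactly three sentences.", "Do not use any commas."]
augmented = augment(seeds, fake_generate)
print(len(seeds), len(augmented))  # prints "2 50"
```

Because one call turns the whole batch into 50 outputs, line k of the augmented file has no particular seed line it was derived from.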