QwenLM/AutoIF

Open source dataset

yuanzhiyong1999 opened this issue · 2 comments

It is mentioned in the paper that the SFT and DPO datasets built based on AUTOIF will be open-sourced together with Qwen2-72B. May I ask where they are?

May I ask if each line in seed_instruction.txt and augment_instructions.txt corresponds to each other? Is the data in augment an enhanced version of the corresponding line in seed?
@dongguanting

Thank you for your attention. As shown in the prompt example in our RFT.py:

[image: prompt example from RFT.py]

we have the supervision model generate 50 augmented instructions at a time directly from the seed data, so there is no one-to-one correspondence between the lines of seed_instruction.txt and augment_instructions.txt.
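For readers who cannot see the screenshot, here is a minimal sketch of why batch generation breaks line alignment. It is not the actual code or prompt from RFT.py: the function names `build_augment_prompt` and `parse_augmented`, the prompt wording, and the two sample seeds are all made up for illustration. The only detail taken from the reply above is that the model is asked for 50 new instructions per call, with the seed data as input.

```python
from typing import List


def build_augment_prompt(seed_instructions: List[str], n_new: int = 50) -> str:
    """Hypothetical prompt builder (wording is an assumption, not the RFT.py prompt).

    All seeds are placed in one prompt and the model is asked for n_new
    new instructions in a single call, so no output line is tied to a
    particular seed line.
    """
    seeds_block = "\n".join(f"- {s}" for s in seed_instructions)
    return (
        "Here are some example instructions:\n"
        f"{seeds_block}\n\n"
        f"Please write {n_new} new instructions of a similar kind, one per line."
    )


def parse_augmented(response: str) -> List[str]:
    """Split a model response into instructions, dropping blank lines."""
    return [line.strip() for line in response.splitlines() if line.strip()]


# Illustrative seeds only; these are not taken from seed_instruction.txt.
seeds = [
    "Answer with words that begin with the letter B",
    "Use only lowercase letters",
]
prompt = build_augment_prompt(seeds, n_new=50)
```

Since every call sees the seeds as a group and returns a fresh batch, line i of augment_instructions.txt should not be read as an enhanced version of line i of seed_instruction.txt.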