[Question] dataset size?
choco9966 opened this issue · 4 comments
Could you provide the sizes of the data generated by Python scripts 1 to 9 when using GPT-4 and Qwen2-72B?
Running 1_RFT.py generated 10,000 samples, but after running 3_cross_validation the count dropped to about 700. I am wondering whether this is due to a code issue or whether it is expected behavior. Sharing the data sizes of each step would be very helpful for my work. Thank you.
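For context, here is roughly how I am counting samples at each stage. This is a minimal sketch that assumes each script writes a JSONL output file; the paths below are placeholders, not the repository's actual output names:

```python
from pathlib import Path

# Hypothetical per-stage output files (placeholders, not the repo's actual paths);
# point these at whatever each script writes in your run.
STAGE_OUTPUTS = {
    "1_RFT": "outputs/1_rft.jsonl",
    "3_cross_validation": "outputs/3_cross_validation.jsonl",
}

def count_samples(path: str) -> int:
    """Count non-empty JSONL records in one stage's output file."""
    file = Path(path)
    if not file.exists():
        return 0
    with file.open() as f:
        return sum(1 for line in f if line.strip())

for stage, path in STAGE_OUTPUTS.items():
    print(f"{stage}: {count_samples(path)} samples")
```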
Hi!
In the main text we have already reported the total number of augmented samples under different supervision models and the final number of quality-filtered SFT and DPO samples. We will provide a more detailed breakdown of the sample counts in the future.
I suspect that the large number of samples filtered out by the executor during quality checking indicates problems with the quality of the instructions you generated. If you compare the sample quality before and after quality validation, you should see a noticeable improvement.
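Roughly speaking, the cross-validation step keeps a sample only when its generated verification functions and its test cases agree with each other often enough. The following is a minimal sketch of that idea, not the repository's actual code; it assumes each generated function is named `evaluate` and each test case stores an expected boolean:

```python
# Minimal sketch of executor-based cross validation (illustrative, not the repo's code).
# Assumes each sample has generated verification functions (Python source strings)
# and test cases of the form {"input": str, "output": bool}.

def run_func(func_src: str, text: str) -> bool:
    """Execute a generated `evaluate(text) -> bool` function on one input."""
    scope = {}
    try:
        exec(func_src, scope)              # in practice, sandbox and time-limit this
        return bool(scope["evaluate"](text))
    except Exception:
        return False                       # broken functions simply fail the check

def keep_sample(sample: dict, acc_threshold: float = 0.8) -> bool:
    """Keep a sample only if its functions and test cases agree often enough."""
    funcs, cases = sample["eval_funcs"], sample["cases"]
    if not funcs or not cases:
        return False
    correct = sum(
        run_func(f, c["input"]) == c["output"] for f in funcs for c in cases
    )
    return correct / (len(funcs) * len(cases)) >= acc_threshold

# filtered = [s for s in samples if keep_sample(s)]
```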
Hello!
Are you using GPT-4 as the supervision model? When I used Qwen2-72B as the supervision model, the pass rate in the third step was only 4.5%.
This has a lot to do with the prompt: by adjusting it, the quality of the generated instructions can be improved.
I am currently adjusting the prompt to try to reproduce the results reported in the paper...
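For anyone else tuning this, here is a purely illustrative example of the kind of prompt adjustment meant above (not the paper's actual prompt): making the verifiability requirement explicit so that fewer generated instructions fail the executor check in step 3.

```python
# Purely illustrative prompt tweak (hypothetical text, not the paper's prompt).
BASE_PROMPT = "You are an expert at writing instructions. Write {n} diverse instructions."

STRICTER_SUFFIX = (
    "\nEach instruction must be objectively checkable by a short Python function "
    "over the response text alone (length, keywords, format, punctuation, etc.). "
    "Do not write instructions that require world knowledge or subjective judgment."
)

def build_prompt(n: int, strict: bool = True) -> str:
    """Assemble the instruction-generation prompt, optionally with the stricter suffix."""
    prompt = BASE_PROMPT.format(n=n)
    return prompt + STRICTER_SUFFIX if strict else prompt

print(build_prompt(50))
```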
Hello, does Qwen2-72B refer to Qwen2-72B-Instruct?
It is the Qwen-instruct version, but it is our own in-house build, since the data produced by our AutoIF has been merged into the SFT data of Qwen-instruct.