tangqiaoyu/ToolAlpaca

Training Dataset

mattgithub1919 opened this issue · 1 comment

Hi, thanks for the great work and for making it public. Is it possible to publicly share the training dataset as well? Currently we can use build_dataset.py to make our own training dataset, but it's not straightforward, and I'm not sure build_dataset.py works for any tool without modification. It would be great if you could publish your dataset so that we can reproduce the results in the paper. Thanks.

Hi, thanks for your interest in our work. As long as you format your tool the same way as our data format, you can use build_dataset.py to construct a complete training dataset. However, because SFT training scripts differ, build_dataset.py might need some adjustments to match the script you use.

In our code, the output format is List[[texts list, trainable list], ...], which is compatible with our training script at train.py.
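To make the shape above concrete, here is a minimal sketch of one sample in the List[[texts list, trainable list], ...] layout and one way a training script might consume it. This is not taken from build_dataset.py or train.py: the helper names (build_sample, to_char_mask), the use of booleans in the trainable list, and the character-level mask standing in for a token-level loss mask are all assumptions for illustration.

```python
from typing import List, Tuple

# Assumed layout: each sample is [texts, trainable], two parallel lists.
# texts[i] is a text segment; trainable[i] says whether the loss should
# be computed on that segment. Booleans are an assumption here.


def build_sample(segments: List[Tuple[str, bool]]) -> list:
    """Hypothetical helper: split (text, flag) pairs into two parallel lists."""
    texts = [text for text, _ in segments]
    trainable = [flag for _, flag in segments]
    return [texts, trainable]


def to_char_mask(sample: list) -> Tuple[str, List[int]]:
    """Hypothetical helper: join the segments and build a per-character mask.

    A real SFT script would tokenize each segment and mask per token;
    characters are used here only to keep the sketch dependency-free.
    """
    texts, trainable = sample
    if len(texts) != len(trainable):
        raise ValueError("texts and trainable must have the same length")
    mask: List[int] = []
    for text, flag in zip(texts, trainable):
        mask.extend([1 if flag else 0] * len(text))
    return "".join(texts), mask


# A one-sample dataset: the prompt is not trained on, the response is.
prompt = "User: What is the weather in Paris?\n"
response = "Assistant: Let me call the weather tool."
dataset = [build_sample([(prompt, False), (response, True)])]

full_text, loss_mask = to_char_mask(dataset[0])
```

If your own SFT script expects a different layout (for example a single string plus label offsets), this conversion step is the kind of adjustment build_dataset.py might need.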