RUC-GSAI/Yulan-GARDEN

Some questions about running the code

qylen opened this issue · 3 comments

Hello, your work is great! I'm a newcomer to this field. I followed your guide to run the code and downloaded the mini openwebtext2 as input, but when I run the code the output file is empty. I don't know how to solve this.

Also, how should I handle an input dataset file like openwebtext2? You say you provide data processing recipes, but I can't find them.

Dear qyleni,

Thanks for your attention to our work.

Regarding your first question: since no further details about the errors from running Yulan-GARDEN were provided, I suspect the mini openwebtext2 you downloaded is not in JSONL format. For the {{input_ext}} specified in the config file, each data point should occupy exactly one line, and each line should be a JSON dict containing the {{input_text_key}} field.
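To check whether an input file matches the layout described above, something like the following sketch may help. It is not part of Yulan-GARDEN: `check_jsonl` is a hypothetical helper, and the key name `"text"` is only an assumed stand-in for whatever {{input_text_key}} is set to in your config.

```python
import json


def check_jsonl(path, text_key="text"):
    """Count well-formed lines and list the lines that are not.

    A line is well-formed if it parses as a JSON object (dict) that
    contains `text_key`. "text" is an assumed default, not a value
    taken from the Yulan-GARDEN config.
    """
    ok_count = 0
    problems = []  # list of (line_number, reason)
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # blank lines are skipped, not reported
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((lineno, f"not valid JSON: {exc.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((lineno, "not a JSON object"))
            elif text_key not in record:
                problems.append((lineno, f"missing key {text_key!r}"))
            else:
                ok_count += 1
    return ok_count, problems
```

Calling `check_jsonl("your_input.jsonl", "text")` returns the number of usable lines together with the line numbers that would need fixing. If the count is zero, the input format would be a plausible reason for an empty output file.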

As for your second question, our data processing recipe for openwebtext2, which reproduces the experimental results reported in our paper, can be found here.

I hope this response helps you resolve your issues.

Emanual20, Yulan-GARDEN Team

As there have been no further comments, this issue will be closed. Please feel free to reopen it if you have any other questions. We look forward to your feedback.