combine_checkpoints.py translated dataset path
Closed this issue · 4 comments
It looks like the README needs to be changed regarding the combine_checkpoints.py path for the translated oasst dataset.
The instructions for the combine_checkpoints.py script describe the path to the translated dataset like this:
but my checkpoints folder doesn't have the language part mentioned in the README. It looks like this:
Now when I run python3 combine_checkpoints.py /checkpoints/ it works fine, but when I add the language part from the docs (for example nl), it fails.
Err, what command did you run exactly? For me it looks like this:
Which is the direct result of:
!python LLaMa2lang/translate_oasst.py hi /content/drive/MyDrive/oasst_checkpoints/hi 200 20
The structure should be: target_language/fold/from_source_language/actual_checkpoint_files. You seem to be missing both the target language and fold folders, which get created here https://github.com/UnderstandLingBV/LLaMa2lang/blob/main/translate_oasst.py#L115
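To check whether a folder matches the structure described above, something like the following sketch could help. It only assumes the layout named in this thread (a target-language folder containing fold folders, each containing from_source_language folders, each containing checkpoint files). The function name `find_layout_problems` is made up for illustration and is not part of LLaMa2lang.

```python
from pathlib import Path
from typing import List


def find_layout_problems(target_language_dir: str) -> List[str]:
    """Return a list of problems; an empty list means the layout matches.

    Expected layout (as described in this thread):
        target_language/fold/from_source_language/actual_checkpoint_files
    This helper is hypothetical and not part of LLaMa2lang.
    """
    root = Path(target_language_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]

    problems: List[str] = []
    # First level below the target language folder: fold folders.
    folds = [p for p in root.iterdir() if p.is_dir()]
    if not folds:
        problems.append(f"no fold folders under {root}")
    for fold in folds:
        # Second level: one folder per source language.
        sources = [p for p in fold.iterdir() if p.is_dir()]
        if not sources:
            problems.append(f"no from_source_language folders under {fold}")
        for source in sources:
            # Third level: the checkpoint files themselves.
            if not any(p.is_file() for p in source.iterdir()):
                problems.append(f"no checkpoint files under {source}")
    return problems
```

Calling `find_layout_problems("/content/drive/MyDrive/oasst_checkpoints/hi")` on the folder from the command above should return an empty list if the target language and fold folders were created as expected.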
It's all good, I get it now :) The way I read the docs, I should use the [checkpoint] folder for the first step, then [checkpoint] + [target_lang] for the second step, but both are really the same folder.
@mpazdzioch I got the same structure too. What did you do to get back to the original format?
You need to make sure you pass the right folders along, for example:
python LLaMa2lang/combine_checkpoints.py ./oasst_checkpoints/pl ./oasst_checkpoints/pl_out
python LLaMa2lang/create_thread_prompts.py ./oasst_checkpoints/pl_out "test" ./oasst_checkpoints/pl_threads
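The point of the two commands above is that the output folder of the first step is the input folder of the second. A rough sketch that keeps the folders consistent, assuming only the script paths and argument order shown in those commands (the helper `build_pipeline_commands` and the `_out` / `_threads` suffixes are just the naming used in this example, not something the scripts require):

```python
import shlex
from pathlib import Path
from typing import List


def build_pipeline_commands(base: str, lang: str, middle_arg: str) -> List[List[str]]:
    """Build both commands so step 2 reads exactly what step 1 wrote.

    Hypothetical helper, not part of LLaMa2lang. `middle_arg` is the second
    argument of create_thread_prompts.py ("test" in the example above) and
    is passed through unchanged.
    """
    base_path = Path(base)
    translated = base_path / lang              # folder written by translate_oasst.py
    combined = base_path / f"{lang}_out"       # output of combine_checkpoints.py
    threads = base_path / f"{lang}_threads"    # output of create_thread_prompts.py
    return [
        ["python", "LLaMa2lang/combine_checkpoints.py", str(translated), str(combined)],
        ["python", "LLaMa2lang/create_thread_prompts.py", str(combined), middle_arg, str(threads)],
    ]


# Print the commands for review instead of running them.
for cmd in build_pipeline_commands("./oasst_checkpoints", "pl", "test"):
    print(shlex.join(cmd))
```

Printing the commands first makes it easy to spot a mismatch between the folders before anything runs.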