AI-Commandos/LLaMa2lang

combine_checkpoints.py translated dataset path


It looks like the README should be updated regarding the combine_checkpoints.py path for the translated oasst dataset.
The instructions for the combine_checkpoints.py script describe the path to the translated dataset like this:

[screenshot: Screenshot_20240103_092049]

but my checkpoints folder doesn't have the language part mentioned in the README. It looks like this:

[screenshot: Screenshot_20240103_092726]

Now when I run python3 combine_checkpoints.py /checkpoints/ it works fine, but when I add the language part (like nl from the docs), it fails.

Err, what command did you run exactly? For me it looks like this:
[screenshot: Untitled]

Which is the direct result of:
!python LLaMa2lang/translate_oasst.py hi /content/drive/MyDrive/oasst_checkpoints/hi 200 20

The structure should be: target_language/fold/from_source_language/actual_checkpoint_files. You seem to be missing both the target language and fold folders, which get created here: https://github.com/UnderstandLingBV/LLaMa2lang/blob/main/translate_oasst.py#L115
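To check a folder against that layout before running combine_checkpoints.py, something like the following may help. It is only a sketch: it assumes the folder you pass is the target-language folder itself, and the function names and the checkpoint file name in the usage note are hypothetical, not part of LLaMa2lang.

```python
from pathlib import Path


def find_checkpoint_files(target_lang_dir):
    """Group files found at <target_lang_dir>/<fold>/<source_language>/<file>.

    Hypothetical helper: it only inspects folder depth and knows nothing
    about the real checkpoint file names or their contents.
    """
    root = Path(target_lang_dir)
    groups = {}
    # Exactly three levels below the target-language folder:
    # fold / from_source_language / actual checkpoint file
    for path in sorted(root.glob("*/*/*")):
        if path.is_file():
            fold = path.parent.parent.name
            source_lang = path.parent.name
            groups.setdefault((fold, source_lang), []).append(path.name)
    return groups


def looks_like_translation_output(target_lang_dir):
    """Return True if at least one file sits at the expected depth."""
    return bool(find_checkpoint_files(target_lang_dir))
```

Calling looks_like_translation_output("./oasst_checkpoints/hi") should return True only when files sit at the fold/from_source_language depth. If it returns False, the folder being passed is probably one level too high or too low.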

It's all good, I get it now :) The way I read the docs, I should use the [checkpoint] folder for the first step and then [checkpoint] + [target_lang] for the second step, but both are really the same folder.

@mpazdzioch I got the same structure too. What did you do to get back to the original format?

You need to make sure you pass the right folders on, for example:

python LLaMa2lang/combine_checkpoints.py ./oasst_checkpoints/pl ./oasst_checkpoints/pl_out
python LLaMa2lang/create_thread_prompts.py ./oasst_checkpoints/pl_out "test" ./oasst_checkpoints/pl_threads
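The point of the two commands above is that the output folder of one step is the input folder of the next. A rough sketch that derives both command lines from a single checkpoint root, so the folders cannot drift apart, might look like this. The function name and the _out / _threads suffix convention are assumptions taken from the example, and name_arg simply passes through whatever the second argument of create_thread_prompts.py is ("test" above).

```python
from pathlib import Path


def pipeline_commands(checkpoint_root, target_lang, name_arg):
    """Build argv lists for the combine and thread-prompt steps.

    Hypothetical helper: it only mirrors the positional arguments shown
    in the example commands and does not run anything.
    """
    base = Path(checkpoint_root)
    translated = (base / target_lang).as_posix()            # written by translate_oasst.py
    combined = (base / f"{target_lang}_out").as_posix()      # written by combine_checkpoints.py
    threads = (base / f"{target_lang}_threads").as_posix()   # written by create_thread_prompts.py
    return [
        ["python", "LLaMa2lang/combine_checkpoints.py", translated, combined],
        ["python", "LLaMa2lang/create_thread_prompts.py", combined, name_arg, threads],
    ]
```

Each returned list can be handed to subprocess.run(cmd, check=True) in order. Note that pathlib drops a leading ./ from the paths, which does not change where they point.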