yxuansu/OpenAlpaca

Bad generation after training/finetuning

dardodel opened this issue · 11 comments

We trained the model on our own dataset following the instructions provided in this repo; the token accuracy is very high and the loss is relatively low. However, after creating the pytorch_model.bin with the provided Python code, the text generation is very bad: it reads like random words strung together. I wonder if anyone else has a similar issue, or if the developers have any clue about the possible reasons. Thanks.
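For reference, this is roughly how we test generation after the conversion (a sketch with placeholder paths and a placeholder prompt template, assuming the standard transformers API):

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

ckpt_dir = "./ckpt/openalpaca_7b/our_finetune"  # placeholder path to the converted checkpoint

tokenizer = LlamaTokenizer.from_pretrained(ckpt_dir)
model = LlamaForCausalLM.from_pretrained(ckpt_dir, torch_dtype=torch.float16).cuda()
model.eval()

# placeholder prompt; the actual OpenAlpaca prompt template may differ
prompt = "### Instruction:\nWhat is the capital of France?\n\n### Response:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```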

Hi @dardodel, thank you for your question. We have updated our training scripts and the performance of the trained model should be largely improved. Could you try to use the new code provided and give us some feedback? Thanks!

@yxuansu Thanks for your response. I tried the new code, but the problem is still the same. I also double-checked the requirements, and we have the same environment (hopefully). One thing I paid more attention to: when we load the trained model (the one we fine-tuned and converted to pytorch_model.bin), we get the warning message below, while we don't when we load the published OpenAlpaca model. Note that the message is much longer than this and I only copied part of it. I can copy the whole message if needed.

"
Some weights of the model checkpoint at ./ckpt/openalpaca_7b/7bt_preview were not used when initializing LlamaForCausalLM: ['model.model.layers.13.self_attn.rotary_emb.inv_freq', 'model.model.layers.13.post_attention_layernorm.weight', 'model.model.layers.2.post_attention_layernorm.weight', 'model.model.layers.13.mlp.up_proj.weight', 'model.model.layers.3.mlp.up_proj.weight', 'model.model.layers.31.self_attn.q_proj.weight',

....,

....,

**This IS expected if you are initializing LlamaForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).

This IS NOT expected if you are initializing LlamaForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Some weights of LlamaForCausalLM were not initialized from the model checkpoint at ./ckpt/openalpaca_7b/7bt_preview and are newly initialized:**

['model.layers.1.post_attention_layernorm.weight', 'model.layers.8.self_attn.rotary_emb.inv_freq', 'model.layers.14.post_attention_layernorm.weight', 'model.layers.20.self_attn.k_proj.weight',

....

...."
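To make the mismatch easier to see, here is a quick check of the checkpoint keys (a sketch assuming a single pytorch_model.bin; the path is a placeholder):

```python
import torch

ckpt_path = "./ckpt/openalpaca_7b/7bt_preview/pytorch_model.bin"  # placeholder path
state_dict = torch.load(ckpt_path, map_location="cpu")

# LlamaForCausalLM expects names such as "model.layers.0.self_attn.q_proj.weight"
# and "lm_head.weight". If every key carries an extra leading "model.", nothing
# matches, from_pretrained() re-initializes the weights, and generation is random.
doubled = [k for k in state_dict if k.startswith("model.model.")]
print(f"{len(doubled)} of {len(state_dict)} keys start with 'model.model.'")
print("example key:", next(iter(state_dict)))
```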

I also have the same problem as @dardodel.
By the way, is it normal that I get the following log record while training the model: [!] Model Size: 0.000000B?

[screenshot of the training log]
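For what it's worth, I assume that number is just the total parameter count, roughly like the sketch below, so 0.000000B would mean no parameters were loaded at all (this is a guess; the repo's actual logging code may differ):

```python
def model_size_in_billions(model) -> float:
    """Total parameter count in billions (the usual meaning of "7B")."""
    return sum(p.numel() for p in model.parameters()) / 1e9

# For OpenLLaMA-7B this should be roughly 7.0; printing 0.000000 would mean the
# model object holds no parameters, i.e. the base checkpoint was never loaded.
```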

@dardodel Hello. I'm sorry for not getting back to you sooner. Reading the warning you posted about loading the parameters, I noticed that the parameter names in your fine-tuned checkpoint begin with the prefix model.model., while LlamaForCausalLM expects them to start with the prefix model.. As a result, the model parameters were not actually loaded, and randomly initialized parameters were used for inference.

This problem can be fixed by running the make_shards.py script, which is described in the README.md. Could you please follow the tutorial in the README.md and run make_shards.py to generate a Hugging Face-style checkpoint that can be loaded directly?
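For reference, the key remapping that needs to happen is roughly the following (only a rough sketch with placeholder paths; make_shards.py is the supported way and may do more than this):

```python
import torch

src = "./ckpt/openalpaca_7b/7bt_preview/pytorch_model.bin"        # placeholder paths
dst = "./ckpt/openalpaca_7b/7bt_preview_fixed/pytorch_model.bin"

state_dict = torch.load(src, map_location="cpu")

# Drop the extra leading "model." so the names match what LlamaForCausalLM expects,
# e.g. "model.model.layers.0.mlp.up_proj.weight" -> "model.layers.0.mlp.up_proj.weight".
# Other wrapped keys (e.g. a prefixed lm_head) may need the same treatment.
fixed = {
    (k[len("model."):] if k.startswith("model.model.") else k): v
    for k, v in state_dict.items()
}
torch.save(fixed, dst)
```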

@chenhl0810 Sorry for the late response. That is not correct: the size should be about 7B for the OpenLLaMA-7B model. Could you please check that the OpenLLaMA checkpoint was downloaded correctly?

Thanks for your reply.
Could you help confirm that the OpenLLaMA-7B setting is configured in this file, as shown in the figure below?
[screenshot of the training script configuration]

Hi @chenhl0810, thank you for spotting this! This is actually a typo (the path pointed to our trained OpenAlpaca model rather than the base checkpoint). We have corrected the scripts ([3b script] and [7b script]) so that they point to the correct OpenLLaMA checkpoints.


@gmftbyGMFTBY

Thanks for your response. But I did nothing other than follow the instructions in the README. After fine-tuning with DeepSpeed, I ran zero_to_fp32.py to get the PyTorch version of the weights. My assumption was that this bin file is enough and that the sharding step is optional, only needed if we want to break pytorch_model.bin into smaller pieces. So, do I have to run the sharding? Thanks.
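For completeness, the conversion step I ran corresponds to DeepSpeed's documented ZeRO-to-fp32 helper, roughly like this (paths are placeholders):

```python
import torch
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

checkpoint_dir = "./ckpt/openalpaca_7b/our_finetune"  # placeholder: dir containing the ZeRO shards

# Merge the ZeRO-partitioned shards into a single fp32 state dict and save it as
# the pytorch_model.bin that transformers' from_pretrained() looks for.
state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir)
torch.save(state_dict, f"{checkpoint_dir}/pytorch_model.bin")
```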

Hi @dardodel, may I ask whether you have experimented with our newly updated codebase? It would be great if you could try it; I think the sharding step is not necessary once you already have a single, unified pytorch_model.bin file.


Yes, I tried the new code as well. The warning message I shared above was from the new code.


This is kind of strange. I just reran the code on my side and it seems to work fine. Can you run the sharding operation to see what happens?
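(For what it's worth, once the weights load correctly you can also produce a sharded checkpoint directly from transformers; this is only an alternative sketch with placeholder paths, make_shards.py is the intended route.)

```python
from transformers import LlamaForCausalLM

# placeholder paths; assumes the fixed checkpoint now loads cleanly
model = LlamaForCausalLM.from_pretrained("./ckpt/openalpaca_7b/7bt_preview_fixed")
model.save_pretrained("./ckpt/openalpaca_7b/7bt_preview_sharded", max_shard_size="2GB")
```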