OpenGVLab/LAMM

Seemingly not freezing llama

jinyutong23 opened this issue · 5 comments

When I train LAMM on an RTX 3090 (24 GB), even with the batch size set to 1, the GPU runs out of memory.
I checked my hyperparameters, but that did not resolve the problem. Looking through your code, I could not find any operation that freezes the model other than freezing the visual encoder.
I don't understand why the GPU memory blows up. Should I add a freezing step to the current code?
[Screenshot: Snipaste_2023-07-03_13-51-19]
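
For reference, this is roughly how I am checking which parts of the model are still trainable (a hypothetical sketch; `model` stands in for the constructed LAMM model, not an exact variable from the repo):

```python
# Hypothetical check: count trainable vs. frozen parameters after the model is built.
# `model` stands in for the constructed LAMM model (visual encoder + projector + LLM).
def summarize_trainable(model):
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
    print(f"trainable: {trainable / 1e6:.1f}M params, frozen: {frozen / 1e6:.1f}M params")
    for name, p in model.named_parameters():
        if p.requires_grad:
            print("still trainable:", name)

summarize_trainable(model)
```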

Hi, thanks for your interest in our work. May I know the size of the Vicuna model you used?

I think explicitly freezing llama_model is not suitable here, because the LoRA parameters are also included in self.llama_model.named_parameters.
Since we use the PEFT package to add LoRA to the LLM, the non-LoRA parameters should already be frozen by the function mark_only_lora_as_trainable. Please check it out.

If you're using the 13B Vicuna model, maybe you can try the 7B one instead.
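
For context, a minimal sketch of what the PEFT wrapping does (the checkpoint path and LoRA hyperparameters here are illustrative, not the exact LAMM config):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative only: load a causal LM and wrap it with LoRA via PEFT.
base = AutoModelForCausalLM.from_pretrained("lmsys/vicuna-7b-v1.5")  # example path
lora_config = LoraConfig(
    r=32, lora_alpha=32, lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)

# The LoRA tuner calls mark_only_lora_as_trainable internally, so the base weights
# already have requires_grad=False; only the LoRA adapters remain trainable.
model.print_trainable_parameters()
```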

@wangjiongw We use Vicuna-7B as the pretrained llama. Strangely, even after adding the freeze operation mentioned above, the GPU memory still fills up.
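The extra freeze I added is roughly the following (a hypothetical sketch of my change, keeping anything whose name contains `lora_` trainable; `model` again stands in for the LAMM model):

```python
# Hypothetical version of the extra freeze I tried: turn off gradients for every
# llama parameter except the LoRA adapter weights.
for name, param in model.llama_model.named_parameters():
    if "lora_" not in name:
        param.requires_grad = False

# Sanity check: only the LoRA adapters (plus the non-llama modules) should remain trainable.
print(sum(p.numel() for p in model.llama_model.parameters() if p.requires_grad))
```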

Thanks for your answer! The pretrained llama indeed does not need to be frozen again. However, I'm still confused about why the GPU memory blows up when pretraining with the 7B Vicuna.

We previously trained our model on A100s, so this is a fairly new problem for us. I tested on our device and found that the 7B model requires about 30 GB of GPU memory.
We are also working on optimizing GPU memory consumption so that training works on devices such as the 3090; please stay tuned.
In the meantime, maybe you can try applying INT8 quantization to the LLM parameters to reduce memory consumption. Contributions are also welcome and appreciated.
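
A hedged sketch of the INT8 suggestion using the bitsandbytes path in transformers (not wired into the LAMM training scripts; the checkpoint path is an example):

```python
from peft import prepare_model_for_kbit_training  # older PEFT versions: prepare_model_for_int8_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch only: load the LLM weights in INT8 via bitsandbytes; weight memory drops
# roughly by half compared to FP16. Substitute the vicuna checkpoint used for LAMM.
llama = AutoModelForCausalLM.from_pretrained(
    "lmsys/vicuna-7b-v1.5",                        # example path, not the LAMM default
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# Prepare the quantized model before attaching LoRA adapters with PEFT.
llama = prepare_model_for_kbit_training(llama)
```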