Gradient issue
TonyXuQAQ opened this issue · 9 comments
Hi, after going through the training code, it seems that the gradient is not properly backpropagated. All calls to the projector layer mm_projector appear to be made within torch.no_grad (i.e., call_1, call_2). If so, the projector layer is not trained at all, right? Is this a typo in the released code or an error?
Can you share the error output and training configuration file?
There is no error. I just used the raw code of this repo. I mean, the projector layer mm_projector does not seem to be trained properly in valley/model/valley.py. All calls to mm_projector are wrapped in torch.no_grad, so the projector will not be trained, since the gradient is blocked within torch.no_grad.
But the projector calls are wrapped inside torch.no_grad, so the gradient cannot pass through the projector, i.e., the projector is not trained. And you did not use this layer anywhere else. I wonder how you trained this projector.
@TonyXuQAQ I found that the projector is not wrapped inside torch.no_grad in the original code of this repo: https://github.com/RupertLuo/Valley/blob/8da73a9551cd9ce520c47f7c3f508fdfc387f4f8/valley/model/valley.py.
I guess the "bug" was introduced when the code was reorganized. The projector should be outside torch.no_grad, since the released models were trained with the projector being tuned.
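Below is a minimal sketch of the pattern being described, using stand-in modules rather than the actual Valley classes: the frozen vision encoder stays under torch.no_grad(), while mm_projector is applied outside of it so its parameters still receive gradients.

```python
# Sketch of the intended structure (module names are stand-ins, not the repo's):
# frozen vision tower under no_grad, trainable projector outside it.
import torch
import torch.nn as nn

vision_tower = nn.Linear(32, 16)   # stand-in for the frozen vision encoder
mm_projector = nn.Linear(16, 16)   # stand-in for the trainable projector
for p in vision_tower.parameters():
    p.requires_grad_(False)

frames = torch.randn(4, 32)

with torch.no_grad():
    image_features = vision_tower(frames)      # no graph needed for frozen weights

image_features = mm_projector(image_features)  # outside no_grad: gradients flow

loss = image_features.sum()
loss.backward()
print(mm_projector.weight.grad is not None)    # True: the projector gets updated
```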
Thanks for the information.
During finetuning, I also noticed that the current version of the code cannot load VideoChat-instruct-11K correctly. LLaVA-instruct-150K's labels are organized as {'human': ..., 'gpt': ...}, while VideoChat-instruct-11K's labels are organized as {'q': ..., 'a': ...}. The two datasets have different label formats, but the code does not convert between them. I guess the label pre-processing code is missing.
I'm not sure why, but starting from your llama-2-pretrain weights, I finetuned Valley on the above two datasets and the results are very bad. I will refer to earlier commits of this repo for debugging.
So may I know which commit was used to train the provided valley-2-7b? I just want to reproduce the performance of the provided checkpoints.
LLaVA-instruct-150K should load correctly. For VideoChat-instruct-11K, you need to convert its format to that of LLaVA-instruct-150K.
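As a starting point, a rough conversion sketch is shown below; the exact field names ("QA", "video", "conversations", "from"/"value") are assumptions based on how the two formats are described in this thread, not verified against the repo's loader.

```python
# Rough sketch: convert VideoChat-instruct-11K style {'q': ..., 'a': ...} pairs
# into a LLaVA-instruct-150K style "conversations" list with alternating
# "human"/"gpt" turns. Field names and file paths here are assumptions.
import json

def videochat_to_llava(sample):
    conversations = []
    for turn in sample["QA"]:                      # assumed key holding the QA pairs
        conversations.append({"from": "human", "value": turn["q"]})
        conversations.append({"from": "gpt", "value": turn["a"]})
    return {"video": sample.get("video"), "conversations": conversations}

with open("videochat_instruct_11k.json") as f:     # hypothetical input path
    data = json.load(f)

converted = [videochat_to_llava(s) for s in data]

with open("videochat_instruct_11k_llava_format.json", "w") as f:
    json.dump(converted, f, indent=2, ensure_ascii=False)
```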
Thank you for your continued attention to this project. I will update the repo to a version of the code that trains correctly as soon as possible.