Why does the code have the "TODO: release this comment"?
Hi, we inherited the TODO from Vim (https://github.com/hustvl/Vim/blob/main/vim/models_mamba.py#L312), but it does not affect training or inference. Did you install the correct packages, i.e., causal-conv1d==1.1.2.post1 and mamba-ssm==1.1.1 from ~/Mamba/Vim/mamba-1p1p1?
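If it helps, here is a quick way to double-check the installed versions from Python (a minimal snippet using only the standard library; the expected versions are the ones listed above):

```python
from importlib.metadata import version

# Verify the recommended package versions are installed.
print(version("causal-conv1d"))  # expected: 1.1.2.post1
print(version("mamba-ssm"))      # expected: 1.1.1
```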
We can also provide the fine-tuning log (fine-tuning with the code in this repo without any modification) if it helps you double-check your fine-tuning process.
Thank you for your patience. After comparing with the original code, I found that this TODO is indeed inherited from the original.
But while further checking the code, I found that ARM uses a Decoder whose query is a learnable parameter constructed by `self.ar_token = nn.Parameter(torch.zeros(1, 1, self.dec_embed_dim))`.
I observed that ARM uses the value computed by cross-attention with this query to further calculate the loss.
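To make sure we are talking about the same mechanism, here is a minimal sketch of the pattern as I read it (the class, the attention module, and the dimensions are my simplification, not the repo's actual decoder):

```python
import torch
import torch.nn as nn

class CrossAttnDecoderSketch(nn.Module):
    """Sketch: a learnable query token cross-attends to encoder features."""

    def __init__(self, dec_embed_dim=512, num_heads=8):
        super().__init__()
        # Learnable query, initialized to zeros (a trained parameter, not random).
        self.ar_token = nn.Parameter(torch.zeros(1, 1, dec_embed_dim))
        self.cross_attn = nn.MultiheadAttention(dec_embed_dim, num_heads,
                                                batch_first=True)

    def forward(self, enc_feats):
        # enc_feats: (B, N, dec_embed_dim) features from the encoder.
        B, N, _ = enc_feats.shape
        q = self.ar_token.expand(B, N, -1)  # one query per target position
        # Cross-attention: query = ar_token, key/value = encoder features.
        out, _ = self.cross_attn(q, enc_feats, enc_feats)
        return out  # fed to the prediction head, where the loss is computed

dec = CrossAttnDecoderSketch()
pred = dec(torch.randn(2, 196, 512))  # -> (2, 196, 512)
```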
My question is: is this calculation method related to autoregressive training? How should we understand this implementation in the context of autoregressive training?
Please forgive me for still having these doubts after reading your article; I really hope to get your explanation.
Thanks again.
In fact, `count += 1` is used repeatedly in your Decoder part.
This is equivalent to reusing the previous query in each Decoder block. I don't understand whether such a computation brings additional benefits.
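For clarity, this is the pattern I mean, in a simplified form (the block structure and names are my guesses about the mechanism, not the actual ARM code):

```python
import torch
import torch.nn as nn

class DecoderStackSketch(nn.Module):
    """Sketch of the pattern in question: each block's output becomes the
    next block's query, while keys/values remain the encoder features."""

    def __init__(self, dec_embed_dim=512, num_heads=8, depth=4):
        super().__init__()
        self.ar_token = nn.Parameter(torch.zeros(1, 1, dec_embed_dim))
        self.blocks = nn.ModuleList(
            [nn.MultiheadAttention(dec_embed_dim, num_heads, batch_first=True)
             for _ in range(depth)]
        )

    def forward(self, enc_feats):
        B, N, _ = enc_feats.shape
        q = self.ar_token.expand(B, N, -1)
        count = 0
        for blk in self.blocks:
            # The previous block's query/output is reused as the next query.
            q, _ = blk(q, enc_feats, enc_feats)
            count += 1  # the counter referred to above
        return q
```

Is this chaining what the `count` bookkeeping implements, and does it help compared with using a fresh query in every block?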