Why does the code have the "TODO: release this comment"?
Hi, we inherited the TODO from Vim (https://github.com/hustvl/Vim/blob/main/vim/models_mamba.py#L312), but it does not affect training or inference. Did you install the correct packages, i.e., causal-conv1d==1.1.2.post1 and mamba-ssm==1.1.1 from ~/Mamba/Vim/mamba-1p1p1?
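If it helps, here is a quick way to double-check the installed versions from Python (a minimal snippet using only the standard library; the expected versions are the ones listed above):

```python
from importlib.metadata import version

# Verify the recommended package versions are installed.
print(version("causal-conv1d"))  # expected: 1.1.2.post1
print(version("mamba-ssm"))      # expected: 1.1.1
```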
We can also provide the fine-tuning log (fine-tuning with the code in this repo without any modification) if it helps you double-check your fine-tuning process.
Thank you for your patience. After comparing with the original code, I found that this TODO is indeed inherited from the original.
But while further checking the code, I found that ARM uses a Decoder whose query is a learnable parameter constructed by `self.ar_token = nn.Parameter(torch.zeros(1, 1, self.dec_embed_dim))`.
I observed that ARM uses the value computed by cross-attention with this query to further calculate the loss.
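To make sure we are talking about the same mechanism, here is a minimal sketch of the pattern as I read it (the class, the attention module, and the dimensions are my simplification, not the repo's actual decoder):

```python
import torch
import torch.nn as nn

class CrossAttnDecoderSketch(nn.Module):
    """Sketch: a learnable query token cross-attends to encoder features."""

    def __init__(self, dec_embed_dim=512, num_heads=8):
        super().__init__()
        # Learnable query, initialized to zeros (a trained parameter, not random).
        self.ar_token = nn.Parameter(torch.zeros(1, 1, dec_embed_dim))
        self.cross_attn = nn.MultiheadAttention(dec_embed_dim, num_heads,
                                                batch_first=True)

    def forward(self, enc_feats):
        # enc_feats: (B, N, dec_embed_dim) features from the encoder.
        B, N, _ = enc_feats.shape
        q = self.ar_token.expand(B, N, -1)  # one query per target position
        # Cross-attention: query = ar_token, key/value = encoder features.
        out, _ = self.cross_attn(q, enc_feats, enc_feats)
        return out  # fed to the prediction head, where the loss is computed

dec = CrossAttnDecoderSketch()
pred = dec(torch.randn(2, 196, 512))  # -> (2, 196, 512)
```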
My question is: is this calculation method related to autoregressive training? How should we understand this implementation in the context of autoregressive training?
Please forgive me for still having these doubts after reading your article; I really hope to get your explanation.
Thanks again.
In fact, `count += 1` is used repeatedly in your Decoder part.
This is equivalent to reusing the previous query in each Decoder block. I don't understand whether such a computation brings additional benefits.
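For clarity, this is the pattern I mean, in a simplified form (the block structure and names are my guesses about the mechanism, not the actual ARM code):

```python
import torch
import torch.nn as nn

class DecoderStackSketch(nn.Module):
    """Sketch of the pattern in question: each block's output becomes the
    next block's query, while keys/values remain the encoder features."""

    def __init__(self, dec_embed_dim=512, num_heads=8, depth=4):
        super().__init__()
        self.ar_token = nn.Parameter(torch.zeros(1, 1, dec_embed_dim))
        self.blocks = nn.ModuleList(
            [nn.MultiheadAttention(dec_embed_dim, num_heads, batch_first=True)
             for _ in range(depth)]
        )

    def forward(self, enc_feats):
        B, N, _ = enc_feats.shape
        q = self.ar_token.expand(B, N, -1)
        count = 0
        for blk in self.blocks:
            # The previous block's query/output is reused as the next query.
            q, _ = blk(q, enc_feats, enc_feats)
            count += 1  # the counter referred to above
        return q
```

Is this chaining what the `count` bookkeeping implements, and does it help compared with using a fresh query in every block?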