Difference between vision_lstm and vision_lstm2?

Question

dongzhuoyao opened this issue 6 months ago · 1 comment

Answer 1 · 2024-07-04T21:28:19.000Z

Updated the README to make this information easier to find.

Changes in vision_lstm2 compared to vision_lstm:

- Conv2d with kernel size 3 instead of a causal Conv1d before q and k
- Biases in the LayerNorms and projection layers
- Concatenate the first and last tokens instead of averaging them
- Pre-trained models are pre-trained at 192x192 resolution, followed by a short fine-tuning at 224x224
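The pooling change from averaging to concatenating the first and last tokens can be illustrated with plain lists. This is a minimal sketch, not the repository's code: it assumes `tokens` is the output sequence of the last block with shape (num_tokens, dim), and the helper names `pool_average` and `pool_concat` are hypothetical.

```python
from typing import List

Vector = List[float]


def pool_average(tokens: List[Vector]) -> Vector:
    """vision_lstm-style pooling: element-wise mean of the first and last token."""
    first, last = tokens[0], tokens[-1]
    return [(a + b) / 2 for a, b in zip(first, last)]


def pool_concat(tokens: List[Vector]) -> Vector:
    """vision_lstm2-style pooling: first and last token concatenated (length 2 * dim)."""
    return tokens[0] + tokens[-1]  # list concatenation returns a new list


tokens = [[1.0, 2.0], [5.0, 5.0], [3.0, 6.0]]  # 3 tokens, dim = 2
print(pool_average(tokens))  # [2.0, 4.0]
print(pool_concat(tokens))   # [1.0, 2.0, 3.0, 6.0]
```

Concatenation keeps both tokens intact instead of blending them, at the cost of doubling the pooled feature width (2 * dim instead of dim), so the classification head that follows needs twice as many input features.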
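The switch from a causal Conv1d to a 3x3 Conv2d changes which patches get mixed before q and k. The stdlib-only sketch below shows this on a toy grid. It is an illustration under stated assumptions, not the repository's code: single-channel inputs, patches flattened in row-major order, all-ones kernels, a causal kernel size of 4 (a hypothetical value chosen for illustration), and PyTorch-style cross-correlation with zero padding for the 2D case.

```python
from typing import List

Grid = List[List[float]]


def flatten(grid: Grid) -> List[float]:
    """Flatten an h x w patch grid in row-major order (assumed ordering)."""
    return [v for row in grid for v in row]


def unflatten(seq: List[float], w: int) -> Grid:
    """Inverse of flatten for a grid of width w."""
    return [seq[i:i + w] for i in range(0, len(seq), w)]


def causal_conv1d(seq: List[float], kernel: List[float]) -> List[float]:
    """Causal 1D conv: out[i] only sees seq[i - k + 1 .. i] (zero left padding)."""
    k = len(kernel)
    out = []
    for i in range(len(seq)):
        acc = 0.0
        for j in range(k):
            src = i - (k - 1) + j  # kernel[-1] is aligned with the current position
            if src >= 0:
                acc += kernel[j] * seq[src]
        out.append(acc)
    return out


def conv2d_same(grid: Grid, kernel: Grid) -> Grid:
    """2D cross-correlation with zero padding; output keeps the grid's shape (odd kernel)."""
    h, w = len(grid), len(grid[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dr in range(kh):
                for dc in range(kw):
                    rr, cc = r + dr - ph, c + dc - pw
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += kernel[dr][dc] * grid[rr][cc]
            out[r][c] = acc
    return out


# Impulse at patch (row 1, col 2) of a 3x3 patch grid.
impulse = [[0.0] * 3 for _ in range(3)]
impulse[1][2] = 1.0

print(unflatten(causal_conv1d(flatten(impulse), [1.0] * 4), 3))
# [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
print(conv2d_same(impulse, [[1.0] * 3 for _ in range(3)]))
# [[0.0, 1.0, 1.0], [0.0, 1.0, 1.0], [0.0, 1.0, 1.0]]
```

Under the causal Conv1d, the impulse at patch (1, 2) reaches patch (2, 0) only because row-major flattening places them next to each other in the sequence, and it never reaches the patches above it or to its left. The 3x3 Conv2d instead mixes each patch with its spatial 3x3 neighbourhood, independent of the scan order.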
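One practical effect of pre-training at 192x192 is a shorter token sequence per image. Assuming non-overlapping 16x16 patches (a patch size not stated in this thread), the number of patch tokens at each resolution works out as:

$$
N_{192} = \left(\frac{192}{16}\right)^2 = 12^2 = 144,
\qquad
N_{224} = \left(\frac{224}{16}\right)^2 = 14^2 = 196
$$

Under that assumption, each pre-training image yields 144 tokens instead of 196, roughly a quarter fewer, before the short fine-tuning at the full 224x224 resolution.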