NX-AI/vision-lstm

difference between vision_lstm and vision_lstm2?

dongzhuoyao opened this issue · 1 comment

difference between vision_lstm and vision_lstm2?

Updated the README for easier visibility of this info.

Changes in vision_lstm2 compared to vision_lstm:

  • Conv2d with kernel size 3 instead of a causal Conv1d before q and k
  • biases in the layernorms and projection layers
  • the first and last tokens are concatenated instead of averaged
  • pre-trained models are trained at 192x192 resolution, followed by a short fine-tuning at 224x224
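
To make the first and third points concrete, here is a minimal PyTorch sketch. It is not the repository's code: the tensor sizes, the variable names, the depthwise (`groups=dim`) setting, and the kernel size of the causal Conv1d are all assumptions chosen for illustration. Only the Conv2d kernel size of 3 and the average-vs-concatenate pooling come from the list above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes for illustration only.
batch, height, width, dim = 2, 4, 4, 8
tokens = torch.randn(batch, height * width, dim)  # (B, L, D) patch tokens

# --- Pooling: average vs. concatenation of the first and last token ---
first, last = tokens[:, 0], tokens[:, -1]
pooled_avg = (first + last) / 2                 # (B, D)
pooled_cat = torch.cat([first, last], dim=1)    # (B, 2 * D)

# --- Convolution before q and k ---
kernel_size = 3  # given for the Conv2d; assumed here for the Conv1d as well

# Causal Conv1d over the flattened sequence: pad only on the left so that
# position t never sees positions > t. Depthwise and bias-free are assumptions.
conv1d = nn.Conv1d(dim, dim, kernel_size, groups=dim, bias=False)
x1d = tokens.transpose(1, 2)                        # (B, D, L)
out1d = conv1d(F.pad(x1d, (kernel_size - 1, 0)))    # (B, D, L)

# Conv2d over the 2D patch grid: each token sees its spatial neighbours in
# all directions. padding=1 keeps the grid size unchanged.
conv2d = nn.Conv2d(dim, dim, kernel_size, padding=1, groups=dim, bias=True)
x2d = tokens.transpose(1, 2).reshape(batch, dim, height, width)  # (B, D, H, W)
out2d = conv2d(x2d).flatten(2)                      # (B, D, L)
```

Note that concatenation doubles the feature dimension of the pooled vector, so a classification head attached to it needs `2 * dim` input features instead of `dim`.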