Hi, does this repo include the first-stage training script?

Or did you do the pretraining within the LLaVA codebase?

Hi, this code does not contain the first-stage training script. I followed the LLaVA framework for the first-stage training, with a few small modifications:

  1. change the vision encoder from ViT to Swin Transformer
  2. change mm_projector from nn.Linear to conv + linear
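
For anyone trying to reproduce the second modification, here is a rough sketch of what a conv + linear projector could look like in PyTorch. This is not the actual code used for training. The class name `ConvLinearProjector`, the kernel size, the stride, and the feature dimensions are all assumptions chosen for illustration, and it assumes the vision encoder outputs a square grid of tokens shaped (batch, tokens, channels).

```python
import math

import torch
import torch.nn as nn


class ConvLinearProjector(nn.Module):
    """Hypothetical conv + linear projector (all sizes are illustrative assumptions)."""

    def __init__(self, vision_dim=1024, llm_dim=4096, kernel_size=3, stride=2):
        super().__init__()
        # Strided conv mixes neighbouring tokens and reduces the token count.
        self.conv = nn.Conv2d(
            vision_dim, vision_dim,
            kernel_size=kernel_size, stride=stride, padding=kernel_size // 2,
        )
        # Linear layer maps vision channels to the language model hidden size.
        self.linear = nn.Linear(vision_dim, llm_dim)

    def forward(self, x):
        # x: (batch, tokens, channels), tokens assumed to form a square grid.
        b, n, c = x.shape
        side = math.isqrt(n)
        if side * side != n:
            raise ValueError(f"expected a square token grid, got {n} tokens")
        x = x.transpose(1, 2).reshape(b, c, side, side)  # (b, c, side, side)
        x = self.conv(x)                                 # (b, c, side', side')
        x = x.flatten(2).transpose(1, 2)                 # (b, tokens', c)
        return self.linear(x)                            # (b, tokens', llm_dim)


# Small dimensions so the example runs quickly on CPU.
projector = ConvLinearProjector(vision_dim=32, llm_dim=64)
tokens = torch.randn(2, 144, 32)   # 12 x 12 grid of vision tokens
out = projector(tokens)
print(out.shape)                   # torch.Size([2, 36, 64])
```

With stride 2 the 12 x 12 grid becomes 6 x 6, so the language model sees 36 tokens instead of 144. Whether the original projector downsamples at all is not stated in this thread, so treat that as one possible design choice.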