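Hands-on-MLLM

Implementations of the basic building blocks of multimodal large language models (MLLMs), currently focused on the Transformer and visual encoders.

TODO

- Transformer (training task: sequence repeater)
- BERT
- GPT
- ViT
- BLIP-2 (Q-Former)
- DiT
- VQ-VAE
- ...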
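
The sequence-repeater training task named for the Transformer is not specified further here. One common reading is a toy copy task: the model receives a random token sequence and must reproduce it. The sketch below generates such data under that assumption. The special-token ids (`PAD`, `BOS`, `EOS`), the vocabulary size, and the function names `make_repeat_example` and `make_batch` are hypothetical and not taken from this repository.

```python
import random

# Hypothetical special-token ids; the repository may use different ones.
PAD, BOS, EOS = 0, 1, 2
FIRST_TOKEN = 3  # ordinary tokens start after the special ids


def make_repeat_example(rng, vocab_size=16, min_len=3, max_len=8):
    """Build one copy-task example: the target repeats the source."""
    length = rng.randint(min_len, max_len)
    src = [rng.randrange(FIRST_TOKEN, vocab_size) for _ in range(length)]
    tgt_in = [BOS] + src   # decoder input (teacher forcing)
    tgt_out = src + [EOS]  # decoder labels, shifted by one position
    return src, tgt_in, tgt_out


def make_batch(rng, batch_size=4, **kwargs):
    """Build a batch and right-pad every sequence to a common width."""
    examples = [make_repeat_example(rng, **kwargs) for _ in range(batch_size)]
    width = max(len(tgt_in) for _, tgt_in, _ in examples)

    def pad(seq):
        return seq + [PAD] * (width - len(seq))

    return {
        "src": [pad(s) for s, _, _ in examples],
        "tgt_in": [pad(t) for _, t, _ in examples],
        "tgt_out": [pad(t) for _, _, t in examples],
    }


if __name__ == "__main__":
    batch = make_batch(random.Random(0), batch_size=2)
    for name, rows in batch.items():
        print(name, rows)
```

The nested lists can be wrapped in tensors for whichever framework the model uses. A task like this is convenient for a from-scratch Transformer because a working encoder-decoder should drive the loss close to zero quickly, so wiring bugs in attention or masking tend to show up early.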