vimalabs/VIMA

t5-large instead of t5-base

Closed this issue · 2 comments

I want to use t5-large instead of t5-base. What parts of the code should I change?
I have changed the following lines:

```python
tokenizer = Tokenizer.from_pretrained("t5-large")
self.t5 = T5EncoderModel.from_pretrained("t5-large")
model = AutoModel.from_pretrained("t5-large")
```
But I got the following error:

```
size mismatch for prompt_embedding._embed_layer.weight: copying a param with shape torch.Size([32128, 768]) from checkpoint, the shape in current model is torch.Size([32128, 1024]).
size mismatch for t5_prompt_encoder.t5.shared.weight: copying a param with shape torch.Size([32128, 768]) from checkpoint, the shape in current model is torch.Size([32128, 1024]).
size mismatch for t5_prompt_encoder.t5.encoder.embed_tokens.weight: copying a param with shape torch.Size([32128, 768]) from checkpoint, the shape in current model is torch.Size([32128, 1024]).
size mismatch for t5_prompt_encoder.t5.encoder.block.0.layer.0.SelfAttention.q.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for t5_prompt_encoder.t5.encoder.block.0.layer.0.SelfAttention.k.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for t5_prompt_encoder.t5.encoder.block.0.layer.0.SelfAttention.v.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for t5_prompt_encoder.t5.encoder.block.0.layer.0.SelfAttention.o.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for t5_prompt_encoder.t5.encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight: copying a param with shape torch.Size([32, 12]) from checkpoint, the shape in current model is torch.Size([32, 16]).
size mismatch for t5_prompt_encoder.t5.encoder.block.0.layer.0.layer_norm.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for t5_prompt_encoder.t5.encoder.block.0.layer.1.DenseReluDense.wi.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for t5_prompt_encoder.t5.encoder.block.0.layer.1.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for t5_prompt_encoder.t5.encoder.block.0.layer.1.layer_norm.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
[... the same SelfAttention q/k/v/o, layer_norm, and DenseReluDense wi/wo size mismatches repeat for encoder blocks 1 through 11 ...]
size mismatch for t5_prompt_encoder.t5.encoder.final_layer_norm.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
```

Thanks for your attention.

Thank you for your interest in our work! The T5-large model uses an embedding dimension of 1024 instead of the 768 that T5-base uses. Therefore, loading the provided VIMA checkpoints will raise a size-mismatch error.
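For reference, here is a quick way to confirm the two hidden sizes via the `transformers` config API (a minimal check; the model names are the standard Hugging Face Hub IDs):

```python
from transformers import AutoConfig

# d_model is the hidden/embedding size that checkpoint weights must match.
print(AutoConfig.from_pretrained("t5-base").d_model)   # 768
print(AutoConfig.from_pretrained("t5-large").d_model)  # 1024
```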

That said, there are still several workarounds. For example, you can skip loading the checkpoint weights for t5_prompt_encoder and keep the freshly initialized t5-large weights instead. But pay attention to the adapter layer that reconciles different embedding sizes. Let me know if you have further questions.
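If it helps, here is a minimal sketch of that workaround. The checkpoint path and the `policy` module (already constructed with the t5-large config) are placeholders, not the actual VIMA API, and the skipped prefixes are inferred from the error messages above:

```python
import torch

# Load the provided checkpoint on CPU (path is a placeholder).
ckpt = torch.load("vima_ckpt.ckpt", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)

# Drop every parameter whose shape depends on the T5 hidden size,
# then load the remaining (shape-compatible) weights non-strictly.
skip_prefixes = ("t5_prompt_encoder.", "prompt_embedding.")
filtered = {k: v for k, v in state_dict.items() if not k.startswith(skip_prefixes)}
missing, unexpected = policy.load_state_dict(filtered, strict=False)
print(f"skipped {len(state_dict) - len(filtered)} params; missing keys: {missing}")
```

Note that `strict=False` silently tolerates missing keys, so it is worth printing them to verify that only the prompt-encoder parameters were left uninitialized.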

I'm closing this issue. Feel free to reopen if you encounter further questions.