google-research/text-to-text-transfer-transformer

Shape mismatch error while loading the pretrained model

pragna96 opened this issue · 1 comment

I get a shape mismatch error when I try to fine-tune.
The same code was working perfectly on Colab, but when I run it on my GPU Linux server it fails with the error below, even though I copied over the same pretrained model files and gin files.

The error is as follows:

ValueError: Shape of variable decoder/block_000/layer_000/SelfAttention/k:0 ((512, 512)) doesn't match with shape of tensor decoder/block_000/layer_000/SelfAttention/k ([1024, 4096]) from checkpoint reader.
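One quick way to see which checkpoint TF is actually reading is to list the variable shapes stored in it. A minimal diagnostic sketch (the model directory path is a placeholder):

```python
import tensorflow as tf

# Placeholder -- point this at the model directory TF is loading from.
# tf.train.latest_checkpoint resolves the newest checkpoint prefix there.
ckpt_path = tf.train.latest_checkpoint("/path/to/model_dir")

# tf.train.list_variables reads (name, shape) pairs directly from the
# checkpoint file, so the shapes reveal which model size it belongs to.
for name, shape in tf.train.list_variables(ckpt_path):
    if "SelfAttention" in name:
        print(name, shape)
```

If the printed shapes don't match the model size you think you're fine-tuning, TF is loading a checkpoint from a different model.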

The operative config file specifies the pre-trained checkpoint path, so one of two things is happening: either you're overriding that path, or the model directory you're using was already used to save checkpoints for a differently-sized model, and TF is trying to load those stale checkpoints. In the latter case, just create a new, empty model directory for saving your fine-tuned checkpoints.
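For reference, a minimal sketch of the fix for the second scenario using the t5 API. The paths, task name, and hyperparameters are placeholders, and the exact MtfModel arguments for a GPU setup (e.g. mesh_shape/mesh_devices) depend on your environment:

```python
import t5

# Placeholders -- adjust for your setup.
MODEL_DIR = "/path/to/new_empty_model_dir"  # must be fresh/empty so no
                                            # stale checkpoints get loaded
PRETRAINED_DIR = "gs://t5-data/pretrained_models/small"  # size must match

model = t5.models.MtfModel(
    model_dir=MODEL_DIR,
    tpu=None,  # running on GPU, not TPU
    batch_size=16,
    sequence_length={"inputs": 512, "targets": 128},
)

# The pretrained checkpoint is passed here, not via MODEL_DIR; the
# fine-tuned checkpoints are then written into the empty MODEL_DIR.
model.finetune(
    mixture_or_task_name="my_task",  # hypothetical registered task name
    pretrained_model_dir=PRETRAINED_DIR,
    finetune_steps=10000,
)
```

The key point is the separation of directories: pretrained weights are read from pretrained_model_dir, while model_dir only ever holds checkpoints of the model you are fine-tuning.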