pytorch/examples

vision-transformer problem report

ChenDaiwei-99 opened this issue · 4 comments

If you wanna contribute some fixes, I'd be happy to merge them

Sure, will do it in a few days :)

I can confirm this example is badly broken. I added some code to compare individual labels to predictions and discovered that the forward pass of the ViT always returns the same tensor, no matter the input. The tensor it returns is different each time I run the script, even if I load the same weights from the save file and don't do any training. It's no wonder the loss can't do better than about 2.3 (roughly ln(10), the cross-entropy of a uniform guess over 10 classes): always giving the same prediction hits the correct label about as often as random guessing.
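
For reference, a minimal sketch of the check I mean (the model class, checkpoint path, and MNIST-sized input shape are placeholders; substitute whatever this example actually uses):

```python
import torch

def forward_is_input_independent(model, input_shape=(1, 1, 28, 28)):
    """Return True if the model produces the same output for two unrelated
    random inputs, i.e. the forward pass effectively ignores its input."""
    model.eval()
    with torch.no_grad():
        out1 = model(torch.randn(*input_shape))
        out2 = model(torch.randn(*input_shape))
    return torch.allclose(out1, out2)

# Usage (names below are placeholders for the example's actual class/checkpoint):
# model = VisionTransformer(...)
# model.load_state_dict(torch.load("checkpoint.pt"))
# print(forward_is_input_independent(model))
```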

I added a printout of the accuracy to the code, but during training the accuracy does not improve and the loss does not converge, even after many epochs. Does this model actually work? I think it has serious problems. I hope to get an answer; this is very important to me.
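
Roughly, the accuracy printout I added looks like this (a sketch only, assuming the `model`, data loader, and `device` set up by the example's training script):

```python
import torch

def evaluate_accuracy(model, data_loader, device="cpu"):
    """Fraction of samples whose argmax prediction matches the label."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in data_loader:
            images, labels = images.to(device), labels.to(device)
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total

# Called once per epoch, e.g.:
# print(f"epoch {epoch}: accuracy {evaluate_accuracy(model, test_loader, device):.4f}")
```

With that in place, the accuracy stays near chance level no matter how long I train.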