Project-MONAI/tutorials

Classification backbone with ViT results in argument 'input' (position 1) must be Tensor, not tuple

kavmar opened this issue · 1 comment

Hi,

I am trying to use ViT as follows:

net = monai.networks.nets.ViT(spatial_dims=2, in_channels=1, img_size=(400, 400), proj_type='conv', patch_size=(64, 64),
                              num_classes=6, classification=True, post_activation='0').to(device)

but I am running into the same issue as reported here: #464

return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
TypeError: cross_entropy_loss(): argument 'input' (position 1) must be Tensor, not tuple

The conclusion in that issue was that the API would be enhanced with hidden_states_out, but I do not see that implemented, apparently by design.

MONAI version: 1.3.0
PyTorch version: 2.1.1+cu121

Thanks for any advice.

Hi @kavmar, I think you can take outputs[0] for the loss instead of just outputs.
#464 (comment)

Hope it helps, thanks!
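
For context, MONAI's ViT forward pass returns a tuple of the form (x, hidden_states_out), which is why the loss function complains about receiving a tuple. Below is a minimal sketch of the outputs[0] suggestion. The helper name primary_output is hypothetical and not part of MONAI, and plain Python lists stand in for tensors so the snippet runs without torch installed.

```python
from typing import Any


def primary_output(outputs: Any) -> Any:
    """Return the first element if the model returned a tuple or list, else the value itself.

    Hypothetical helper for illustration; it is not a MONAI function.
    """
    if isinstance(outputs, (tuple, list)):
        return outputs[0]
    return outputs


# Stand-in for a ViT-style forward result: (logits, list of hidden states).
# Plain lists and strings replace tensors here purely for illustration.
fake_logits = [[0.1, 0.7, 0.2]]
fake_outputs = (fake_logits, ["hidden_0", "hidden_1"])

# Only the first element should be handed to the loss function.
logits = primary_output(fake_outputs)
print(logits)  # [[0.1, 0.7, 0.2]]
```

In a training loop this would look something like loss = loss_function(primary_output(net(inputs)), labels), where loss_function, inputs, and labels are placeholder names for whatever your script already uses. Writing net(inputs)[0] directly is equivalent when the network is known to return a tuple.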