Classification backbone with ViT results in argument 'input' (position 1) must be Tensor, not tuple

Question


kavmar opened this issue 6 months ago · 1 comments

Hi,

I am trying to use ViT as follows:

net = monai.networks.nets.ViT(
    spatial_dims=2, in_channels=1, img_size=(400, 400), proj_type='conv', patch_size=(64, 64),
    num_classes=6, classification=True, post_activation='0').to(device)

but I am running into the same issue as reported here: #464

return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
TypeError: cross_entropy_loss(): argument 'input' (position 1) must be Tensor, not tuple

In that issue it was concluded that the API would be enhanced with hidden_states_out, but I do not see that change implemented, apparently by design.

MONAI version: 1.3.0
PyTorch version: 2.1.1+cu121

Thanks for any advice.

Answer 1 · 2024-01-15T03:27:04.000Z

Hi @kavmar, I think you can pass outputs[0] to the loss function instead of the whole outputs tuple.
#464 (comment)
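
A minimal sketch of that suggestion, assuming the network's forward pass returns a tuple whose first element holds the class logits (which is what the error message implies here). TupleOutputNet is a hypothetical stand-in used only to keep the example self-contained; it is not the MONAI ViT, and the tensor sizes are arbitrary.

```python
import torch
import torch.nn as nn


class TupleOutputNet(nn.Module):
    """Hypothetical stand-in: forward returns (logits, hidden_states), not a single Tensor."""

    def __init__(self, in_features: int = 16, num_classes: int = 6):
        super().__init__()
        self.hidden = nn.Linear(in_features, 8)
        self.head = nn.Linear(8, num_classes)

    def forward(self, x):
        h = torch.relu(self.hidden(x))
        logits = self.head(h)
        # Return the class scores together with the intermediate features.
        return logits, [h]


torch.manual_seed(0)
net = TupleOutputNet()
criterion = nn.CrossEntropyLoss()

images = torch.randn(4, 16)           # dummy batch of 4 samples
labels = torch.randint(0, 6, (4,))    # dummy class indices in [0, 6)

outputs = net(images)                 # a tuple, so criterion(outputs, labels) would fail
logits = outputs[0]                   # keep only the classification output
loss = criterion(logits, labels)      # the loss now receives a Tensor
loss.backward()
```

With the real model, the same pattern would be outputs = net(images) followed by criterion(outputs[0], labels); the remaining tuple element can be ignored if the hidden states are not needed.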

Hope it helps, thanks!