constantinpape/torch-em

Enable UNETR for dynamic input shape


Currently our UNETR implementation has a fixed input shape, see https://github.com/constantinpape/torch-em/blob/main/torch_em/model/unetr.py#L64.

This is due to the fixed input shape of the underlying ViT implementation (either TIMM/MAE or SAM). However, that restriction comes only from the fixed size of the positional encoding. Otherwise the transformer could process sequences of arbitrary length, and consequently images of dynamic shape, as long as they are divisible by the patch shape.
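One possible way to lift the restriction is to interpolate the learned positional embedding to the token grid of the actual input. The following is only a rough sketch, not code from torch-em: `token_grid` and `resize_pos_embed` are hypothetical helper names, and the embedding is assumed to be stored flattened as `(1, num_prefix_tokens + h * w, dim)`. Encoders that store the grid in a different layout would need a different reshape.

```python
import torch
import torch.nn.functional as F


def token_grid(image_shape, patch_shape):
    """Return the (height, width) token grid for an image, i.e. image shape // patch shape."""
    if any(s % p != 0 for s, p in zip(image_shape, patch_shape)):
        raise ValueError(f"Image shape {image_shape} is not divisible by patch shape {patch_shape}")
    return tuple(s // p for s, p in zip(image_shape, patch_shape))


def resize_pos_embed(pos_embed, old_grid, new_grid, num_prefix_tokens=0):
    """Interpolate a learned positional embedding from old_grid to new_grid.

    pos_embed: tensor of shape (1, num_prefix_tokens + old_h * old_w, dim).
    Prefix tokens (e.g. a class token) have no spatial position and are kept unchanged.
    """
    prefix = pos_embed[:, :num_prefix_tokens]
    grid = pos_embed[:, num_prefix_tokens:]
    old_h, old_w = old_grid
    new_h, new_w = new_grid
    dim = grid.shape[-1]
    if grid.shape[1] != old_h * old_w:
        raise ValueError("Positional embedding does not match the given old grid shape")

    # (1, h * w, dim) -> (1, dim, h, w), the layout expected by F.interpolate.
    grid = grid.reshape(1, old_h, old_w, dim).permute(0, 3, 1, 2)
    grid = F.interpolate(grid, size=(new_h, new_w), mode="bicubic", align_corners=False)
    # (1, dim, h, w) -> (1, h * w, dim), back to a token sequence.
    grid = grid.permute(0, 2, 3, 1).reshape(1, new_h * new_w, dim)
    return torch.cat([prefix, grid], dim=1)


# Example: an embedding trained for 224 x 224 inputs with 16 x 16 patches and one class token.
pos_embed = torch.randn(1, 1 + 14 * 14, 768)
new_grid = token_grid((512, 384), (16, 16))  # (32, 24)
resized = resize_pos_embed(pos_embed, (14, 14), new_grid, num_prefix_tokens=1)
print(resized.shape)  # torch.Size([1, 769, 768])
```

The interpolated embedding would be computed per forward pass (or cached per input shape) and added to the patch tokens in place of the fixed-size one. Whether interpolated embeddings preserve segmentation quality for the pretrained encoders would need to be checked.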

It would be nice to update this so that arbitrary input shapes are supported, but this is currently not a priority. cc @anwai98.