ml-struct-bio/cryodrgn

RuntimeError: The size of tensor a (16641) must match the size of tensor b (129) at non-singleton dimension 2

CTosXY opened this issue · 3 comments

Hi, Zhong! I have tested the pose/CTF parameter parsing with the voxel-based backprojection script, and the output structure resembles the structure from the consensus reconstruction.
However, the training process reports the following error; the log is below. (The extracted particles come from micrographs collected in different batches; does that matter?)
We look forward to your reply.

runlog.txt
/opt/ohpc/pub/apps/miniconda3/envs/cryodrgn/lib/python3.9/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1699449181081/work/aten/src/ATen/native/TensorShape.cpp:3526.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Traceback (most recent call last):
  File "/opt/ohpc/pub/apps/miniconda3/envs/cryodrgn/bin/cryodrgn", line 8, in <module>
    sys.exit(main())
  File "/opt/ohpc/pub/apps/miniconda3/envs/cryodrgn/lib/python3.9/site-packages/cryodrgn/main.py", line 72, in main
    args.func(args)
  File "/opt/ohpc/pub/apps/miniconda3/envs/cryodrgn/lib/python3.9/site-packages/cryodrgn/commands/train_vae.py", line 910, in main
    loss, gen_loss, kld = train_batch(
  File "/opt/ohpc/pub/apps/miniconda3/envs/cryodrgn/lib/python3.9/site-packages/cryodrgn/commands/train_vae.py", line 360, in train_batch
    y = preprocess_input(y, lattice, trans)
  File "/opt/ohpc/pub/apps/miniconda3/envs/cryodrgn/lib/python3.9/site-packages/cryodrgn/commands/train_vae.py", line 405, in preprocess_input
    y = lattice.translate_ht(y.view(B, -1), trans.unsqueeze(1)).view(B, D, D)
  File "/opt/ohpc/pub/apps/miniconda3/envs/cryodrgn/lib/python3.9/site-packages/cryodrgn/lattice.py", line 163, in translate_ht
    return c * img + s * img[:, :, torch.arange(len(coords) - 1, -1, -1)]
RuntimeError: The size of tensor a (16641) must match the size of tensor b (129) at non-singleton dimension 2

Previously, I faced this issue too and it seemed that using a batch size of 28 or 128 solved it for me.

Regards,
Raj


Thanks a lot! It seems to work!

We also eventually came across this issue on our end, and after some investigation found that it is caused by dataset.ImageData.__getitem__ returning a two-dimensional image of shape DxD rather than a stack of images of shape 1xDxD when index has length one:

particles = self._process(self.src.images(index).to(self.device))

This can happen when the total image count is congruent to one modulo the batch size, so that the last batch in each training epoch contains a single image (which also explains why changing the batch size, as above, can make the error disappear). For example, 10001 particles with a batch size of 8 leave a final batch of 10001 mod 8 = 1 image. Many array operations in torch and numpy will drop the singleton dimension automatically!
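As a standalone illustration of this pitfall (plain torch, not cryodrgn code):

import torch

stack = torch.zeros(100, 129, 129)  # 100 images of size D x D, D = 129

print(stack[5].shape)              # torch.Size([129, 129]): an int index drops the batch axis
print(stack[[5]].shape)            # torch.Size([1, 129, 129]): a length-one list keeps it
print(stack[[5]].squeeze().shape)  # torch.Size([129, 129]): a blanket squeeze() collapses it again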

For now we are patching this by detecting the collapsed shape after the fact and restoring the particles to the correct dimensionality:

if len(particles.shape) == 2:
    # (D, D) -> (1, D, D): restore the batch dimension that was dropped
    particles = particles[np.newaxis, ...]
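Since np.newaxis is just an alias for None, an equivalent torch-native spelling that avoids the numpy dependency would be particles.unsqueeze(0), or:

if particles.ndim == 2:
    particles = particles[None, ...]  # (D, D) -> (1, D, D)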

However, as also discussed at #349, the __getitem__ method needs a rethink, either by more explicitly delineating the actions to take depending on whether the input index is a single element, a sequence, a slice object, etc. (and writing tests for each), or by not supporting sequences as inputs (as is the case for e.g. built-in Python list objects) and refactoring the data loading methods accordingly.
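As a rough sketch of the first option (normalize_index is a hypothetical helper, not part of the cryodrgn API), the idea is to coerce every supported index type into a 1-D integer array before touching the image source, so downstream code always sees a (B, D, D) stack:

import numpy as np

def normalize_index(index, n):
    # Coerce int, slice, or sequence indices into a 1-D integer array
    # so that fancy indexing always yields a stack, even of size one.
    if isinstance(index, slice):
        return np.arange(n)[index]
    if np.isscalar(index):
        return np.array([index])
    return np.asarray(index).reshape(-1)

print(normalize_index(5, 100))            # [5]
print(normalize_index(slice(0, 3), 100))  # [0 1 2]
print(normalize_index([7], 100))          # [7]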

In the meantime, you can access this patch using our beta release channel, which should allow you to use the batch size you originally intended:

pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ 'cryodrgn<=3.3.2' --pre

Please let us know if you run into any other issues, and thank you for bringing this to our attention!
-Mike