RuntimeError: The size of tensor a (16641) must match the size of tensor b (129) at non-singleton dimension 2
CTosXY opened this issue · 3 comments
Hi, Zhong! I have tested the pose/CTF parameter parsing with the voxel-based backprojection script, and the output structure resembles the structure from the consensus reconstruction.
However, the training process reports the error below; the full log follows. (The extracted particles come from micrographs collected in different batches; does that matter?)
We are looking forward to your reply.
runlog.txt
```
/opt/ohpc/pub/apps/miniconda3/envs/cryodrgn/lib/python3.9/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1699449181081/work/aten/src/ATen/native/TensorShape.cpp:3526.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Traceback (most recent call last):
  File "/opt/ohpc/pub/apps/miniconda3/envs/cryodrgn/bin/cryodrgn", line 8, in <module>
    sys.exit(main())
  File "/opt/ohpc/pub/apps/miniconda3/envs/cryodrgn/lib/python3.9/site-packages/cryodrgn/main.py", line 72, in main
    args.func(args)
  File "/opt/ohpc/pub/apps/miniconda3/envs/cryodrgn/lib/python3.9/site-packages/cryodrgn/commands/train_vae.py", line 910, in main
    loss, gen_loss, kld = train_batch(
  File "/opt/ohpc/pub/apps/miniconda3/envs/cryodrgn/lib/python3.9/site-packages/cryodrgn/commands/train_vae.py", line 360, in train_batch
    y = preprocess_input(y, lattice, trans)
  File "/opt/ohpc/pub/apps/miniconda3/envs/cryodrgn/lib/python3.9/site-packages/cryodrgn/commands/train_vae.py", line 405, in preprocess_input
    y = lattice.translate_ht(y.view(B, -1), trans.unsqueeze(1)).view(B, D, D)
  File "/opt/ohpc/pub/apps/miniconda3/envs/cryodrgn/lib/python3.9/site-packages/cryodrgn/lattice.py", line 163, in translate_ht
    return c * img + s * img[:, :, torch.arange(len(coords) - 1, -1, -1)]
RuntimeError: The size of tensor a (16641) must match the size of tensor b (129) at non-singleton dimension 2
```
Previously, I faced this issue too and it seemed that using a batch size of 28 or 128 solved it for me.
Regards,
Raj
Thanks a lot! It seems to work!
We also eventually came across this issue on our end, and after some investigation found that it is caused by `dataset.ImageData.__getitem__` returning a two-dimensional image of dimension `DxD` rather than a stack of images of dimension `1xDxD` when `index` is of length one:

```python
particles = self._process(self.src.images(index).to(self.device))
```

This can happen when the total image count is one modulo the batch size, so that the last batch in each training epoch has size one. Many array operations in `torch` and `numpy` will drop the singleton dimension automatically!
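The dimension drop is easy to reproduce outside cryoDRGN; here is a minimal numpy sketch (the same indexing behavior applies to torch tensors):

```python
import numpy as np

D = 129
stack = np.zeros((10, D, D))   # a stack of ten D x D images

# Integer indexing drops the leading axis, leaving a bare 2-D image:
print(stack[0].shape)          # (129, 129)

# Indexing with a length-one sequence keeps the stack dimension:
print(stack[[0]].shape)        # (1, 129, 129)
```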
For now we are patching this by detecting when it happens after the fact and casting the particles to the correct dimension:

```python
if len(particles.shape) == 2:
    particles = particles[np.newaxis, ...]
```
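As a self-contained illustration, the guard shown above restores the expected stack shape when applied to a squeezed batch of one (`D = 129` here matches the box size in the traceback; the `particles` array is a stand-in for the real dataset output):

```python
import numpy as np

D = 129
particles = np.zeros((D, D))   # a size-one batch that lost its leading dimension

# Restore the expected 1 x D x D stack shape before downstream reshaping:
if len(particles.shape) == 2:
    particles = particles[np.newaxis, ...]

print(particles.shape)         # (1, 129, 129)
```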
However, as also discussed at #349, the `__getitem__` method needs a rethink: either more explicitly delineate the actions to take depending on whether the input index is a single element, a sequence, a slice object, etc. (and write tests for each), or stop supporting sequences as inputs (matching the behavior of built-in Python `list` objects) and refactor the data-loading methods accordingly.
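As a sketch of the first option (a toy `ImageStack` class with hypothetical names, not the actual cryoDRGN implementation), every supported index type can be normalized to an array of positions so the result always keeps its leading stack dimension:

```python
import numpy as np

class ImageStack:
    """Toy stand-in for a particle dataset; `data` is an N x D x D array."""

    def __init__(self, data):
        self.data = data

    def __getitem__(self, index):
        # Normalize each index type explicitly so that even a single image
        # comes back as a 1 x D x D stack rather than a bare D x D array.
        if isinstance(index, (int, np.integer)):
            idx = np.array([index])
        elif isinstance(index, slice):
            idx = np.arange(len(self.data))[index]
        else:  # a sequence of indices
            idx = np.asarray(index)
        return self.data[idx]

stack = ImageStack(np.zeros((10, 129, 129)))
print(stack[0].shape, stack[0:1].shape, stack[[0]].shape)
# (1, 129, 129) (1, 129, 129) (1, 129, 129)
```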
In the meantime, you can access this patch using our beta release channel, which should allow you to use the batch size you originally intended:

```
pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ 'cryodrgn<=3.3.2' --pre
```
Please let us know if you run into any other issues, and thank you for bringing this to our attention!
-Mike