FirasGit/medicaldiffusion

Warning: data is not aligned! This can lead to a speed loss

WhenMelancholy opened this issue · 3 comments

During the training process, I encountered the following warning outputs:

Sanity Checking DataLoader 0:   0%|                                                                                                     | 0/2 [00:00<?, ?it/s][swscaler @ 0x641c700] Warning: data is not aligned! This can lead to a speed loss
[swscaler @ 0x743a880] Warning: data is not aligned! This can lead to a speed loss
Epoch 0:   0%|                                                                                                                        | 0/565 [00:00<?, ?it/s][swscaler @ 0x59d9700] Warning: data is not aligned! This can lead to a speed loss
[swscaler @ 0x6c7f880] Warning: data is not aligned! This can lead to a speed loss

Although the warning did not affect training, I am unclear about what causes it. My training command is as follows:

CUDA_VISIBLE_DEVICES=2 PL_TORCH_DISTRIBUTED_BACKEND=gloo PYTHONPATH=.:$PYTHONPATH python train/train_vqgan.py dataset=mrnet dataset.root_dir="~/github/medicaldiffusion/data/MRNet-v1.0/" model=vq_gan_3d model.gpus=1 model.default_root_dir="~/github/medicaldiffusion/when/checkpoints/vq_gan" model.default_root_dir_postfix="mrnet" model.precision=16 model.embedding_dim=8 model.n_hiddens=16 model.downsample=[4,4,4] model.num_workers=32 model.gradient_clip_val=1.0 model.lr=3e-4 model.discriminator_iter_start=10000 model.perceptual_weight=4 model.image_gan_weight=1 model.video_gan_weight=1 model.gan_feat_weight=4 model.batch_size=2 model.n_codes=16384 model.accumulate_grad_batches=1 

This command is taken from train_vqgan.sh.

Thank you in advance!

@WhenMelancholy This happened for me as well. As far as I know, it indicates that the number of images in your training data is not evenly divisible by the number of CUDA devices you're training on. It should have only a negligible impact on training as long as you're training on a single server. I believe this is a warning from PyTorch Lightning.

@benearnthof This happened for me as well. Could you please tell me how to debug it? Is it because the dataset is not divisible by 16?

There is no need to debug anything: this warning only indicates a minor inefficiency when scaling images. My earlier statement may be incorrect. The "swscaler" tag in the log lines points to FFmpeg's image-scaling library rather than PyTorch Lightning, and the warning most likely stems from one of the image dimensions not being divisible by 16. It should not affect the model, however.
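
If you want to check that hypothesis on your own data, here is a minimal sketch in plain Python. It only assumes the explanation above (a dimension that is not a multiple of 16 triggers the warning); the helper names round_up and alignment_report are made up for illustration and are not part of medicaldiffusion, PyTorch Lightning, or FFmpeg. It reports which axes of a frame or volume shape are not divisible by 16 and what size they would need to be padded to.

```python
def round_up(n: int, multiple: int = 16) -> int:
    """Return the smallest multiple of `multiple` that is >= n."""
    return -(-n // multiple) * multiple  # ceiling division, then scale back up


def alignment_report(shape, multiple: int = 16) -> dict:
    """Map each axis whose size is not divisible by `multiple`
    to a (current_size, padded_size) pair. Aligned axes are omitted."""
    return {
        axis: (size, round_up(size, multiple))
        for axis, size in enumerate(shape)
        if size % multiple != 0
    }


if __name__ == "__main__":
    # Hypothetical frame size, chosen only for illustration.
    print(alignment_report((250, 256)))  # {0: (250, 256)}
```

An empty result means every axis is already a multiple of 16, in which case the cause would have to be something else. Since the warning is only about speed, padding or resizing is optional.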