tohinz/multiple-objects-gan

How to solve dataset AssertionError?

b2220333 opened this issue · 5 comments

After deciding to use Python 2, I used PyCharm to create a new Python 2 virtual environment and ran:

git clone https://github.com/tohinz/multiple-objects-gan
cd multiple-objects-gan
edit requirements.txt and delete the line pkg-resources==0.0.0 to prevent an install error
pip install -r requirements.txt
cd models/
wget -c https://www2.informatik.uni-hamburg.de/wtm/software/multiple-objects-gan/model-ms-coco-attngan.zip
unzip model-ms-coco-attngan.zip
cd ../code/coco/attngan/
edit cfg/coco_eval.yml and change:
DATA_DIR: '/home/sam/code/python/pytorch/image_caption/dataset/coco2014'
IMG_DIR: "/home/sam/code/python/pytorch/image_caption/dataset/coco2014/val2014"
mkdir -p DAMSMencoders/coco/
wget -c https://www.dropbox.com/s/zj3z0lvkfd8vaga/image_encoder100.pth?dl=0 -O DAMSMencoders/coco/image_encoder100.pth
wget -c https://www.dropbox.com/s/jo325z064a7x07k/text_encoder100.pth?dl=0 -O DAMSMencoders/coco/text_encoder100.pth
python2 main.py --cfg cfg/coco_eval.yml

After running the above instructions I got this error:

(py2_t1) sam@sam-ub1804:~/PycharmProjects/py2_t1/multiple-objects-gan/code/coco/attngan$ python2 main.py --cfg cfg/coco_eval.yml
Using config:
{'B_VALIDATION': True,
 'CONFIG_NAME': 'attn2',
 'CUDA': True,
 'DATASET_NAME': 'coco',
 'DATA_DIR': '/home/sam/code/python/pytorch/image_caption/dataset/coco2014',
 'GAN': {'B_ATTENTION': True,
         'B_DCGAN': False,
         'CONDITION_DIM': 100,
         'DF_DIM': 96,
         'GF_DIM': 48,
         'R_NUM': 3,
         'Z_DIM': 100},
 'GPU_ID': '0',
 'IMG_DIR': '/home/sam/code/python/pytorch/image_caption/dataset/coco2014/val2014',
 'RNN_TYPE': 'LSTM',
 'TEXT': {'CAPTIONS_PER_IMAGE': 5, 'EMBEDDING_DIM': 256, 'WORDS_NUM': 20},
 'TRAIN': {'BATCH_SIZE': 50,
           'B_NET_D': False,
           'DISCRIMINATOR_LR': 0.0002,
           'ENCODER_LR': 0.0002,
           'FLAG': False,
           'GENERATOR_LR': 0.0002,
           'MAX_EPOCH': 600,
           'NET_E': 'DAMSMencoders/coco/text_encoder100.pth',
           'NET_G': '../../../models/model-ms-coco-attngan-0100.pth',
           'RNN_GRAD_CLIP': 0.25,
           'SMOOTH': {'GAMMA1': 5.0,
                      'GAMMA2': 5.0,
                      'GAMMA3': 10.0,
                      'LAMBDA': 1.0},
           'SNAPSHOT_INTERVAL': 2000},
 'TREE': {'BASE_SIZE': 64, 'BRANCH_NUM': 3},
 'WORKERS': 1}
bboxes:  (40470, 3, 4)
labels:  (40470, 3, 1)
Save to:  /home/sam/code/python/pytorch/image_caption/dataset/coco2014/captions.pickle
Traceback (most recent call last):
  File "main.py", line 134, in <module>
    assert dataset
AssertionError
(py2_t1) sam@sam-ub1804:~/PycharmProjects/py2_t1/multiple-objects-gan/code/coco/attngan$

If I comment out line 134 of main.py in the code/coco/attngan directory and run the same command again, it shows:

(py2_t1) sam@sam-ub1804:~/PycharmProjects/py2_t1/multiple-objects-gan/code/coco/attngan$ python2 main.py --cfg cfg/coco_eval.yml
Using config:
{'B_VALIDATION': True,
 'CONFIG_NAME': 'attn2',
 'CUDA': True,
 'DATASET_NAME': 'coco',
 'DATA_DIR': '/home/sam/code/python/pytorch/image_caption/dataset/coco2014',
 'GAN': {'B_ATTENTION': True,
         'B_DCGAN': False,
         'CONDITION_DIM': 100,
         'DF_DIM': 96,
         'GF_DIM': 48,
         'R_NUM': 3,
         'Z_DIM': 100},
 'GPU_ID': '0',
 'IMG_DIR': '/home/sam/code/python/pytorch/image_caption/dataset/coco2014/val2014',
 'RNN_TYPE': 'LSTM',
 'TEXT': {'CAPTIONS_PER_IMAGE': 5, 'EMBEDDING_DIM': 256, 'WORDS_NUM': 20},
 'TRAIN': {'BATCH_SIZE': 50,
           'B_NET_D': False,
           'DISCRIMINATOR_LR': 0.0002,
           'ENCODER_LR': 0.0002,
           'FLAG': False,
           'GENERATOR_LR': 0.0002,
           'MAX_EPOCH': 600,
           'NET_E': 'DAMSMencoders/coco/text_encoder100.pth',
           'NET_G': '../../../models/model-ms-coco-attngan-0100.pth',
           'RNN_GRAD_CLIP': 0.25,
           'SMOOTH': {'GAMMA1': 5.0,
                      'GAMMA2': 5.0,
                      'GAMMA3': 10.0,
                      'LAMBDA': 1.0},
           'SNAPSHOT_INTERVAL': 2000},
 'TREE': {'BASE_SIZE': 64, 'BRANCH_NUM': 3},
 'WORKERS': 1}
bboxes:  (40470, 3, 4)
labels:  (40470, 3, 1)
Load from:  /home/sam/code/python/pytorch/image_caption/dataset/coco2014/captions.pickle
/home/sam/anaconda3/envs/py2_t1/lib/python2.7/site-packages/torch/nn/modules/rnn.py:38: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
  "num_layers={}".format(dropout, num_layers))
Traceback (most recent call last):
  File "main.py", line 158, in <module>
    algo.sample(split_dir, num_samples=25, draw_bbox=True)
  File "/home/sam/PycharmProjects/py2_t1/multiple-objects-gan/code/coco/attngan/trainer.py", line 489, in sample
    text_encoder.load_state_dict(state_dict)
  File "/home/sam/anaconda3/envs/py2_t1/lib/python2.7/site-packages/torch/nn/modules/module.py", line 719, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for RNN_ENCODER:
        size mismatch for encoder.weight: copying a param of torch.Size([1, 300]) from checkpoint, where the shape is torch.Size([27297, 300]) in current model.
(py2_t1) sam@sam-ub1804:~/PycharmProjects/py2_t1/multiple-objects-gan/code/coco/attngan$ 

How could I solve these problems?
Thank you~

Hi, the first error means something is wrong with the created dataset (the assert likely fails because the constructed dataset is empty, since bool() of a dataset falls back to its __len__). Check the __init__() method of the TextDataset class in code/coco/attngan/datasets.py and make sure it loads everything correctly (especially the captions file / pre-processed metadata downloaded from the original AttnGAN GitHub).
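
You can sanity-check the pickle directly. Here is a minimal sketch, assuming captions.pickle has the layout used by the original AttnGAN datasets.py, i.e. [train_captions, test_captions, ixtoword, wordtoix], with the path taken from your config:

from __future__ import print_function
import cPickle as pickle  # Python 2

# DATA_DIR from coco_eval.yml; adjust if yours differs.
path = '/home/sam/code/python/pytorch/image_caption/dataset/coco2014/captions.pickle'
with open(path, 'rb') as f:
    x = pickle.load(f)

train_captions, test_captions = x[0], x[1]
ixtoword, wordtoix = x[2], x[3]
print('train captions:', len(train_captions))
print('test captions:', len(test_captions))
print('vocabulary size:', len(ixtoword))  # should be 27297 to match the DAMSM text encoder

If the pickle is missing entries or the vocabulary size is not 27297, it was probably written incompletely; you may need to delete it so the code regenerates it (your first run above shows it being created).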

For the second error, I guess your path to the NET_E state dict might be wrong; I suggest setting an absolute path to check whether that solves the issue.
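
As a quick check (a hypothetical one-off script; the two relative paths are the ones from your config above), you can verify whether they resolve from the directory where you launch main.py:

from __future__ import print_function
import os

# The model paths from coco_eval.yml, relative to code/coco/attngan/.
for p in ('DAMSMencoders/coco/text_encoder100.pth',
          '../../../models/model-ms-coco-attngan-0100.pth'):
    print(os.path.abspath(p), os.path.isfile(p))  # False means the file is not found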

I spent some time checking, following your hints.
I found that it really is a dataset-loading problem.
However, I couldn't figure out what's going on.
After setting the NET_E and NET_G entries to absolute paths, the problem persists...

Here is the output (two runs: the first hits the assert, the second has it commented out):

(py2_t1) sam@sam-ub1804:~/PycharmProjects/py2_t1/multiple-objects-gan/code/coco/attngan$ python2 main.py --cfg cfg/coco_eval.yml
Using config:
{'B_VALIDATION': True,
 'CONFIG_NAME': 'attn2',
 'CUDA': True,
 'DATASET_NAME': 'coco',
 'DATA_DIR': '/home/sam/code/python/pytorch/image_caption/dataset/coco2014',
 'GAN': {'B_ATTENTION': True,
         'B_DCGAN': False,
         'CONDITION_DIM': 100,
         'DF_DIM': 96,
         'GF_DIM': 48,
         'R_NUM': 3,
         'Z_DIM': 100},
 'GPU_ID': '0',
 'IMG_DIR': '/home/sam/code/python/pytorch/image_caption/dataset/coco2014/val2014',
 'RNN_TYPE': 'LSTM',
 'TEXT': {'CAPTIONS_PER_IMAGE': 5, 'EMBEDDING_DIM': 256, 'WORDS_NUM': 20},
 'TRAIN': {'BATCH_SIZE': 50,
           'B_NET_D': False,
           'DISCRIMINATOR_LR': 0.0002,
           'ENCODER_LR': 0.0002,
           'FLAG': False,
           'GENERATOR_LR': 0.0002,
           'MAX_EPOCH': 600,
           'NET_E': '/home/sam/PycharmProjects/py2_t1/multiple-objects-gan/code/coco/attngan/DAMSMencoders/coco/text_encoder100.pth',
           'NET_G': '/home/sam/PycharmProjects/py2_t1/multiple-objects-gan/models/model-ms-coco-attngan-0100.pth',
           'RNN_GRAD_CLIP': 0.25,
           'SMOOTH': {'GAMMA1': 5.0,
                      'GAMMA2': 5.0,
                      'GAMMA3': 10.0,
                      'LAMBDA': 1.0},
           'SNAPSHOT_INTERVAL': 2000},
 'TREE': {'BASE_SIZE': 64, 'BRANCH_NUM': 3},
 'WORKERS': 1}
bboxes:  (40470, 3, 4)
labels:  (40470, 3, 1)
Load from:  /home/sam/code/python/pytorch/image_caption/dataset/coco2014/captions.pickle
Traceback (most recent call last):
  File "main.py", line 138, in <module>
    assert dataset
AssertionError
(py2_t1) sam@sam-ub1804:~/PycharmProjects/py2_t1/multiple-objects-gan/code/coco/attngan$
(py2_t1) sam@sam-ub1804:~/PycharmProjects/py2_t1/multiple-objects-gan/code/coco/attngan$ python2 main.py --cfg cfg/coco_eval.yml
Using config:
{'B_VALIDATION': True,
 'CONFIG_NAME': 'attn2',
 'CUDA': True,
 'DATASET_NAME': 'coco',
 'DATA_DIR': '/home/sam/code/python/pytorch/image_caption/dataset/coco2014',
 'GAN': {'B_ATTENTION': True,
         'B_DCGAN': False,
         'CONDITION_DIM': 100,
         'DF_DIM': 96,
         'GF_DIM': 48,
         'R_NUM': 3,
         'Z_DIM': 100},
 'GPU_ID': '0',
 'IMG_DIR': '/home/sam/code/python/pytorch/image_caption/dataset/coco2014/val2014',
 'RNN_TYPE': 'LSTM',
 'TEXT': {'CAPTIONS_PER_IMAGE': 5, 'EMBEDDING_DIM': 256, 'WORDS_NUM': 20},
 'TRAIN': {'BATCH_SIZE': 50,
           'B_NET_D': False,
           'DISCRIMINATOR_LR': 0.0002,
           'ENCODER_LR': 0.0002,
           'FLAG': False,
           'GENERATOR_LR': 0.0002,
           'MAX_EPOCH': 600,
           'NET_E': '/home/sam/PycharmProjects/py2_t1/multiple-objects-gan/code/coco/attngan/DAMSMencoders/coco/text_encoder100.pth',
           'NET_G': '/home/sam/PycharmProjects/py2_t1/multiple-objects-gan/models/model-ms-coco-attngan-0100.pth',
           'RNN_GRAD_CLIP': 0.25,
           'SMOOTH': {'GAMMA1': 5.0,
                      'GAMMA2': 5.0,
                      'GAMMA3': 10.0,
                      'LAMBDA': 1.0},
           'SNAPSHOT_INTERVAL': 2000},
 'TREE': {'BASE_SIZE': 64, 'BRANCH_NUM': 3},
 'WORKERS': 1}
bboxes:  (40470, 3, 4)
labels:  (40470, 3, 1)
Load from:  /home/sam/code/python/pytorch/image_caption/dataset/coco2014/captions.pickle
/home/sam/anaconda3/envs/py2_t1/lib/python2.7/site-packages/torch/nn/modules/rnn.py:38: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
  "num_layers={}".format(dropout, num_layers))
Traceback (most recent call last):
  File "main.py", line 162, in <module>
    algo.sample(split_dir, num_samples=25, draw_bbox=True)
  File "/home/sam/PycharmProjects/py2_t1/multiple-objects-gan/code/coco/attngan/trainer.py", line 489, in sample
    text_encoder.load_state_dict(state_dict)
  File "/home/sam/anaconda3/envs/py2_t1/lib/python2.7/site-packages/torch/nn/modules/module.py", line 719, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for RNN_ENCODER:
        size mismatch for encoder.weight: copying a param of torch.Size([1, 300]) from checkpoint, where the shape is torch.Size([27297, 300]) in current model.
(py2_t1) sam@sam-ub1804:~/PycharmProjects/py2_t1/multiple-objects-gan/code/coco/attngan$

What can I do next?
Thank you~~

Hi, I think the two errors are unrelated.
The first problem is with the dataset: something in its construction does not seem to work correctly. When initializing the dataset, can you check that lines 159-169 work correctly, i.e. check their shapes and/or contents to make sure everything is loaded correctly? E.g. self.number_example should be 40470 for the validation set.
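
For example (debug prints only; the attribute names come from the AttnGAN-style TextDataset this code builds on), you could add the following right after the dataset is constructed in main.py:

# Hypothetical debug prints, placed directly after dataset = TextDataset(...).
print('number_example: %d' % dataset.number_example)  # expect 40470 for the validation set
print('n_words:        %d' % dataset.n_words)         # expect 27297 (DAMSM vocabulary size)
print('filenames:      %d' % len(dataset.filenames))
print('captions:       %d' % len(dataset.captions))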

For the second error, it looks like something might be wrong with the pretrained text encoder. Try downloading it again to make sure the file is not corrupted. Also, which PyTorch version are you using? I think they changed the state_dict loading in one of the previous versions, so that might be an issue here, too.

Hi, I encountered the same problem as above and have already tried re-downloading the DAMSM text encoder. I am using Python 2.7.12 and PyTorch 0.4.1 in a Docker container.

Here is my output:

Starting training on the MS-COCO data set.
Using config:
{'B_VALIDATION': False,
 'CONFIG_NAME': 'glu-gan2',
 'CUDA': True,
 'DATASET_NAME': 'coco',
 'DATA_DIR': '/workspace/data/MS-COCO',
 'GAN': {'B_ATTENTION': True,
         'B_DCGAN': False,
         'CONDITION_DIM': 100,
         'DF_DIM': 96,
         'GF_DIM': 48,
         'R_NUM': 3,
         'Z_DIM': 100},
 'GPU_ID': '0,1,2',
 'IMG_DIR': '/workspace/data/MS-COCO/train/train2014',
 'RNN_TYPE': 'LSTM',
 'TEXT': {'CAPTIONS_PER_IMAGE': 5, 'EMBEDDING_DIM': 256, 'WORDS_NUM': 12},
 'TRAIN': {'BATCH_SIZE': 14,
           'B_NET_D': True,
           'DISCRIMINATOR_LR': 0.0002,
           'ENCODER_LR': 0.0002,
           'FLAG': True,
           'GENERATOR_LR': 0.0002,
           'MAX_EPOCH': 120,
           'NET_E': 'DAMSMencoders/coco/text_encoder100.pth',
           'NET_G': '',
           'RNN_GRAD_CLIP': 0.25,
           'SMOOTH': {'GAMMA1': 4.0,
                      'GAMMA2': 5.0,
                      'GAMMA3': 10.0,
                      'LAMBDA': 50.0},
           'SNAPSHOT_INTERVAL': 5},
 'TREE': {'BASE_SIZE': 64, 'BRANCH_NUM': 3},
 'WORKERS': 20}
bboxes: (82783, 3, 4)
labels: (82783, 3, 1)
Load filenames from: /workspace/data/MS-COCO/train/filenames.pickle (82783)
Load from: /workspace/data/MS-COCO/captions.pickle
num_exp:82783
('Load pretrained model from ', 'https://download.pytorch.org/models/inception_v3_google-1a9a5a14.pth')
473
Load image encoder from: DAMSMencoders/coco/image_encoder100.pth
/usr/local/lib/python2.7/dist-packages/torch/nn/modules/rnn.py:38: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
"num_layers={}".format(dropout, num_layers))
9
Traceback (most recent call last):
File "main.py", line 152, in
algo.train()
File "/workspace/code/coco/attngan/trainer.py", line 252, in train
text_encoder, image_encoder, netG, netsD, start_epoch = self.build_models()
File "/workspace/code/coco/attngan/trainer.py", line 76, in build_models
text_encoder.load_state_dict(state_dict)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 719, in load_state_dict
self.class.name, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for RNN_ENCODER:
size mismatch for encoder.weight: copying a param of torch.Size([1, 300]) from checkpoint, where the shape is torch.Size([27297, 300]) in current model.

num_exp is self.number_example.
473 and 9 are the lengths of the state_dicts of the image encoder and the text encoder, respectively.

May I know how to solve this error?
Thank you.

Hi, to me this looks like a problem with the state_dict for the text encoder. The text encoder has an embedding layer of shape [27297, 300] (27297 words, each with a 300-dim embedding), but your state_dict seems to contain an embedding of size [1, 300] only. Length 9 for the text_encoder state_dict is correct; the entries should be: ['encoder.weight', 'rnn.weight_ih_l0', 'rnn.weight_hh_l0', 'rnn.bias_ih_l0', 'rnn.bias_hh_l0', 'rnn.weight_ih_l0_reverse', 'rnn.weight_hh_l0_reverse', 'rnn.bias_ih_l0_reverse', 'rnn.bias_hh_l0_reverse']

Could you please check that

  • self.n_words = 27297 (seems to be the case for you) and
  • state_dict["encoder.weight"].shape = (27297, 300) (this is the pre-trained model from AttnGAN).
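
Both can be verified with a few standalone lines (a sketch; adjust the NET_E path if yours differs):

from __future__ import print_function
import torch

# Load the DAMSM text encoder checkpoint onto the CPU and list its entries.
state_dict = torch.load('DAMSMencoders/coco/text_encoder100.pth',
                        map_location=lambda storage, loc: storage)
print('entries:', len(state_dict))  # should be 9
for k, v in state_dict.items():
    print(k, tuple(v.size()))       # encoder.weight should be (27297, 300)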

The pre-trained AttnGAN text encoder model (text_encoder100.pth) should be about 33 MB in size, and the image encoder (image_encoder100.pth) about 86 MB.