agakshat/visualdialog-pytorch

running bugs and what's the v09 folder used for?

wanyao1992 opened this issue · 2 comments

Hi Akshat Agarwal,

Thanks for sharing this code. Recently I tried to run your code as a baseline, but I encountered some problems.

  1. When I run the main.py file I get the following error. Are you sure your code runs successfully with Python 3.6 and PyTorch 0.4.0?
/home/wanyao/.conda/envs/py36/lib/python3.6/site-packages/torch/nn/modules/rnn.py:38: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
  "num_layers={}".format(dropout, num_layers))
Traceback (most recent call last):
  File "main.py", line 651, in <module>
    optimizerAbotRLarr = []
  File "main.py", line 115, in train

  File "/home/wanyao/.conda/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/wanyao/www/Dropbox/ghproj-py36/visualdialog-pytorch-original/networks/encoder_QIH.py", line 33, in forward
    img_emb = F.tanh(self.img_embed(img_raw.view(-1,1,self.img_feat_size)))
RuntimeError: invalid argument 2: size '[-1 x 1 x 4096]' is invalid for input with 7526400 elements at /pytorch/aten/src/TH/THStorage.c:37
  2. My second concern is: what is the v09 folder used for? In Jiasen's visDial.pytorch repo, only vdl_img_vgg.h5 is used to extract CNN features for the images, so why do you use both vdl_img_vgg.h5 and data_img_vgg16_pool5.h5? Furthermore, since data_img_vgg16_pool5.h5 is very large (50 GB), it takes a long time to load into memory, which makes development and debugging time-consuming. Do you have any suggestions for accelerating this process, or can I just use vdl_img_vgg.h5 to represent the images?

Hi Wanyao,

Thanks for pointing out an error in the README - the v09 data file was linked incorrectly. We use the 4096x1 relu7 VGG embeddings instead of the 512x7x7 pool5 VGG embeddings present in Jiasen's data, which is why we host that download separately. I've made the correction - the new file is only a 2 GB download, which should hopefully alleviate your second concern.

While iterating during our experimentation, we tried giving both the pool5 and relu7 embeddings to the A-Bot encoder and the Q-Bot loss, which is why the code has remnants of requiring both sets of CNN features (see utils/dataloader.py for details). I just checked, and it turns out we are not actually using the image file https://filebox.ece.vt.edu/~jiasenlu/codeRelease/visDial.pytorch/data/vdl_img_vgg.h5 (but the code still expects it, so it is probably easiest to keep it in your data directory).
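On the loading-time concern: HDF5 files are read lazily, so even a very large feature file only pulls the rows you actually index into memory. A minimal h5py sketch (the dataset name 'images_train' is an assumption for illustration, not taken from this repo's dataloader):

```python
import os
import tempfile

import h5py
import numpy as np

# Build a tiny stand-in file (the real data_img_vgg16_pool5.h5 is ~50 GB).
path = os.path.join(tempfile.mkdtemp(), 'features.h5')
with h5py.File(path, 'w') as f:
    f.create_dataset('images_train',
                     data=np.zeros((10, 512, 7, 7), dtype='float32'))

with h5py.File(path, 'r') as f:
    dset = f['images_train']   # just a handle; no data loaded yet
    batch = dset[0:2]          # only these 2 feature maps are read from disk
    print(batch.shape)         # (2, 512, 7, 7)
```

So for development/debugging you can slice out only the examples you need rather than loading the whole file up front.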

I've run this code with Python 3.5 and PyTorch 0.4, not Python 3.6 - but your error message does not seem to be related to that anyway.
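For what it's worth, a quick arithmetic sketch of that view error (my own reading of the traceback, not anything from the repo) suggests the tensor being reshaped holds pool5 feature maps rather than the 4096-d relu7 features the encoder expects:

```python
# encoder_QIH.py calls img_raw.view(-1, 1, 4096), which only works when the
# element count is a multiple of 4096.
num_elements = 7526400           # element count reported in the traceback
relu7_dim = 4096                 # expected 4096-d relu7 feature per image
pool5_dim = 512 * 7 * 7          # 25088 elements per flattened pool5 map

print(num_elements % relu7_dim)   # 2048 -> not a multiple, so view must fail
print(num_elements % pool5_dim)   # 0    -> divides evenly into pool5 maps
print(num_elements // pool5_dim)  # 300  -> a batch of 300 pool5 feature maps
```

If that reading is right, double-check which .h5 file your dataloader is feeding to the encoder.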

Closing the issue for now - feel free to reopen if you have additional questions.