microsoft/VideoX

hdf5 broken for TACoS?

iriyagupta opened this issue · 10 comments

Hi,

On running the eval for TACoS, I get the following error:

```
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 's30-d52.avi' doesn't exist)"
```

I am unsure if something is broken on my end. Can you please help?

@Sy-Zhang, please take a look.

Which hdf5 file are you using, and which cloud drive did you download it from?

Hi @Sy-Zhang, I used tall_c3d_features from this link: https://rochester.app.box.com/s/8znalh6y5e82oml2lr7to8s6ntab6mav/folder/137471786054

I tried and didn't get this error. Could you check whether your hdf5 file is broken?

That is weird. These are the steps I used: in the TACoS data yml file, I changed the checkpoint name to the already-trained model file I downloaded, e.g. ./checkpoints/TACoS/pretrained_pkl_file, and then ran moment_localization/test.py. I hope that is the correct method.
I kept the .hdf5 feature file in the ./data/TACoS/ folder after downloading it from this link. There was a merge_npys_to_hdf5.py in that folder as well, which also throws an error when run, but I think it is not supposed to be used anyway?
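(For reference, here is a minimal sketch of what a merge script like merge_npys_to_hdf5.py presumably does, assuming one .npy feature file per video; the actual script's paths, arguments, and key naming may differ. It should not be needed if you downloaded the already-merged hdf5.)

```python
import glob
import os

import h5py
import numpy as np

# Hypothetical paths; the repo script's actual defaults may differ.
npy_dir = "./data/TACoS/npy_features"
out_path = "./data/TACoS/tall_c3d_features.hdf5"

with h5py.File(out_path, "w") as f:
    for npy_path in sorted(glob.glob(os.path.join(npy_dir, "*.npy"))):
        # "s30-d52.avi.npy" -> key "s30-d52.avi" (keys in the released
        # file appear to include the .avi extension).
        key = os.path.splitext(os.path.basename(npy_path))[0]
        f.create_dataset(key, data=np.load(npy_path))
```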

The other thing is that I am using nn.DataParallel; do you think that could be the cause? @Sy-Zhang
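(For context, a minimal stand-in sketch of the usual nn.DataParallel wrapping: it only replicates the module across visible GPUs and splits each batch, so it should not change which hdf5 keys are read, though it does affect per-GPU memory use.)

```python
import torch
import torch.nn as nn

# Stand-in module; the actual model in this repo is more complex.
model = nn.Linear(512, 2)

# DataParallel replicates the module across all visible GPUs and splits
# each input batch along dim 0; it does not touch dataset/hdf5 loading.
if torch.cuda.is_available() and torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
    model = model.cuda()
```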

Any help would be appreciated.

[Screenshot: a short h5py snippet that checks the keys of the feature file.]
Could you try the code shown in this figure to check whether your hdf5 file has 's30-d52.avi'?
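(A reconstruction of the snippet in that screenshot, using h5py; the file name here is an assumption, so adjust it to whatever you downloaded.)

```python
import h5py

path = "./data/TACoS/tall_c3d_features.hdf5"  # adjust to your feature file

with h5py.File(path, "r") as f:
    print("s30-d52.avi" in f)              # should print True
    print(len(f.keys()), "videos total")   # sanity check on the key count
```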

Thank you, I checked and it exists. I redownloaded the data and ran it, and it seems to load correctly now. However, even just for evaluation on 4 GPUs, it shows:

```
RuntimeError: CUDA out of memory. Tried to allocate 308.00 MiB (GPU 3; 10.92 GiB total capacity; 4.16 GiB already allocated; 72.38 MiB free; 4.19 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
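(The error message itself suggests one mitigation: the max_split_size_mb allocator option. A minimal sketch, where the 128 MiB value is only an example:)

```python
import os

# Allocator hint suggested by the error message; must be set before the
# first CUDA allocation, so set it before importing torch to be safe.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after setting the env var so it takes effect
```

Reducing the evaluation batch size, as suggested in the next reply, is the more direct fix.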

Then you need to reduce the batch size or use a GPU with more memory.

Makes sense; as per my understanding, that would be changed in the .yaml file. I will run it again and check.

I guess it was all a broken-file issue plus a lack of GPU memory on my end. Thank you for your help :)