microsoft/VideoX

hdf5 broken for TACoS?

iriyagupta opened this issue · 10 comments

Hi,

On running the eval for TACoS, I get the following error:

```
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 's30-d52.avi' doesn't exist)"
```

I am unsure if something is broken on my end. Can you please help?

@Sy-Zhang, please take a look.

Which hdf5 file are you using, and which cloud drive did you download it from?

Hi @Sy-Zhang, I used tall_c3d_features from this link: https://rochester.app.box.com/s/8znalh6y5e82oml2lr7to8s6ntab6mav/folder/137471786054

I tried and didn't get this error. Could you check whether your hdf5 file is broken?

That is weird. These are the steps I used: in the TACoS data yml file, I changed the checkpoint name to the already-trained model file I downloaded, e.g. ./checkpoints/TACoS/pretrained_pkl_file, and then ran moment_localization/test.py. I hope that is the correct method.
I kept the .hdf5 feature file in the ./data/TACoS/ folder after downloading it from this link. There was a merge_npys_to_hdf5.py in that folder as well, which also throws an error when run, but I think it is not supposed to be used anyway?
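(For reference, here is a minimal sketch of what a merge script like merge_npys_to_hdf5.py presumably does, assuming one .npy feature file per video; the actual script's paths, arguments, and key naming may differ. It should not be needed if you downloaded the already-merged hdf5.)

```python
import glob
import os

import h5py
import numpy as np

# Hypothetical paths; the repo script's actual defaults may differ.
npy_dir = "./data/TACoS/npy_features"
out_path = "./data/TACoS/tall_c3d_features.hdf5"

with h5py.File(out_path, "w") as f:
    for npy_path in sorted(glob.glob(os.path.join(npy_dir, "*.npy"))):
        # "s30-d52.avi.npy" -> key "s30-d52.avi" (keys in the released
        # file appear to include the .avi extension).
        key = os.path.splitext(os.path.basename(npy_path))[0]
        f.create_dataset(key, data=np.load(npy_path))
```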

The other thing is that I am using nn.DataParallel; do you think that could be the cause? @Sy-Zhang
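(For context, a minimal stand-in sketch of the usual nn.DataParallel wrapping: it only replicates the module across visible GPUs and splits each batch, so it should not change which hdf5 keys are read, though it does affect per-GPU memory use.)

```python
import torch
import torch.nn as nn

# Stand-in module; the actual model in this repo is more complex.
model = nn.Linear(512, 2)

# DataParallel replicates the module across all visible GPUs and splits
# each input batch along dim 0; it does not touch dataset/hdf5 loading.
if torch.cuda.is_available() and torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
    model = model.cuda()
```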

Any help would be appreciated.

[Screenshot: a short h5py snippet that checks the keys of the feature file.]
Could you try the code shown in this figure to check whether your hdf5 file has 's30-d52.avi'?
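(A reconstruction of the snippet in that screenshot, using h5py; the file name here is an assumption, so adjust it to whatever you downloaded.)

```python
import h5py

path = "./data/TACoS/tall_c3d_features.hdf5"  # adjust to your feature file

with h5py.File(path, "r") as f:
    print("s30-d52.avi" in f)              # should print True
    print(len(f.keys()), "videos total")   # sanity check on the key count
```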

Thank you, I checked and it exists. I redownloaded the data and ran it, and it seems to load correctly now. However, even just for evaluation on 4 GPUs, it shows:

```
RuntimeError: CUDA out of memory. Tried to allocate 308.00 MiB (GPU 3; 10.92 GiB total capacity; 4.16 GiB already allocated; 72.38 MiB free; 4.19 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
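(The error message itself suggests one mitigation: the max_split_size_mb allocator option. A minimal sketch, where the 128 MiB value is only an example:)

```python
import os

# Allocator hint suggested by the error message; must be set before the
# first CUDA allocation, so set it before importing torch to be safe.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after setting the env var so it takes effect
```

Reducing the evaluation batch size, as suggested in the next reply, is the more direct fix.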

Then you need to reduce the batch size or use a GPU with more memory.

Makes sense; as per my understanding, that would be changed in the .yaml file. I will run it again and check.

I guess it was all a broken-file issue plus a lack of GPU memory on my end. Thank you for your help :)