medhini/audio-video-textures

Installation questions

Opened this issue · 1 comment

Hi Medhini, really excited to try this project out!

I had some installation issues, wondering if you could help me solve them.

First, a typo in the README: avgan.yml should be avtexture.yml.

The conda create step actually failed on fvcore with an error about not finding a matching version:

ERROR: Could not find a version that satisfies the requirement fvcore==0.1.5 

I was able to copy the pip requirements into a requirements.txt file and remove the ==0.1.5 pin, and that solved the problem.
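In case it helps others hitting the same pin, here's a sketch of the pin-stripping I did by hand (the actual dependency list would come from avtexture.yml):

```python
import re

def strip_pins(requirements):
    """Drop exact ``==`` version pins from pip requirement lines.

    Leaves other specifiers (>=, <, etc.) and comment lines alone,
    so only hard pins like 'fvcore==0.1.5' are relaxed.
    """
    stripped = []
    for line in requirements:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        # 'fvcore==0.1.5' -> 'fvcore'; 'torch>=1.9' stays as-is
        stripped.append(re.sub(r"==.*$", "", line))
    return stripped
```

Running it over the two problem cases: strip_pins(["fvcore==0.1.5", "torch>=1.9"]) gives ["fvcore", "torch>=1.9"].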

When I tried to train on a video using main.py with the example arguments (I'm assuming you pass the directory where the videos live and then the individual video files), I got an error that slowfast wasn't available:

    from .models import (
  File ".../audio-video-textures/contrastive_video_textures/models/models.py", line 18, in <module>
    from slowfast.visualization.predictor import ActionPredictor
ModuleNotFoundError: No module named 'slowfast'

I assume this refers to https://github.com/facebookresearch/SlowFast ?
That led me to https://github.com/facebookresearch/SlowFast/blob/main/INSTALL.md

So I cloned the detectron2 and SlowFast repos, installed detectron2, and added SlowFast to my PYTHONPATH:

git clone https://github.com/facebookresearch/detectron2
pip install -e detectron2
git clone https://github.com/facebookresearch/slowfast
export PYTHONPATH=path/to/slowfast/:$PYTHONPATH
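For anyone who doesn't want to export PYTHONPATH in every shell, the same thing can be done at runtime before the slowfast imports; the clone location below is just a placeholder, adjust it to wherever your checkout lives:

```python
import sys
from pathlib import Path

# Hypothetical location of the SlowFast clone; adjust as needed.
SLOWFAST_ROOT = Path.home() / "src" / "slowfast"

# Prepend so this copy wins over any pip-installed version.
if str(SLOWFAST_ROOT) not in sys.path:
    sys.path.insert(0, str(SLOWFAST_ROOT))
```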

But then I ran into a new error:

ImportError: cannot import name 'cat_all_gather' from 'pytorchvideo.layers.distributed' (.../site-packages/pytorchvideo/layers/distributed.py)

I was able to resolve that by uninstalling pytorchvideo and installing from the repo:

pip uninstall pytorchvideo
pip install 'git+https://github.com/facebookresearch/pytorchvideo'
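A quick stdlib-only helper for checking which pytorchvideo you actually picked up defines the symbol, without digging through site-packages (sketch, not from the repo):

```python
import importlib

def has_symbol(module_name, attr):
    """Return True if module_name imports cleanly and defines attr."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)

# e.g. has_symbol("pytorchvideo.layers.distributed", "cat_all_gather")
# should be True after installing pytorchvideo from the git repo.
```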

Then it would run! It crashed, I think because I ran out of memory. What setup were you running this on? How much video RAM do you think is required?
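For reference, here's the back-of-envelope I was using for how much memory just the raw frames take; it ignores model activations (which presumably dominate), so treat it as a lower bound:

```python
def frames_bytes(num_frames, height, width, channels=3, bytes_per_value=4):
    """Lower-bound memory for a float32 frame tensor, in bytes."""
    return num_frames * height * width * channels * bytes_per_value

# e.g. 300 frames at the 180x320 resolution printed below:
gb = frames_bytes(300, 180, 320) / 1e9  # about 0.2 GB for frames alone
```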

I may hit other issues with the remaining commands, but I thought I'd send this in so you can check it against your install and update the install instructions for others.

Thanks!

OK, I made a much smaller video and didn't end up with a killed process. What size and length do you recommend?

I got to this point:

Starting video <my_video_name>
Frame shape:  torch.Size([180, 320, 3])
Stride 6 Window 15
=> creating model '1'
Traceback (most recent call last):
  ...
  File ".../contrastive_video_textures/models/models.py", line 570, in build_network
    cfg = load_config(args, path_to_config=args.cfg_file)
  ...
FileNotFoundError: [Errno 2] No such file or directory: '/home/medhini/audio_video_gan/contrastive_video_textures/slowfast_configs/SLOWFAST_8X8_R50.yaml'
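In case it's useful when fixing this, here's a sketch of resolving that config relative to the repo checkout instead of an absolute home-directory path (the layout is inferred from the traceback, not from the actual code):

```python
from pathlib import Path

def default_cfg_file(repo_root, name="SLOWFAST_8X8_R50.yaml"):
    """Build the slowfast config path relative to the repo checkout
    rather than the absolute /home/medhini/... path baked into models.py."""
    return str(
        Path(repo_root)
        / "contrastive_video_textures"
        / "slowfast_configs"
        / name
    )
```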

So there are two places in models.py where assumptions about absolute paths are made. I edited those and got a bit further, but ran into:

FileNotFoundError: [Errno 2] No such file or directory: 'pytorch_vggish.pth'

Not sure where this is from, but a Google search found this unmaintained repo:
https://github.com/harritaylor/torchvggish/releases

So I grabbed that and fixed the path, but then ran into state_dict problems:

RuntimeError: Error(s) in loading state_dict for VGGish:
	Missing key(s) in state_dict: "fc.0.weight", "fc.0.bias", "fc.2.weight", "fc.2.bias", "fc.4.weight", "fc.4.bias". 
	Unexpected key(s) in state_dict: "embeddings.0.weight", "embeddings.0.bias", "embeddings.2.weight", "embeddings.2.bias", "embeddings.4.weight", "embeddings.4.bias". 

So I guess I got the wrong one? Another search took me to https://zenodo.org/record/3839226, which seemed pretty sketchy, but that one did work!
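For anyone else comparing vggish checkpoints, a quick way to diff a checkpoint's keys against the model's before calling load_state_dict (plain-dict sketch; the fc/embeddings names come from the error above):

```python
def diff_state_dicts(model_keys, ckpt_keys):
    """Return (missing, unexpected) key sets, mirroring what PyTorch
    reports from load_state_dict with strict=True."""
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    missing = model_keys - ckpt_keys      # model wants, checkpoint lacks
    unexpected = ckpt_keys - model_keys   # checkpoint has, model lacks
    return missing, unexpected

# The harritaylor checkpoint names its head 'embeddings.*' while this
# repo's VGGish expects 'fc.*', hence the mismatch above.
```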

But slowfast errored out with a tensor dimensionality issue:

Traceback (most recent call last):
  File "main.py", line 548, in <module>
    main(args, video_name, itr)
  File "main.py", line 452, in main
    loss = train(
  File "/audio-video-textures/contrastive_video_textures/train.py", line 114, in train
    output = model(
  ...
  File "/audio-video-textures/contrastive_video_textures/models/models.py", line 337, in forward
    q_f = self.q_encoder(q_f)
  ...
  File ".../slowfast/slowfast/models/video_model_builder.py", line 428, in forward
    x = self.s1_fuse(x)
  File "/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File ".../slowfast/slowfast/models/video_model_builder.py", line 168, in forward
    x_s_fuse = torch.cat([x_s, fuse], 1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 2 for tensor number 1 in the list.
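If it helps with debugging: the failing torch.cat requires every dimension except the concat dim to match, so some non-concat dim of x_s and fuse disagrees (1 vs 2). A pure-Python sketch of that constraint (illustrative only, not from the repo):

```python
def cat_compatible(shape_a, shape_b, dim=1):
    """True if two shapes could be concatenated along `dim`:
    all dimensions other than `dim` must match exactly."""
    if len(shape_a) != len(shape_b):
        return False
    return all(
        a == b
        for i, (a, b) in enumerate(zip(shape_a, shape_b))
        if i != dim
    )

# Matches the error pattern: a 1-vs-2 mismatch in a non-concat dim
# makes cat along dim 1 fail, e.g. shapes starting (1, ...) vs (2, ...).
```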

Ideas?