This is the repo for learning multimodal word embeddings from the CMU-MOSEI dataset -- a dataset of YouTube videos.
To run the code, you need the following dependencies:
- pytorch 1.1
- tqdm
- numpy
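Before running anything, it can help to confirm that the dependencies above are importable. The helpers below are a minimal sketch and are not part of this repo. Their names (`missing_modules`, `torch_version_ok`) are hypothetical, and treating any 1.1.x release of PyTorch as acceptable is an assumption.

```python
import importlib.util
from importlib import metadata

# Import names for the dependencies listed above (PyTorch is imported as "torch").
REQUIRED_MODULES = ["torch", "tqdm", "numpy"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names of modules that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def torch_version_ok(expected="1.1"):
    """Check the installed torch version against the README's "pytorch 1.1".

    Assumption: any 1.1.x release (e.g. "1.1.0" or "1.1.0+cu100") is acceptable.
    Returns False if torch is not installed at all.
    """
    try:
        version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return False
    # "1.1." as the prefix ensures that "1.10.0" does not count as a match.
    return version == expected or version.startswith(expected + ".")
```

Running `print(missing_modules(), torch_version_ok())` from the environment you plan to use shows at a glance whether anything still needs to be installed.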
The processed data can be downloaded here. To see how the data is created, along with the code that creates it, see the dev branch of this repo. Note that the dev branch has additional dependencies.
After downloading the data, navigate to the root of this repo and create a directory named data. Then extract the downloaded data into the data folder. Afterwards, the data directory should look like this:
- data
  - glove_cache.pt
  - data_cache.pt
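The setup steps above can be checked with a short script. This is a minimal sketch, not part of the repo. The helper name `check_data_dir` is hypothetical, and the only assumption about the data is the two file names shown in the listing above.

```python
from pathlib import Path

# File names taken from the expected directory listing above.
EXPECTED_FILES = ("glove_cache.pt", "data_cache.pt")


def check_data_dir(repo_root="."):
    """Create <repo_root>/data if it is missing and return the expected files not yet present."""
    data_dir = Path(repo_root) / "data"
    data_dir.mkdir(exist_ok=True)  # creates the directory only if it does not already exist
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]
```

Calling `check_data_dir()` from the repo root returns an empty list once both cache files have been extracted into data.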
Once the data is in place, you can run the following command for experiments:
python main.py
No additional arguments need to be provided.