- youtube-dl (https://github.com/rg3/youtube-dl/)
To download all the Sports1M videos listed in `all_vid.txt`, run the following commands:
```bash
mkdir -p "$VIDEO_PATH"
chmod +x fetch_sports1m_videos.sh
./fetch_sports1m_videos.sh "$VIDEO_PATH" all_vid.txt NUM_WORKERS
```
Here, `$VIDEO_PATH` is the directory where the videos will be stored. If you already have a subset of the videos, pass that directory instead. `NUM_WORKERS` is the number of workers that download the dataset concurrently.
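The internals of `fetch_sports1m_videos.sh` are not described here. As a hedged illustration only, the Python sketch below shows one way to download a list of IDs with a fixed number of concurrent workers. It assumes the `youtube-dl` command is on `PATH`, that `all_vid.txt` holds one YouTube ID per line, and that videos already present in the target directory (named `<id>.<ext>`) should be skipped. The function names are hypothetical and are not part of the provided script.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_video_ids(list_file):
    """Read one YouTube ID per line, skipping blank lines (assumed format)."""
    with open(list_file) as f:
        return [line.strip() for line in f if line.strip()]


def youtube_dl_download(video_id, video_path):
    """Download one video with the youtube-dl CLI; return True on success."""
    # Save each file as <id>.<ext> so existing downloads can be detected later.
    out_template = str(Path(video_path) / "%(id)s.%(ext)s")
    url = "https://www.youtube.com/watch?v=" + video_id
    result = subprocess.run(["youtube-dl", "-o", out_template, url])
    return result.returncode == 0


def fetch_all(video_ids, video_path, num_workers, download=youtube_dl_download):
    """Download the missing videos using `num_workers` concurrent threads.

    IDs whose file stem already exists in `video_path` are skipped (an
    assumption about how a partially downloaded subset is reused).
    Returns a dict mapping each attempted ID to its success flag.
    """
    existing = {p.stem for p in Path(video_path).iterdir()}
    todo = [vid for vid in video_ids if vid not in existing]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(lambda vid: download(vid, video_path), todo))
    return dict(zip(todo, results))
```

For example, `fetch_all(read_video_ids("all_vid.txt"), "videos", 8)` would attempt every ID not yet on disk with 8 workers. A thread pool is sufficient here because each worker mostly waits on an external `youtube-dl` process.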
First, download the JSON annotation zip file from the project webpage. Extracting it yields two JSON files: `sports1m_train.json` and `sports1m_test.json`. Then run the following Python script to create `all_vid.txt`, which contains the YouTube IDs of all Sports1M videos:
```bash
python generate_all_videos_txt_file.py
```
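The schema of the annotation files is not given here, so the following is only a minimal sketch of what a step like `generate_all_videos_txt_file.py` might do, not its actual contents. It assumes each JSON file holds a list of records and that each record stores its YouTube ID under a key named `id`; that key name and the helper functions are hypothetical and should be adjusted to the real files.

```python
import json


def collect_video_ids(json_paths, id_key="id"):
    """Collect the unique video IDs from several annotation files.

    Assumes each file is a JSON list of records and that every record
    stores its YouTube ID under `id_key` (hypothetical key name).
    """
    ids = set()
    for path in json_paths:
        with open(path) as f:
            for record in json.load(f):
                ids.add(record[id_key])
    return sorted(ids)


def write_id_list(ids, out_path="all_vid.txt"):
    """Write one ID per line, the layout assumed for all_vid.txt."""
    with open(out_path, "w") as f:
        f.writelines(vid + "\n" for vid in ids)
```

Under these assumptions, `write_id_list(collect_video_ids(["sports1m_train.json", "sports1m_test.json"]))` would produce the ID list. Using a set removes any video that appears in both files, and sorting keeps the output deterministic.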
Following the recommendations on the dataset's [project page](http://cs.stanford.edu/people/karpathy/deepvideo/), we have filtered out videos with excessive duration by setting a DURATION THRESHOLD. You can use the following script to visualize a histogram of video durations in the Sports1M dataset:
```bash
python histogram_durations.py
```
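To make the duration filtering concrete, here is a hedged sketch of thresholding and binning durations. It assumes durations are already available as a dict mapping video ID to seconds (how they are obtained is not stated here), and it leaves the threshold as a parameter because its actual value is not given. The helper names and the 60-second default bin width are illustrative choices, not taken from `histogram_durations.py`.

```python
from collections import Counter


def filter_by_duration(durations, threshold):
    """Keep only videos whose duration (in seconds) does not exceed threshold."""
    return {vid: d for vid, d in durations.items() if d <= threshold}


def duration_histogram(durations, bin_width=60):
    """Count videos per bin; key k covers [k * bin_width, (k + 1) * bin_width)."""
    return Counter(int(d // bin_width) for d in durations.values())


def print_histogram(hist, bin_width=60):
    """Print a simple text histogram, one row per non-empty bin."""
    for k in sorted(hist):
        lo, hi = k * bin_width, (k + 1) * bin_width
        print(f"{lo:>6}-{hi:<6}s {'#' * hist[k]}")
```

Plotting the histogram before choosing a threshold shows how long the tail of very long videos is, which is what the filtering step is meant to cut.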