This repository extends a fork of https://github.com/joaanna/something_else. It adds a Soft Attention model, which improves on the accuracy of the original STIN model. It also includes visualization tools that generate: 1/ a confusion heatmap and 2/ video frames annotated with bounding boxes, the prediction and ground truth, and the attention weights.
Download videos from the dataset provider:
https://20bn.com/datasets/something-something
Download and unzip all videos into the `/something_videos` folder under the root directory of the project. (The folder does not come with this repo; create it yourself.)
Download the 4 JSON files from this Google Drive folder:
https://drive.google.com/drive/folders/1XqZC2jIHqrLPugPOVJxCH_YWa275PBrZ
Put them in the `/bounding_box_annotations` folder. (The folder does not come with this repo; create it yourself.)
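
The folder setup above can be sketched as a small helper. This is only an illustration: `prepare_data_dirs` is a hypothetical name, and it assumes both folders sit directly under the project root passed as the first argument. It creates the folders if they are missing and checks that 4 JSON files are present.

```shell
#!/usr/bin/env bash
# Sketch: create the two data folders (if missing) and check the annotations.
# Assumption: both folders live directly under the project root given as $1.

prepare_data_dirs() {
    local root="${1:-.}"
    # Neither folder ships with the repo, so create them on first use.
    mkdir -p "$root/something_videos" "$root/bounding_box_annotations"

    # Count the annotation files; 4 JSON files are expected.
    local n_json=0 f
    for f in "$root"/bounding_box_annotations/*.json; do
        [ -f "$f" ] && n_json=$((n_json + 1))
    done

    if [ "$n_json" -ne 4 ]; then
        echo "expected 4 JSON files, found $n_json" >&2
        return 1
    fi
    echo "data folders look ready under $root"
}
```

Calling `prepare_data_dirs .` from the project root before downloading anything creates the empty folders and reports that the JSON files are still missing.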
- Install `ffmpeg` by running `sudo apt-get install ffmpeg` in a terminal (resource: https://trac.ffmpeg.org/wiki/CompilationGuide/Ubuntu).
- Run the `batch_fps_conversion.sh` script to extract all videos (under `/something_videos`) into frames at 12 fps (frames per second).
- See the result in `/something_videos_frames`. The frames of each video are extracted into a folder named after the video's basename.
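
For readers who want to see what such a conversion involves, here is a minimal sketch of a per-video extraction loop. It is not the contents of `batch_fps_conversion.sh`; the `.webm` extension, the `%04d.jpg` frame naming, the `extract_frames` function name and the overridable `FFMPEG` variable are all assumptions made for illustration. It relies only on ffmpeg's `fps` video filter.

```shell
#!/usr/bin/env bash
# Sketch of a per-video frame extraction loop; not the repo's actual script.
# Assumptions: videos are .webm files and frames are written as JPEGs named
# 0001.jpg, 0002.jpg, ... Set FFMPEG=echo to print the commands instead.

FFMPEG="${FFMPEG:-ffmpeg}"

extract_frames() {
    local in_dir="$1" out_dir="$2" fps="${3:-12}"
    local video name
    for video in "$in_dir"/*.webm; do
        [ -f "$video" ] || continue        # skip when the glob matches nothing
        name=$(basename "$video" .webm)    # e.g. 12345.webm -> 12345
        mkdir -p "$out_dir/$name"          # one output folder per video
        # -vf fps=N resamples the video to N frames per second.
        $FFMPEG -loglevel error -i "$video" -vf "fps=$fps" "$out_dir/$name/%04d.jpg"
    done
}

# Usage (from the project root):
#   extract_frames something_videos something_videos_frames 12
```

Setting `FFMPEG=echo` before calling the function gives a dry run that prints each command, which is a cheap way to check paths before converting the whole dataset.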
Note: For non-Linux users, if you hit a bash error like "syntax error at \r", it is likely a Windows line-break incompatibility. It happens when a text editor silently changes the invisible line-break characters while you edit a bash file. Fix it by running `dos2unix` on the bash file you edited. To install it, run `sudo apt-get install dos2unix`. To use it, run `dos2unix batch_fps_conversion.sh`.
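
To confirm that carriage returns are actually the cause before converting anything, a check along these lines may help. The helper names `has_crlf` and `fix_crlf` are hypothetical, and the `tr` fallback is an assumption for machines where `dos2unix` is not installed.

```shell
#!/usr/bin/env bash
# Sketch: detect and strip Windows (CRLF) line endings in a script.
# has_crlf and fix_crlf are illustrative helpers, not part of the repo.

has_crlf() {
    # Succeeds if any line in the file contains a carriage return.
    grep -q "$(printf '\r')" "$1"
}

fix_crlf() {
    local file="$1"
    if has_crlf "$file"; then
        if command -v dos2unix >/dev/null 2>&1; then
            dos2unix "$file"
        else
            # Fallback: delete every carriage return, then write the result
            # back in place so the file keeps its permissions.
            tr -d '\r' < "$file" > "$file.tmp" \
                && cat "$file.tmp" > "$file" \
                && rm "$file.tmp"
        fi
    fi
}

# Usage:
#   fix_crlf batch_fps_conversion.sh
```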