This code is a re-implementation of the video classification experiments in our paper, Revisiting Hard-example for Action Recognition. It is built on the PyTorch framework.
Congratulations! Our paper has been accepted by TCSVT. The final version of the code and the pretrained models will be uploaded soon!
- PyTorch 0.4+
- PilImage
- Lintel (optional, for on-the-fly decoding)
As shown in data/*/*.txt, each video should be listed on one line in the format [path, frame_num, class], for example:
/home/wjp/Data/hmdb51/AmericanGangster_drink_u_nm_np1_fr_med_39 77 10
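A list line in this format can be read with a few lines of Python. This is a minimal sketch, not the repository's own loader: the names `VideoRecord`, `parse_list_line`, and `load_list_file` are hypothetical, and it assumes the three fields are separated by whitespace, as in the example line above.

```python
from typing import List, NamedTuple


class VideoRecord(NamedTuple):
    """One entry of a list file: path, frame count, class index."""
    path: str
    num_frames: int
    label: int


def parse_list_line(line: str) -> VideoRecord:
    # Split from the right so that a path containing spaces stays intact.
    path, num_frames, label = line.strip().rsplit(maxsplit=2)
    return VideoRecord(path, int(num_frames), int(label))


def load_list_file(list_path: str) -> List[VideoRecord]:
    # Skip blank lines; every other line is expected to hold three fields.
    with open(list_path) as f:
        return [parse_list_line(line) for line in f if line.strip()]


record = parse_list_line(
    "/home/wjp/Data/hmdb51/AmericanGangster_drink_u_nm_np1_fr_med_39 77 10"
)
print(record.num_frames, record.label)  # prints: 77 10
```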
Dataset | Modality | Accuracy |
---|---|---|
HMDB51 | RGB + Flow | 83.0% |
UCF101 | RGB + Flow | 98.4% |
Something-something-v1 | RGB | 48.4% |
Kinetics-400 | RGB | 76.7% |
Using the provided pretrained models, all of these results can be reproduced on a single 1080Ti GPU.
bash scripts/hmdb51/run_hmdb51_mpi3d_rgb.sh
bash scripts/hmdb51/run_hmdb51_mpi3d_flow.sh
bash scripts/ucf101/run_ucf101_i3d_rgb.sh
bash scripts/ucf101/run_ucf101_i3d_flow.sh
bash scripts/kinetics/run_kinetics_i3d_rgb_on_the_fly.sh
bash scripts/something_something_v1/train_i3d_rgb.sh
bash scripts/something_something_v1/train_mpi3d_pt_flow.sh
bash scripts/charades/run_charades_i3d_rgb.sh
The validation accuracy reported during training is single-clip accuracy. After training, the model is saved to checkpoints/[]/[].pth.tar. For the final prediction, refer to the test section. Testing averages the predictions from 10 clips (3 crops in the spatial dimension), which improves accuracy by 1%-2% on UCF101/HMDB51, 3%-5% on Something-something-v1, and 7%-12% on Kinetics.
bash scripts/hmdb51/test_hmdb51_mpi3d_rgb[flow].sh
bash scripts/ucf101/test_ucf101_i3d_rgb.sh
bash scripts/kinetics/test_kinetics_mpi3d_rgb.sh
bash scripts/something_something_v1/test_mpi3d_rgb_video.sh
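The multi-clip averaging used at test time can be illustrated as follows. This is only a sketch under assumptions: it takes the scores of one video as an array of shape (num_clips, num_crops, num_classes), reads "10 clips (3 crops)" as 10 temporal clips with 3 spatial crops each, and uses the hypothetical helper name `video_level_scores`. The test scripts may organize the views differently.

```python
import numpy as np


def video_level_scores(clip_scores: np.ndarray) -> np.ndarray:
    """Average per-view scores of one video into a single score vector.

    clip_scores has shape (num_clips, num_crops, num_classes); the result
    has shape (num_classes,).
    """
    num_classes = clip_scores.shape[-1]
    # Flatten clips and crops into one "views" axis, then average over it.
    return clip_scores.reshape(-1, num_classes).mean(axis=0)


# Dummy scores: 10 clips x 3 crops x 51 classes (51 as in HMDB51).
rng = np.random.default_rng(0)
clip_scores = rng.random((10, 3, 51))

video_scores = video_level_scores(clip_scores)
predicted_class = int(video_scores.argmax())
print(video_scores.shape)  # prints: (51,)
```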
The scores are saved as .npy files.
python eval_all_network.py RGB_SCORE_FILE FLOW_SCORE_FILE --score_weights 1 1.5
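The `--score_weights 1 1.5` option suggests a weighted late fusion of the two streams. The sketch below shows that idea on small in-memory arrays. It is an illustration, not the contents of eval_all_network.py: the function names are hypothetical, and it assumes each score file holds an array of shape (num_videos, num_classes), which in practice would be read with `np.load`.

```python
import numpy as np


def fuse_scores(rgb_scores, flow_scores, weights=(1.0, 1.5)):
    """Weighted sum of two (num_videos, num_classes) score arrays."""
    rgb_scores = np.asarray(rgb_scores, dtype=np.float64)
    flow_scores = np.asarray(flow_scores, dtype=np.float64)
    if rgb_scores.shape != flow_scores.shape:
        raise ValueError("RGB and flow scores must have the same shape")
    return weights[0] * rgb_scores + weights[1] * flow_scores


def top1_accuracy(scores, labels):
    """Fraction of videos whose highest-scoring class matches the label."""
    return float((scores.argmax(axis=1) == np.asarray(labels)).mean())


# Toy example: two videos, two classes.
rgb = np.array([[0.6, 0.4], [0.7, 0.3]])
flow = np.array([[0.2, 0.8], [0.6, 0.4]])
labels = [1, 0]

fused = fuse_scores(rgb, flow)
print(top1_accuracy(rgb, labels))    # prints: 0.5
print(top1_accuracy(fused, labels))  # prints: 1.0
```

In this toy case the RGB stream alone misclassifies the first video, and the flow stream, weighted by 1.5, corrects it after fusion.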
For a TSM/Slow-Fast implementation, just combine SL and VTC with these models.
This project is partly based on TSN. Thanks also to pytorch-i3d for providing the PyTorch version of I3D.