This project shows which spatio-temporal regions of a video a 3D model focuses on, in both supervised and unsupervised (no label available) settings. For an input video, it renders the attention map onto the whole video and onto individual frames.
Videos cannot be embedded here, so a few GIFs are shown instead.
supervised with label
unsupervised (RGB video only)
heatmap
focus map
In some cases the ground-truth label of the video/action is not accessible. In that case we average the activations of all filters and visualize the resulting heatmap.
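A minimal sketch of that channel-averaging step, assuming the last convolutional layer outputs a feature tensor of shape (C, T, H, W); the function name and the trilinear upsampling are illustrative, not the project's exact code:

```python
import torch
import torch.nn.functional as F

def unsupervised_heatmap(features, frame_hw):
    """Average a (C, T, H, W) conv feature tensor over channels and
    upsample it to the input frame resolution as a [0, 1] heatmap."""
    cam = features.mean(dim=0)                     # (T, H, W): channel average
    cam = cam - cam.min()
    cam = cam / (cam.max() + 1e-8)                 # normalize to [0, 1]
    cam = F.interpolate(cam[None, None],           # (1, 1, T, H, W)
                        size=(cam.shape[0], frame_hw[0], frame_hw[1]),
                        mode='trilinear', align_corners=False)
    return cam[0, 0].detach().cpu().numpy()        # (T, frame_h, frame_w)
```

The heatmap values are then typically mapped through a color map (e.g. OpenCV's COLORMAP_JET) and blended with the original frames to produce overlays like the GIFs above.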
- pytorch 0.4
- opencv
- numpy
- skvideo
- ffmpeg
git clone https://github.com/FingerRec/3DNet_Visualization.git
cd 3DNet_Visualization
mkdir pretrained_model
Download the MFNet model pretrained on UCF101 from google_drive and put it into the pretrained_model directory; the weights come from MFNet.
The R3D pretrained model is from 3D-Resnet-Pytorch.
The C3D pretrained model is from C3D-Pytorch.
The I3D model is pretrained on HMDB51.
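These checkpoints come from different repos and are saved in slightly different formats; below is a hedged loading sketch, where the `state_dict` wrapper and the `module.` prefix handling are assumptions about common PyTorch conventions, not guarantees about these exact files:

```python
import torch

def load_pretrained(model, weights_path):
    """Load a checkpoint, tolerating the common {'state_dict': ...}
    wrapper and 'module.' prefixes left by nn.DataParallel."""
    ckpt = torch.load(weights_path, map_location='cpu')
    state_dict = ckpt['state_dict'] if isinstance(ckpt, dict) and 'state_dict' in ckpt else ckpt
    state_dict = {k.replace('module.', '', 1): v for k, v in state_dict.items()}
    model.load_state_dict(state_dict)
    return model
```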
bash scripts/demo.sh
bash scripts/c3d_unsupervised_demo.sh
bash scripts/r3d_unsupervised_demo.sh
The generated images and video will be written to output/imgs and output/video.
Tip: in main.py, setting clip_steps to 1 generates a video of the same length as the original.
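To see why that works: the model slides a window of --frames_num frames over the video with stride --clip_steps, so a stride of 1 yields roughly one attention map per input frame. An illustrative sketch of the window indexing (not the project's actual loop):

```python
def clip_windows(num_frames, frames_num=16, clip_steps=16):
    """Yield the (start, end) frame indices of each clip fed to the model."""
    for start in range(0, num_frames - frames_num + 1, clip_steps):
        yield start, start + frames_num

# On a 64-frame video:
print(list(clip_windows(64)))                     # [(0, 16), (16, 32), (32, 48), (48, 64)]
print(len(list(clip_windows(64, clip_steps=1))))  # 49 overlapping clips
```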
The details of demo.sh are as follows; change --video and --label according to your video, and refer to resources/classInd.txt for the label indices of UCF101 classes.
python main.py --num_classes 101 \
--classes_list resources/classInd.txt \
--model_weights pretrained_model/MFNet3D_UCF-101_Split-1_96.3.pth \
--video test_videos/[your own video here] \
--frames_num 16 --label 0 --clip_steps 16 \
--output_dir output \
--supervised unsupervised  # keep this line only if no label is available
Note: for unsupervised computation, the only change is adding --supervised unsupervised to the script.
Tip: UCF101/HMDB51 are supported now; for Kinetics and other datasets, just download a pretrained model and change --classes_list (format shown below).
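For reference, UCF101's classInd.txt maps an index to a class name on each line (e.g. `1 ApplyEyeMakeup`); here is a small parser sketch, assuming a new dataset's list follows the same `index name` layout:

```python
def read_classes_list(path):
    """Parse 'index name' lines into an index -> class-name dict."""
    classes = {}
    with open(path) as f:
        for line in f:
            if line.strip():
                idx, name = line.strip().split(maxsplit=1)
                classes[int(idx)] = name
    return classes

# e.g. read_classes_list('resources/classInd.txt')[1] == 'ApplyEyeMakeup'
```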
- support i3d, mpi3d
- support multi fc layers or full convolution networks
- support feature map average without label
- support r3d and c3d
- support Slow-Fast Net
- visualize filters
- grad-cam
Support your own network:
1. provide a pretrained model;
2. update load_model() in main.py;
3. modify the name of the last linear layer in generate_supervised_cam in action_recognition.py (see the sketch below).
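Step 3 matters because the supervised CAM is built from that last linear layer: the fc weight row of the target class re-weights the channels of the final conv features. A minimal sketch of the computation, assuming features of shape (C, T, H, W) and an fc weight matrix of shape (num_classes, C); the names here are illustrative, not the exact code in action_recognition.py:

```python
import torch

def class_activation_map(features, fc_weight, label):
    """Weight (C, T, H, W) conv features by the target class's fc row
    and sum over channels to get a (T, H, W) activation map."""
    cam = torch.einsum('c,cthw->thw', fc_weight[label], features)
    cam = torch.relu(cam)              # keep positively contributing regions
    cam = cam - cam.min()
    return cam / (cam.max() + 1e-8)    # normalize to [0, 1]
```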
Note: C3D and R3D are pretrained on Sports-1M/Kinetics; for better visualization, you may need to finetune these networks on UCF/HMDB as in RHE.
This project is heavily based on SaliencyTubes, MF-Net and st-gcn.