This is an implementation of our method for the ChaLearn LAP Large-scale Continuous Gesture Recognition Challenge (Round 2) @ICCV 2017.
This code was tested on Windows 10 with VS2012 and on Ubuntu 14.04 with Python 2.7, caffe-C3D, sklearn, and faster-rcnn. Please double-check the paths in the code before running it.
(1): Set data_path in convertConGTrain2IsoGTrain_windows_matlab/ConvertConVedioToIsoGesture.m to your own path.
(2): Run convertConGTrain2IsoGTrain_windows_matlab/ConvertConVedioToIsoGesture.m.
This produces a new file named convertContinousToIsoGestrueTrain, which contains the entire converted training set.
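The conversion cuts each continuous video into isolated gesture clips using the per-video segment annotations. The Python sketch below illustrates that bookkeeping and is not a port of the MATLAB script. It assumes (our assumption) that each annotation line holds a video ID followed by space-separated `start,end:label` tokens with 1-based, inclusive frame indices. The names `parse_segments` and `to_isolated` are hypothetical.

```python
def parse_segments(line):
    """Split one annotation line into (video_id, [(start, end, label), ...]).

    Assumed (hypothetical) format: "001/00001 1,52:15 53,83:128".
    """
    fields = line.split()
    video_id, tokens = fields[0], fields[1:]
    segments = []
    for token in tokens:
        span, label = token.split(":")
        start, end = (int(x) for x in span.split(","))
        segments.append((start, end, int(label)))
    return video_id, segments


def to_isolated(frames, segments):
    """Cut a frame list into isolated clips; indices are 1-based and inclusive."""
    return [(label, frames[start - 1:end]) for start, end, label in segments]


if __name__ == "__main__":
    vid, segs = parse_segments("001/00001 1,3:15 4,6:128")
    print(vid, segs)  # 001/00001 [(1, 3, 15), (4, 6, 128)]
    print(to_isolated(list(range(1, 7)), segs))  # [(15, [1, 2, 3]), (128, [4, 5, 6])]
```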
Face and hand positions can be detected with the code in the Detection folder. For usage, refer to Detection/Detection-Pipeline-Cons.pdf.
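Step3 uses these detections so that the final RGB frames keep only the face and hands and the depth frames keep only the hands. The sketch below shows the masking idea under stated assumptions. It assumes boxes given as `(x1, y1, x2, y2)` pixel coordinates with exclusive right and bottom edges; the real detection output format is described in the PDF above. The frame is stored as nested lists so the sketch needs no extra packages. `mask_outside_boxes` is a hypothetical helper, not a function from this repository.

```python
def mask_outside_boxes(frame, boxes, fill=0):
    """Return a copy of `frame` (a list of pixel rows) in which every pixel
    outside all boxes is replaced by `fill`.

    boxes: iterable of (x1, y1, x2, y2); x2 and y2 are exclusive (assumed).
    """
    height = len(frame)
    width = len(frame[0]) if height else 0
    keep = [[False] * width for _ in range(height)]
    for x1, y1, x2, y2 in boxes:
        # Clip each box to the frame borders before marking its pixels.
        for y in range(max(0, y1), min(height, y2)):
            for x in range(max(0, x1), min(width, x2)):
                keep[y][x] = True
    return [
        [pix if keep[y][x] else fill for x, pix in enumerate(row)]
        for y, row in enumerate(frame)
    ]
```

With numpy, the same idea is usually written as a boolean mask array that is multiplied with the image.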
Step3: Preprocess the training set to obtain the input data and setting files needed by Step4 (C3D finetuning).
Dependencies: cv2, numpy, ffmpeg.
All input data for step2 lie in python/data. The files in python/data should be organized as follows:
Place all 249 converted RGB and aligned depth subfolders (001, 002, ...) of the training set in ConGD_phase_1_RGB/convertContinousToIsoGestrueTrain_new/ and ConGD_phase_1_aligned_depth/convertContinousToIsoGestrueTrain_new/, respectively. Similarly, place the 84 segmented aligned depth, RGB, and hand detection subfolders (001, 002, ...) of the validation set in seg_valid_depth_2stream/, seg_valid_rgb_2stream/, and seg_valid_rgb_2stream/cvtConGTestToIso/, respectively. The validation temporal segmentation files are placed in seg_valid_rgb_2stream/ConGTestSegInfo/. (Temporal segmentation is described in the Testing section.)
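Before running the preprocessing scripts, it may help to confirm that every subfolder is in place. The helper below assumes three-digit, zero-padded subfolder names (001 to 249 for training, 001 to 084 for validation) and assumes the folders listed above sit directly under python/data. Both are our assumptions. Because this repository ships only subfolder 001, a fresh checkout will report the rest as missing.

```python
import os


def expected_names(count):
    """Zero-padded subfolder names '001', '002', ... (assumed naming scheme)."""
    return ["%03d" % i for i in range(1, count + 1)]


def missing_subfolders(root, count):
    """Return the expected subfolder names that do not exist under `root`."""
    return [name for name in expected_names(count)
            if not os.path.isdir(os.path.join(root, name))]


if __name__ == "__main__":
    # Hypothetical locations: the folders named in this README, assumed under python/data.
    base = os.path.join("python", "data")
    checks = [
        ("ConGD_phase_1_RGB/convertContinousToIsoGestrueTrain_new", 249),
        ("ConGD_phase_1_aligned_depth/convertContinousToIsoGestrueTrain_new", 249),
        ("seg_valid_rgb_2stream", 84),
        ("seg_valid_depth_2stream", 84),
        ("seg_valid_rgb_2stream/cvtConGTestToIso", 84),
    ]
    for rel, count in checks:
        missing = missing_subfolders(os.path.join(base, rel), count)
        print("%s: %d of %d missing" % (rel, len(missing), count))
```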
This repository includes only one subfolder (001). We use it as an example to show how to quickly generate the input data and setting files needed by Step4. Run the following commands to generate all the files:
sh #(generates the 'RGB files' for finetuning C3D on the training set)
sh #(generates the 'depth files' for finetuning C3D on the training set)
After running both commands, the generated C3D input data is saved in python/data/train/unifi_only_hand_file/depth and python/data/train/only_hand_face_file/rgb, and the setting files are saved in python/train_list_c3d. Note that we finetune C3D separately on RGB and on depth. For more about the input data and setting files, refer to the C3D User Guide(
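As far as we understand the C3D User Guide, each line of a C3D training list names a frame folder, a starting frame, and a class label. The sketch below writes lines in that `<path> <start_frame> <label>` layout. It is not this repository's script, and the layout should be checked against the guide. It assumes each folder already holds one 32-frame clip, so every sample contributes a single line starting at frame 1. It also assumes the ConGD labels (1 to 249) are shifted to start at 0. The function names are hypothetical.

```python
def c3d_list_lines(samples, start_frame=1, zero_based_labels=True):
    """Build list lines of the form "<path> <start_frame> <label>".

    samples: iterable of (frame_folder, label), labels 1..249 as in ConGD.
    Assumes each folder holds exactly one 32-frame clip.
    """
    lines = []
    for folder, label in samples:
        if zero_based_labels:
            label -= 1  # Caffe-style classifiers usually expect labels starting at 0
        lines.append("%s %d %d" % (folder, start_frame, label))
    return lines


def write_list(path, lines):
    """Write one list entry per line."""
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
```

For example, `c3d_list_lines([("data/train/rgb/001/00001_1/", 15)])` yields `["data/train/rgb/001/00001_1/ 1 14"]`. The path in that example is hypothetical.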
The two scripts include the following steps:
The pre-trained model can be downloaded from
The final RGB images contain only the face and hands, and the final depth images contain only the hands. Every C3D input video is sampled to a length of 32 frames. (All functions have detailed comments in python/
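One common way to bring clips of any length to exactly 32 frames is uniform temporal sampling. Short clips repeat frames and long clips skip frames. The sketch below assumes that scheme; the exact sampling used here is documented in the python/ functions, and `sample_indices` is a hypothetical name.

```python
def sample_indices(num_frames, target=32):
    """Pick `target` 0-based frame indices spread uniformly over a clip.

    Short clips repeat frames; long clips skip frames. The first and
    last frames are always included when target > 1.
    """
    if num_frames <= 0:
        raise ValueError("clip has no frames")
    if target == 1:
        return [(num_frames - 1) // 2]  # a single sample: take the middle frame
    denom = 2 * (target - 1)
    # Integer arithmetic gives round-half-up of i * (num_frames - 1) / (target - 1)
    # without floating-point error.
    return [(2 * i * (num_frames - 1) + (target - 1)) // denom for i in range(target)]
```

For a 100-frame clip this returns 32 non-decreasing indices from 0 to 99.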
Training stops after 100,000 iterations. Finetuning C3D takes about 60 hours on a single Titan X GPU.
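As a rough sanity check, the figures above imply about 2.16 seconds per training iteration:

$$
\frac{60\,\text{h} \times 3600\,\text{s/h}}{100\,000\ \text{iterations}} = \frac{216\,000\,\text{s}}{100\,000\ \text{iterations}} = 2.16\ \text{s per iteration}
$$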
For more information about installation and usage, please refer to
After installing C3D, place con_c3d_finetuning_len_32_only_hand_depth_map_2streams, con_c3d_finetuning_len_32_only_hand_face_rgb_2streams, c3d_train_ucf101, con_validation_only_hand_depth_map_2stream_feature_extraction, and con_validation_only_hand_face_rgb_2stream_feature_extraction from C3D-v1.0-key-parts/ in C3D-v1.0/example.
(The finetuned c3d model can be downloaded from
The extracted C3D features are saved in python/feature. (The extracted C3D features can be downloaded from
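To inspect the extracted features outside MATLAB, a small reader helps. Our understanding of the C3D v1.0 binary blob layout is a header of five int32 values (num, channels, length, height, width) followed by that many float32 values. That layout, and the little-endian byte order, are assumptions that should be checked against the read_binary_blob helper shipped with C3D. `read_c3d_blob` is a hypothetical name.

```python
import struct


def read_c3d_blob(path):
    """Read a C3D v1.0 binary feature blob (assumed layout).

    Layout: five little-endian int32 (num, channels, length, height, width)
    followed by num * channels * length * height * width float32 values.
    Returns (shape_tuple, list_of_floats).
    """
    with open(path, "rb") as f:
        header = f.read(20)
        if len(header) != 20:
            raise ValueError("truncated header in %s" % path)
        shape = struct.unpack("<5i", header)
        count = 1
        for dim in shape:
            count *= dim
        payload = f.read(4 * count)
        if len(payload) != 4 * count:
            raise ValueError("truncated data in %s" % path)
        values = struct.unpack("<%df" % count, payload)
    return shape, list(values)
```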
For details on Step4 and Step5, refer to the C3D User Guide. You can also refer to our prototxt files in C3D-v1.0-key-parts.
Run the MATLAB script read_and_fuse_feature_ubuntu_matlab/main.m. The final fused training features are saved in the current directory. (The final features can also be downloaded from
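The fusion rule used by main.m is not spelled out in this README. The sketch below shows one simple, commonly used option and may differ from the script. It averages the fc6 vectors of each stream's clips, L2-normalizes each stream, and concatenates RGB and depth into one feature. With one 32-frame clip per sample, the averaging step leaves the vector unchanged. All function names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equally long feature vectors."""
    n = float(len(vectors))
    return [sum(col) / n for col in zip(*vectors)]


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def fuse(rgb_clips, depth_clips):
    """Average clips per stream, L2-normalize each stream, then concatenate."""
    return l2_normalize(mean_vector(rgb_clips)) + l2_normalize(mean_vector(depth_clips))
```

Normalizing each stream before concatenation keeps one modality from dominating the SVM input simply because its activations are larger.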
python --isTrain 1 --isTest 0
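The script name is elided in this README. Only the `--isTrain` and `--isTest` flags are known. The sketch below is a hypothetical illustration of how such 0/1 flags can be parsed with argparse and mapped to modes; it does not reproduce the actual script's internals.

```python
import argparse


def build_parser():
    """Parser for the two 0/1 switches shown in the commands above."""
    parser = argparse.ArgumentParser(description="Train or test the SVM (sketch).")
    parser.add_argument("--isTrain", type=int, choices=[0, 1], default=0,
                        help="1 to train the SVM on the fused training features")
    parser.add_argument("--isTest", type=int, choices=[0, 1], default=0,
                        help="1 to predict labels and write python/submission")
    return parser


def main(argv=None):
    """Return the list of modes requested on the command line."""
    args = build_parser().parse_args(argv)
    modes = []
    if args.isTrain:
        modes.append("train")
    if args.isTest:
        modes.append("test")
    return modes


if __name__ == "__main__":
    print(main())
```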
We have already uploaded the trained SVM model to svm_model/.
Step2: Run the code in validation_Tesing_temporal_segmentaion to obtain the segmented videos, hand detections, and video lengths (as described in Step3 of Training).
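The README does not describe the segmentation algorithm itself. The sketch below is a simple stand-in heuristic, not the method in validation_Tesing_temporal_segmentaion. It shows how per-frame activity flags can be turned into `(start, end)` segments. Here, a flag is a hypothetical "hand is raised" decision derived from hand detections, and very short runs are dropped as noise. The flag definition, the `min_len` threshold, and the function name are all assumptions.

```python
def flags_to_segments(active, min_len=8):
    """Turn per-frame booleans into 1-based, inclusive (start, end) segments,
    dropping runs shorter than `min_len` frames."""
    segments = []
    start = None
    for i, flag in enumerate(active, start=1):
        if flag and start is None:
            start = i  # a new active run begins
        elif not flag and start is not None:
            if i - start >= min_len:  # the run covers frames start .. i - 1
                segments.append((start, i - 1))
            start = None
    # Close a run that lasts until the final frame.
    if start is not None and len(active) - start + 1 >= min_len:
        segments.append((start, len(active)))
    return segments
```

For example, `flags_to_segments([0, 1, 1, 1, 0, 0, 1, 1], min_len=2)` gives `[(2, 4), (7, 8)]`.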
The files in validation_Tesing_temporal_segmentaion/data/ should be organized as follows:
The ConGD_Phase_1_aligned folder contains the aligned depth videos. The ConGD_phase_1 folder contains the original RGB and depth videos. The cong folder contains the corresponding hand detection results obtained in Training Step2. All generated temporal segmentation files are placed in validation_Tesing_temporal_segmentaion/output/. Place the video length file in python/. Organize the other segmented files the same way as in Training Step2.
Step3: Preprocess the training dataset and obtain the input data and setting files for extracting C3D fc6 features:
Step5: Read and fuse the extracted RGB and depth features by running read_and_fuse_feature_ubuntu_matlab/mainTest.m.
Step6: To test on the validation or testing dataset in and get the prediction file in python/submission:
python --isTrain 0 --isTest 1
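This sketch writes a prediction file in the same layout we assumed earlier for the annotations: a video ID followed by `start,end:label` tokens, one video per line. That layout is our assumption. The challenge's official submission format description is authoritative. The function names are hypothetical.

```python
def format_prediction(video_id, segments):
    """Format one line: the video ID followed by 'start,end:label' tokens."""
    tokens = ["%d,%d:%d" % (start, end, label) for start, end, label in segments]
    return " ".join([video_id] + tokens)


def write_predictions(path, predictions):
    """predictions: iterable of (video_id, [(start, end, label), ...])."""
    with open(path, "w") as f:
        for video_id, segments in predictions:
            f.write(format_prediction(video_id, segments) + "\n")
```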
Accuracy on validation: 0.514451
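ConGD results are commonly reported with the mean Jaccard index. For offline checks against ground truth, the sketch below implements a simplified frame-level version. For each video, it computes |A & B| / |A | B| over frames for every label that appears in the ground truth or the prediction. It averages those scores per video and then across videos. This is our own simplification, and the official ChaLearn scoring script is authoritative. How videos with no labels are scored, for instance, is a convention chosen here.

```python
def frames_of(segments, label):
    """Set of frames (1-based, inclusive) covered by segments with `label`."""
    frames = set()
    for start, end, lab in segments:
        if lab == label:
            frames.update(range(start, end + 1))
    return frames


def video_jaccard(truth, pred):
    """Mean over labels in truth or pred of |A & B| / |A | B| on frame sets."""
    labels = {lab for _, _, lab in truth} | {lab for _, _, lab in pred}
    if not labels:
        return 0.0  # nothing to score; this convention is an assumption
    scores = []
    for label in labels:
        a, b = frames_of(truth, label), frames_of(pred, label)
        union = a | b
        scores.append(len(a & b) / float(len(union)) if union else 0.0)
    return sum(scores) / len(scores)


def mean_jaccard(pairs):
    """Average video_jaccard over (truth_segments, pred_segments) pairs."""
    return sum(video_jaccard(t, p) for t, p in pairs) / float(len(pairs))
```

For example, a ground-truth segment `(1, 10, 5)` with a prediction `(1, 5, 5)` scores 0.5 for that video.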