CrosSCLR

The Official PyTorch implementation of "3D Human Action Representation Learning via Cross-View Consistency Pursuit" in CVPR 2021. The arXiv version of our paper is coming soon.

Requirements

We only test our code on the following environment:

Python == 3.8.2
PyTorch == 1.4.0
CUDA == 11.1

Installation

# Install python environment
$ conda create -n crossclr python=3.8.2
$ conda activate crossclr

# Install PyTorch
$ pip install torch==1.4.0

# Download our code
$ git clone https://github.com/LinguoLi/CrosSCLR.git
$ cd CrosSCLR

# Install torchlight
$ cd torchlight
$ python setup.py install
$ cd ..

# Install other python libraries
$ pip install -r requirements.txt

Data Preparation

We use NTU RGB+D and NTU RGB+D 120 as our datasets.
Please click here for more information about accessing the "NTU RGB+D" and "NTU RGB+D 120" datasets.
Only the 3D skeleton modality is required in our experiments, you could also obtain it via NTURGB-D.

Please put the raw data in the directory <path to nturgbd+d_skeletons> and build the NTU RGB+D database as:

# generate raw database for NTU-RGB+D
$ python tools/ntu_gendata.py --data_path <path to nturgbd+d_skeletons>

# preprocess the above data for our method (for limited computing power, we resize the data to 50 frames)
$ python feeder/preprocess_ntu.py

Unsupervised Pre-Training

Example for unsupervised pre-training of 3s-CrosSCLR. You can train other models by using other .yaml files in config/ folder.

# train on NTU-RGB+D xview
$ python main.py pretrain_crossclr_3views --config config/CrosSCLR/crossclr_3views_xview.yaml

The pre-trained models are in the directory: weights/

Linear Evaluation

Example for linear evaluation of 3s-CrosSCLR. You can evaluate other models by using other .yaml files in config/linear_eval folder.

# evaluate pre-trained model on NTU-RGB+D xview
$ python main.py linear_evaluation --config config/linear_eval/linear_eval_crossclr_3views_xview.yaml --weights <path to weights>
# evaluate the provided pre-trained model
$ python main.py linear_evaluation --config config/linear_eval/linear_eval_crossclr_3views_xview.yaml --weights weights/crossclr_3views_xview_frame50_channel16_cross150_epoch300.pt

Results

The Top-1 accuracy results on two datasets for the linear evaluation of our methods are shown here:

Model	NTU 60 xsub (%)	NTU 60 xview (%)	NTU 120 xsub (%)	NTU 120 xset (%)
SkeletonCLR	68.3	76.4	-	-
2s-CrosSCLR	74.5	82.1	-	-
3s-CrosSCLR	77.8	83.4	67.9	66.7

Visualization

The t-SNE visualization of the embeddings during SkeletonCLR and CrosSCLR pre-training.

Citation

Please cite our paper if you find this repository useful in your resesarch:

@inproceedings{li2021crossclr,
  Title          = {3D Human Action Representation Learning via Cross-View Consistency Pursuit},
  Author         = {Linguo, Li and Minsi, Wang and Bingbing, Ni and Hang, Wang and Jiancheng, Yang and Wenjun, Zhang},
  Booktitle      = {CVPR},
  Year           = {2021}
}

Acknowledgement

The framework of our code is based on the old version of ST-GCN (Its new version is MMSkeleton).
Awesome-Skeleton-based-Action-Recognition
mv-ignet
NTURGB-D

FraLuca/CrosSCLR