Implementation of "Action Quality Assessment with Temporal Parsing Transformer"
- pretrained model on MTL-AQA
- test script
- Python >= 3.8
- PyTorch >= 1.7.1
- CUDA 10.1
- Download the dataset from the [link] and unzip the file. (The processed data is based on this repo.)
- Put the folder named "data_preprocessed" in ./data
- Put the file named "data_preprocessed/model_rgb.pth" in ./data
The data structure should be:

    $DATASET_ROOT
    ├── model_rgb.pth
    ├── data_preprocessed/
        ├── MTL_AQA/
            ├── frames_long
                ├── 01_01/
                    ├── 00017977.jpg
                    ...
                ...
                └── 07_25/
                    ├── 00040170.jpg
                    ...
            ├── info
                ├── final_annotations_dict_with_dive_number
                ├── test_split_0.pkl
                └── train_split_0.pkl
        ├── AQA_7/
            ├── frames
                ├── diving-out
                    ├── 001
                        ├── img_00001.jpg
                        ...
                    ...
            ├── info
                ├── split_4_test_list.mat
                ├── split_4_train_list.mat
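
Before launching a long run, it can help to confirm that the files listed above are where the code expects them. The following is a minimal sketch, not part of this repo: `missing_entries` and `EXPECTED_ENTRIES` are hypothetical names, and the nesting (each `info` folder sitting next to its frames folder) is an assumption about the layout above. File names are copied as listed, so an entry may be reported missing if the real file carries an extension not shown here.

```python
from pathlib import Path

# Relative paths copied from the layout above (hypothetical helper, not repo code).
# Assumption: each "info" folder is a sibling of the corresponding frames folder.
EXPECTED_ENTRIES = [
    "model_rgb.pth",
    "data_preprocessed/MTL_AQA/frames_long",
    "data_preprocessed/MTL_AQA/info/final_annotations_dict_with_dive_number",
    "data_preprocessed/MTL_AQA/info/test_split_0.pkl",
    "data_preprocessed/MTL_AQA/info/train_split_0.pkl",
    "data_preprocessed/AQA_7/frames",
    "data_preprocessed/AQA_7/info/split_4_test_list.mat",
    "data_preprocessed/AQA_7/info/split_4_train_list.mat",
]


def missing_entries(dataset_root):
    """Return the expected relative paths that do not exist under dataset_root."""
    root = Path(dataset_root)
    return [rel for rel in EXPECTED_ENTRIES if not (root / rel).exists()]
```

Calling `missing_entries("./data")` returns an empty list when every listed entry is present, and otherwise the relative paths that still need to be put in place.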
python -u -m torch.distributed.launch --nproc_per_node=8 train_pairencode1_decoder_1selfatt_self8head_ffn_sp_new.py --epoch_num=250 --dataset=MLT_AQA --bs_train=3 --bs_test=3 --use_pretrain=False --num_cluster=5 --margin_factor=3.2 --encode_video=False --hinge_loss=True --multi_hinge=True --d_model=512 --d_ffn=512 --exp_name=sp_new_103_5_3
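
The command above is long, so for repeated experiments it may be convenient to assemble it from a table of flags. The sketch below is only an illustration: `build_command`, `TRAIN_ARGS`, and `SCRIPT` are hypothetical names, and every flag and value is copied verbatim from the command above. Whether the training script behaves well with a different process count or other overridden values is an assumption this README does not confirm.

```python
SCRIPT = "train_pairencode1_decoder_1selfatt_self8head_ffn_sp_new.py"

# Flags and values copied verbatim from the training command above.
TRAIN_ARGS = {
    "epoch_num": "250",
    "dataset": "MLT_AQA",
    "bs_train": "3",
    "bs_test": "3",
    "use_pretrain": "False",
    "num_cluster": "5",
    "margin_factor": "3.2",
    "encode_video": "False",
    "hinge_loss": "True",
    "multi_hinge": "True",
    "d_model": "512",
    "d_ffn": "512",
    "exp_name": "sp_new_103_5_3",
}


def build_command(nproc_per_node=8, overrides=None):
    """Return the launch command as an argument list (hypothetical helper)."""
    args = dict(TRAIN_ARGS)
    args.update(overrides or {})  # overriding an existing flag keeps its position
    cmd = [
        "python", "-u", "-m", "torch.distributed.launch",
        f"--nproc_per_node={nproc_per_node}",
        SCRIPT,
    ]
    cmd += [f"--{name}={value}" for name, value in args.items()]
    return cmd
```

With the defaults, `" ".join(build_command())` reproduces the command above; the list can be passed to `subprocess.run(..., check=True)` to start training.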
Our code is based on CoRe. Thanks for their great work!