This repo is the official implementation of Vision-Language Meets the Skeleton: Progressively Distillation with Cross-Modal Knowledge for 3D Action Representation Learning. The paper has been accepted by IEEE Transactions on Multimedia.
Please cite this work if you find it useful:
@article{chen2024vision,
  title={Vision-Language Meets the Skeleton: Progressively Distillation with Cross-Modal Knowledge for 3D Action Representation Learning},
  author={Chen, Yang and He, Tian and Fu, Junfeng and Wang, Ling and Guo, Jingcai and Cheng, Hong},
  journal={arXiv preprint arXiv:2405.20606},
  year={2024}
}