MultiModal-Transformers-for-Nurse-Activity-Recognition [arXiv]

This repo is the official implementation of the paper "Multimodal Transformer for Nurse Activity Recognition", published in the Fifth International Workshop on Computer Vision for Physiological Measurement (CVPM), in conjunction with CVPR 2022.

Introduction

This paper proposes a novel transformer-based method for real-world action recognition. The proposed method involves two single-modality transformer models for performing action recognition on the Nurse Care Activity Recognition dataset (2019). The first single-modality transformer extracts spatio-temporal features from the skeletal joints of the subjects and recognizes nurse activities from that single modality alone. The second single-modality transformer performs action recognition by modeling the correlations in the performer's acceleration data. Both models are shown below.

Single-Modality Transformers (a) Skeletal Joints Model (b) Acceleration Model
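
For illustration, here is a minimal sketch of one such single-modality branch, assuming a PyTorch implementation; the token dimension, depth, and classification head below are assumptions, not the repo's exact architecture:

```python
import torch
import torch.nn as nn

class SingleModalityTransformer(nn.Module):
    """Sketch of a single-modality branch: per-frame features are projected
    to tokens, a learnable [CLS] token is prepended, and the encoded [CLS]
    embedding is classified. All sizes are illustrative assumptions."""

    def __init__(self, in_dim, num_classes, dim=128, depth=4, heads=4):
        super().__init__()
        self.embed = nn.Linear(in_dim, dim)            # project raw features to tokens
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):                              # x: (batch, seq_len, in_dim)
        tokens = self.embed(x)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        encoded = self.encoder(torch.cat([cls, tokens], dim=1))
        return self.head(encoded[:, 0])                # classify from the [CLS] token
```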

We propose a multimodal transformer by combining the final cls tokens of both the skeletal joints and acceleration models, and we also introduce an additional cross-view fusion between both models' layers to develop stronger feature vectors for final action recognition. In the fusion layer, the spatio-temporal skeletal joint tokens attend to the self-encoded acceleration tokens, and this is repeated in all layers. Our results demonstrate that fusing acceleration and skeletal joints gives better action recognition performance than the single-modality transformers and than a simple fusion of both models without cross-view fusion.

Cross-View Fusion Model (a) Cross-View Fusion (b) Multimodal Transformer with Cross-View Fusion
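
The cross-view fusion step can be illustrated with a minimal sketch, again assuming PyTorch; the dimensions and the residual/normalization placement are assumptions, not the repo's exact code:

```python
import torch.nn as nn

class CrossViewFusionLayer(nn.Module):
    """Sketch of one cross-view fusion layer: skeletal joint tokens (queries)
    attend to the self-encoded acceleration tokens (keys/values), as repeated
    in every layer of the multimodal model."""

    def __init__(self, dim=128, num_heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, skeleton_tokens, acc_tokens):
        # skeleton_tokens: (batch, n_skeleton_tokens, dim)
        # acc_tokens:      (batch, n_acc_tokens, dim)
        fused, _ = self.cross_attn(skeleton_tokens, acc_tokens, acc_tokens)
        return self.norm(skeleton_tokens + fused)  # residual connection
```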

Results and Checkpoints

| Model | Accuracy (%) | F1-score (%) | Precision (%) | Recall (%) | Checkpoint |
|---|---|---|---|---|---|
| Skeleton Model | 76.7 | 67.0 | 69.1 | 70.5 | SkeletonModel.pth |
| Acceleration Model | 45.6 | 10.9 | 9.3 | 14.9 | AccModel.pth |
| Simple Fusion | 75.0 | 71.6 | 75.6 | 72.3 | SimpleFusion.pth |
| Cross-View Fusion Model | 81.8 | 78.4 | 79.4 | 78.3 | CrossViewFusion.pth |

Comparison with state-of-the-art

We compare our method with all other existing solutions reported on the NCRC dataset, including the winning entry, a KNN based on hand-crafted features. The NCRC dataset provides data from three different sensors, recorded while the actions are performed:

  • Motion Capture - skeletal joint data (29 joints) of the nurse
  • Acceleration - acceleration of the nurse
  • Location - (x, y) location of the nurse

The table below lists the results of the different methods and the modalities they use; our transformer-based solution outperforms them all.
| Sensors Used | Method | Validation Accuracy (%) |
|---|---|---|
| Acceleration and Motion Capture (Ours) | Transformers | 81.8 |
| Motion Capture and Location | KNN | 80.2 |
| Motion Capture | ST-GCN | 64.6 |
| All Modalities | CNN | 46.5 |
| Acceleration | Random Forest | 43.1 |
| Motion Capture and Location | GRU | 29.3 |

The graphs below reflect the effectiveness of the proposed solution. On the left, the bar graph shows a class-wise F1-score comparison with the top two solutions submitted to the Nurse Care Activity Recognition challenge, ST-GCN and KNN. For almost all classes, our proposed solution outperforms both ST-GCN and the hand-crafted-feature-based KNN method. On the right, we show the validation accuracy of all existing solutions listed in the table above.

Usage

Requirements

Create a conda environment and install the dependencies from the provided environment file (Tools/mmt_env.yml).

conda create --name myenv python=3.6
conda env create -f Tools/mmt_env.yml

Training

Download the data and set the paths to the acceleration data, skeletal joints data, and labels in the config file.
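
As an illustration, the relevant config entries might look like the following; the key names and paths are hypothetical, so use the actual field names from the repo's config file:

```python
# Hypothetical illustration of the dataset paths expected in the config file.
# Key names and paths are assumptions; use the actual names from the repo.
dataset_paths = {
    "skeleton_joints_dir": "/path/to/ncrc/skeleton_joints/",
    "acceleration_dir": "/path/to/ncrc/acceleration/",
    "labels_path": "/path/to/ncrc/labels.csv",
}
```

Then run the following command to train the cross-view fusion model on the NCRC dataset.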

python3 train_ncrc.py 

Note: To train a different model, simply import the relevant model in the train_ncrc.py script, as sketched below.
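
A minimal sketch of such a swap, assuming the models live in a models package (the module paths and class names are hypothetical; check the actual imports used in train_ncrc.py):

```python
# Hypothetical sketch: training the simple-fusion baseline instead of the
# cross-view fusion model. Module paths and class names are assumptions,
# not this repo's exact API.
# from models.crossview_fusion import CrossViewFusionModel   # default model
from models.simple_fusion import SimpleFusionModel           # model to train instead

model = SimpleFusionModel()
```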

Inference

For inference, load the desired checkpoint and select a model name, where CKPT_PATH is the path to the corresponding downloaded checkpoint. A valid model name is one of:

  • crossview_fusion_model
  • model_acc_only
  • model_skeleton_only
  • model_simple_fusion

For example, to run validation on NCRC data using the cross-view fusion model:

python3 validation_ncrc.py --ckpt_path [CKPT PATH] --model 'crossview_fusion_model'
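
Similarly, to evaluate the skeleton-only model with its downloaded checkpoint (the checkpoint location shown is illustrative):

python3 validation_ncrc.py --ckpt_path ./SkeletonModel.pth --model 'model_skeleton_only'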

Citation

If you find this useful in your work, please give a ⭐ and consider citing:

@article{momal2022multimodal_transformer,
  title={Multimodal Transformer for Nurse Activity Recognition},
  author={Ijaz, Momal and Diaz, Renato and Chen, Chen},
  journal={arXiv preprint arXiv:2204.04564},
  year={2022}}