Hierarchical Windowed Graph Attention Network (HWGAT) is a deep learning model specifically designed for sign language recognition. This model leverages hierarchical and windowed attention mechanisms to effectively capture the temporal and spatial dependencies in sign language skeleton data. This repository includes a comprehensive implementation of HWGAT, covering data preprocessing and the full training pipeline.
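As a rough intuition for the windowed part of the design, attention can be restricted to fixed-size temporal windows over the frame sequence instead of attending across the whole clip. The sketch below is an illustration of that idea only, not the authors' HWGAT implementation; the class name, shapes, and hyperparameters are hypothetical.

```python
# Minimal sketch of windowed self-attention over a skeleton sequence.
# Illustration only -- not the HWGAT implementation from this repository.
import torch
import torch.nn as nn

class WindowedAttention(nn.Module):
    """Self-attention restricted to fixed-size temporal windows."""
    def __init__(self, dim: int, num_heads: int, window: int):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, dim); frames must divide evenly into windows here
        b, t, d = x.shape
        w = self.window
        windows = x.reshape(b * (t // w), w, d)        # one row per temporal window
        out, _ = self.attn(windows, windows, windows)  # attend within each window only
        return out.reshape(b, t, d)

# 2 clips, 32 frames of 64-d per-frame skeleton features, windows of 8 frames
feats = torch.randn(2, 32, 64)
model = WindowedAttention(dim=64, num_heads=4, window=8)
print(model(feats).shape)  # torch.Size([2, 32, 64])
```

Restricting attention to windows keeps the cost linear in sequence length while still modeling local temporal structure.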
To get started with HWGAT for sign language recognition, follow these steps:
- Clone the repository:

  ```bash
  git clone https://github.com/arkadip-maitra/sl-hwgat.git
  cd sl-hwgat/
  ```
- Install Docker and the NVIDIA Container Toolkit: install Docker by following this tutorial, then install the NVIDIA Container Toolkit.
- Build a Docker image from the `Dockerfile`, then run the container with the correct paths attached (example commands are sketched after this list).
- Install the required dependencies with:

  ```bash
  pip install -r requirements.txt
  ```
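The exact Docker commands depend on your setup; a typical build-and-run sequence might look like the following, where the image tag and mount path are placeholders, not prescribed by this repository:

```bash
# Build an image from the repository's Dockerfile (the tag "hwgat" is arbitrary)
docker build -t hwgat .

# Run with GPU access and the dataset directory mounted where the scripts
# expect it (adjust /data/datasets to your own location)
docker run --gpus all -it -v /data/datasets:/data/datasets hwgat
```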
Go to the main directory to run the code:

```bash
cd hwgat/
```
The data preprocessing pipeline prepares the raw sign language data for training.
- Generate metadata: Ensure your dataset is structured properly and run the metadata generator script corresponding to your dataset; one metadata file must be generated before the deep learning pipeline can run. For FDMSE:

  ```bash
  python meta_generators/FDMSE_meta_gen.py
  ```

  **Note:** Remember to update the paths inside every meta generator script.

  This should generate a file at `/data/datasets/FDMSE/FDMSE_meta/metadata.csv`. If you are using a different dataset, write your own meta generator (a minimal sketch follows this list).
- Generate keypoints: Extract keypoints and save them using the `pose_feature_extract.py` file by running the following command, where `--root` is the root directory of the dataset, `--meta` is the dataset's metadata.csv, and `--out_path` is the saving path of the outputs (keypoints); the folder will be created under the root directory.

  ```bash
  python pose_feature_extract.py --root '/data/datasets/FDMSE' --meta '/data/datasets/FDMSE/FDMSE_meta/metadata.csv' -m mediapipe --out_path 'mediapipe_out/'
  ```
- Process keypoints data: Next, preprocess the generated keypoints so that they can be used to train the transformer-based model, using the following command, where `--ds` is the dataset name, `--root` is the root directory of the dataset, `--meta` is the dataset's metadata.csv, `-dr` is the keypoints output path relative to the root, and `-ft` is the feature type that was extracted.

  ```bash
  python data_preprocess.py --root /data/datasets/FDMSE/ --ds FDMSE --meta /data/datasets/FDMSE/FDMSE_meta/metadata.csv -dr mediapipe_out/ -ft keypoints
  ```
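For a custom dataset, a meta generator just needs to produce a metadata.csv describing your clips. The sketch below is hypothetical: the actual column names and layout are defined by the scripts in `meta_generators/`, so mirror one of those rather than this example.

```python
# Hypothetical meta generator sketch -- check an existing script in
# meta_generators/ for the real metadata.csv column layout.
import csv
from pathlib import Path

ROOT = Path('/data/datasets/MyDataset')        # dataset root (adjust)
OUT = ROOT / 'MyDataset_meta' / 'metadata.csv'
OUT.parent.mkdir(parents=True, exist_ok=True)

with OUT.open('w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['video_path', 'label'])   # hypothetical columns
    # Assumes one subdirectory per sign class, each holding its video clips.
    for video in sorted(ROOT.glob('*/*.mp4')):
        writer.writerow([str(video.relative_to(ROOT)), video.parent.name])
```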
Once the data is preprocessed, you can train the HWGAT model using the training pipeline provided.
- Configure the training parameters: Edit the `configs.py` file to set your training parameters, such as learning rate, batch size, and number of epochs (an illustrative snippet follows this list).
- Training the model: Start the training process by running:

  ```bash
  python main.py -m train -d FDMSE --model HWGAT -p mediapipe
  ```
- Testing the model: Test a trained model by running the following, where `-t` is the timestamp of the training run and `-px` picks the saved checkpoint (here, the run stored under `output/FDMSE/HWGAT_240227_1807/`):

  ```bash
  python main.py -m test -d FDMSE --model HWGAT -p mediapipe -t 240227_1807 -px best_loss
  ```
- Load and train the model: Load and continue training the model, or finetune it on a different dataset, using one of the following:

  - Load and train on the same dataset:

    ```bash
    python main.py -m load -d FDMSE --model HWGAT -p mediapipe -t 240227_1807 -px best_loss
    ```

  - Finetune on another dataset:

    ```bash
    python main.py -m load -d INCLUDE --model HWGAT -p mediapipe -mw output/FDMSE/HWGAT_240227_1807/model_best_loss.pt
    ```
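As a rough illustration of the kind of edit the configuration step involves (the variable names below are placeholders; use whatever names `configs.py` actually defines):

```python
# Illustrative configs.py edit -- names are hypothetical placeholders.
lr = 1e-4          # learning rate
batch_size = 32    # clips per optimization step
num_epochs = 100   # full passes over the training set
```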
- Go to this repository to get the demo application of the HWGAT model for sign language recognition tasks.
This project is licensed under the MIT License - see the LICENSE file for details.
If you find this project useful in your research, please cite using:
```bibtex
@misc{patra2024hierarchicalwindowedgraphattention,
      title={Hierarchical Windowed Graph Attention Network and a Large Scale Dataset for Isolated Indian Sign Language Recognition},
      author={Suvajit Patra and Arkadip Maitra and Megha Tiwari and K. Kumaran and Swathy Prabhu and Swami Punyeshwarananda and Soumitra Samanta},
      year={2024},
      eprint={2407.14224},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2407.14224},
}
```
Thank you for using this repository. For any questions or support, please open an issue in this repository.