[ICTC-2024] - "Voice-Based Age and Gender Recognition: A Comparative Study of LSTM, RezoNet and Hybrid CNNs-BiLSTM Architecture" by Nhut Minh Nguyen, Thanh Trung Nguyen, Hua Hiep Nguyen, Phuong-Nam Tran, Duc Ngoc Minh Dang

Primary language: Python. License: MIT.

Voice-Based Age and Gender Recognition: A Comparative Study of LSTM, RezoNet, and Hybrid CNNs-BiLSTM Architecture

Official implementation of the paper "Voice-Based Age and Gender Recognition: A Comparative Study of LSTM, RezoNet and Hybrid CNNs-BiLSTM Architecture". The paper has been accepted at the 15th International Conference on ICT Convergence (ICTC 2024).

Please star ⭐ this repository and/or cite the paper if you find it helpful.

Abstract | Usage | References | Contact

Abstract

In this study, we compared three architectures for age and gender recognition from voice data: Long Short-Term Memory networks (LSTM), a hybrid of Convolutional Neural Networks and Bidirectional Long Short-Term Memory (CNNs-BiLSTM), and the recently released RezoNet architecture. The dataset is sourced from the Japanese portion of Mozilla Common Voice. Pitch, magnitude, Mel-frequency cepstral coefficients (MFCCs), and filter-bank energies were extracted from the voice data as input features, and the three architectures were evaluated on them. For gender recognition, LSTM achieved the best accuracy (93.5%), closely followed by hybrid CNNs-BiLSTM (93.1%), while RezoNet was less accurate (83.1%). For age recognition, the hybrid CNNs-BiLSTM architecture outperformed the other models with an accuracy of 69.75%, compared to 64.25% for LSTM and 44.88% for RezoNet. On Japanese-language data with the extracted features, the hybrid CNNs-BiLSTM architecture delivered the strongest overall results, leading on age recognition and coming within 0.4 percentage points of LSTM on gender recognition, which highlights its efficacy for voice-based age and gender detection. These results suggest promising avenues for future research and practical applications in this field.

Index Terms: Voice-Based Age and Gender Recognition, RezoNet, Convolutional Neural Network, Long Short-Term Memory, Bidirectional Long Short-Term Memory, Deep Learning.

Usage

Dataset

In this study, we use the Japanese voice dataset from Mozilla Common Voice.

Download it here.
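
After downloading, it can help to check how many clips actually carry both labels before training. The sketch below is not part of this repository's code. It assumes the release ships a tab-separated metadata file with `path`, `age`, and `gender` columns, as Common Voice releases typically do; the helper names `load_labeled_clips` and `label_counts` are hypothetical, and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def load_labeled_clips(tsv_file):
    """Return (path, age, gender) tuples for rows that carry both labels.

    Assumes a tab-separated file with `path`, `age`, and `gender` columns.
    """
    # QUOTE_NONE keeps quotation marks inside transcripts from confusing the parser.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    clips = []
    for row in reader:
        age = (row.get("age") or "").strip()
        gender = (row.get("gender") or "").strip()
        if age and gender:  # skip clips whose speaker did not self-report both
            clips.append((row["path"], age, gender))
    return clips


def label_counts(clips):
    """Count clips per age group and per gender to expose class imbalance."""
    ages = Counter(age for _, age, _ in clips)
    genders = Counter(gender for _, _, gender in clips)
    return ages, genders


# Illustrative, made-up rows standing in for a real metadata file.
sample = io.StringIO(
    "path\tage\tgender\n"
    "clip_0001.mp3\ttwenties\tmale\n"
    "clip_0002.mp3\t\t\n"
    "clip_0003.mp3\tthirties\tfemale\n"
)
clips = load_labeled_clips(sample)
ages, genders = label_counts(clips)
print(len(clips), dict(ages), dict(genders))
```

To run this on the real data, replace the in-memory sample with a file opened as `open(<metadata file>, encoding="utf-8", newline="")`, using whichever metadata file the downloaded release provides.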

Clone this repository

git clone "https://github.com/nhut-ngnn/Voice-Based-Age-and-Gender-Recogniton.git"

Create Conda Environment and Install Requirements

conda create -n Voice-Based-Age-and-Gender-Recogniton python=3.10 -y
conda activate Voice-Based-Age-and-Gender-Recogniton
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
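
Once the commands above finish, a quick sanity check can confirm that the environment sees PyTorch and a CUDA device. This is a minimal sketch and not a script shipped with the repository; the function names `check_environment` and `describe_torch` are hypothetical. It degrades gracefully when a package is missing.

```python
import importlib.util
import sys


def check_environment(packages=("torch", "torchvision", "torchaudio")):
    """Map each top-level package name to True if it is importable here."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


def describe_torch():
    """Summarize the installed PyTorch build, or return None if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    return {
        "version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print("python", sys.version.split()[0])
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
    print("torch build:", describe_torch())
```

If `cuda_available` is false on a machine with an NVIDIA GPU, the usual suspects are a driver that is older than the CUDA 11.8 runtime requires or a CPU-only PyTorch build pulled in by a later `pip install`.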

References

@INPROCEEDINGS{nguyen2024age-gender,
  author={Nguyen, Nhut Minh and Nguyen, Thanh Trung and Nguyen, Hua Hiep and Tran, Phuong-Nam and Dang, Duc Ngoc Minh},
  booktitle={2024 15th International Conference on Information and Communication Technology Convergence (ICTC)}, 
  title={Voice-Based Age and Gender Recognition: A Comparative Study of LSTM, RezoNet and Hybrid CNNs-BiLSTM Architecture}, 
  year={2024},
  volume={},
  number={},
  pages={191-196},
  keywords={Long short term memory;Voice-Based Age and Gender Recognition;RezoNet;Convolutional Neural Network;Long Short-Term Memory;Bidirectional Long-Term Memory;Deep Learning},
  doi={10.1109/ICTC62082.2024.10827387}}

Contact

For any information, please contact the main author:

Nhut Minh Nguyen at FPT University, Vietnam
Email: minhnhut.ngnn@gmail.com
GitHub: https://github.com/nhut-ngnn
ORCID: https://orcid.org/0009-0003-1281-5346