Voice-Based Age and Gender Recognition: A Comparative Study of LSTM, RezoNet, and Hybrid CNNs-BiLSTM Architecture

Official implementation of the paper "Voice-Based Age and Gender Recognition: A Comparative Study of LSTM, RezoNet and Hybrid CNNs-BiLSTM Architecture", accepted to the 15th International Conference on Information and Communication Technology Convergence (ICTC 2024).

If you find this work helpful, please star ⭐ the repository and/or cite the paper.

Abstract

In this study, we compare three architectures for age and gender recognition from voice data: Long Short-Term Memory networks (LSTM), a hybrid of Convolutional Neural Networks and Bidirectional Long Short-Term Memory (CNNs-BiLSTM), and the recently released RezoNet architecture. The dataset is the Japanese portion of Mozilla Common Voice. Pitch, magnitude, Mel-frequency cepstral coefficients (MFCCs), and filter-bank energies were extracted from the voice recordings as input features, and the three architectures were evaluated on them. For gender recognition, LSTM achieved the highest accuracy (93.5%), closely followed by hybrid CNNs-BiLSTM (93.1%), while RezoNet reached 83.1%. For age recognition, hybrid CNNs-BiLSTM outperformed the other models with an accuracy of 69.75%, compared with 64.25% for LSTM and 44.88% for RezoNet. On the Japanese data with the extracted features, the hybrid CNNs-BiLSTM model therefore delivered the best age-recognition accuracy and a gender-recognition accuracy within 0.4 percentage points of the best, highlighting its efficacy for voice-based age and gender detection. These results suggest promising avenues for future research and practical applications in this field.
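The four feature types named in the abstract can be illustrated with a minimal NumPy sketch. This is not the repository's extraction code: the 25 ms / 10 ms framing, 26 mel bands, 13 MFCCs, the 60 to 400 Hz autocorrelation pitch search, and the reading of "magnitude" as per-frame RMS amplitude are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def frame_signal(x, frame_len, hop):
    # Overlapping frames as rows; an incomplete final frame is dropped
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale, shape (n_mels, n_fft // 2 + 1)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / (right - center)
    return fb


def extract_features(x, sr=16000, frame_ms=25, hop_ms=10, n_fft=512,
                     n_mels=26, n_mfcc=13, fmin=60.0, fmax=400.0):
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    raw = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    spec = np.abs(np.fft.rfft(raw * np.hamming(frame_len), n=n_fft))

    # Log filter-bank energies from the power spectrum
    power = spec ** 2 / n_fft
    fbank = np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)

    # MFCCs: DCT-II of the log filter-bank energies, keeping the first n_mfcc
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[None, :] + 0.5) * np.arange(n_mfcc)[:, None])
    mfcc = fbank @ dct.T

    # Pitch: strongest autocorrelation lag in [sr/fmax, sr/fmin); no voicing detection
    lag_min, lag_max = int(sr / fmax), int(sr / fmin)
    pitch = np.empty(len(raw))
    for i, frame in enumerate(raw):
        ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]
        pitch[i] = sr / (lag_min + np.argmax(ac[lag_min:lag_max]))

    # "Magnitude" taken here as per-frame RMS amplitude (an assumption)
    magnitude = np.sqrt(np.mean(raw ** 2, axis=1))
    return {"pitch": pitch, "magnitude": magnitude, "mfcc": mfcc, "fbank": fbank}
```

Stacking the per-frame outputs, e.g. np.column_stack([f["pitch"], f["magnitude"], f["mfcc"], f["fbank"]]), yields a (frames, 41) matrix with these defaults, which is the kind of sequence a recurrent or convolutional classifier consumes.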

Index Terms: Voice-Based Age and Gender Recognition, RezoNet, Convolutional Neural Network, Long Short-Term Memory, Bidirectional Long Short-Term Memory, Deep Learning.
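The hybrid CNNs-BiLSTM idea from the abstract can be sketched in PyTorch as follows. The class name CNNBiLSTM, the layer sizes, the pooling, and the time-averaged classification head are hypothetical choices for illustration, not the architecture reported in the paper. The 1-D convolutions learn local patterns across neighbouring frames, and the BiLSTM then reads the downsampled sequence in both directions.

```python
import torch
import torch.nn as nn


class CNNBiLSTM(nn.Module):
    """Hypothetical CNNs-BiLSTM classifier over frame-level features.

    Input: (batch, time, n_features). Output: (batch, n_classes) logits.
    """

    def __init__(self, n_features=41, n_classes=2, conv_channels=64, hidden=128):
        super().__init__()
        # Two conv blocks over time; each MaxPool1d halves the sequence length
        self.cnn = nn.Sequential(
            nn.Conv1d(n_features, conv_channels, kernel_size=3, padding=1),
            nn.BatchNorm1d(conv_channels),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(conv_channels, conv_channels, kernel_size=3, padding=1),
            nn.BatchNorm1d(conv_channels),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        # Bidirectional LSTM models longer-range temporal context
        self.bilstm = nn.LSTM(conv_channels, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):
        x = self.cnn(x.transpose(1, 2))  # (B, C, T // 4)
        x = x.transpose(1, 2)            # (B, T // 4, C)
        out, _ = self.bilstm(x)          # (B, T // 4, 2 * hidden)
        pooled = out.mean(dim=1)         # average the BiLSTM outputs over time
        return self.classifier(pooled)   # unnormalized class scores
```

For gender recognition n_classes would be 2; for age recognition it would be the number of age groups in the chosen label set. Training with nn.CrossEntropyLoss on these logits is the usual pairing.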

Usage

Dataset

In this study, we use the Japanese voice dataset from Mozilla Common Voice.

The dataset can be downloaded from the Mozilla Common Voice website (https://commonvoice.mozilla.org/).
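Age and gender in Common Voice are optional, self-reported metadata, so clips without them have to be filtered out before training. The sketch below assumes a Common Voice-style tab-separated metadata file (for example validated.tsv) with path, age, and gender columns; column names and label values differ between Common Voice releases, so check them against the version you download. The function names and the file path in the comment are hypothetical.

```python
import csv
from collections import Counter


def load_labeled_clips(tsv_file):
    """Return (clip_path, age, gender) for rows that carry both labels.

    Assumes a Common Voice-style TSV with 'path', 'age' and 'gender' columns.
    """
    # QUOTE_NONE: transcripts may contain quote characters that are not CSV quoting
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        age = (row.get("age") or "").strip()
        gender = (row.get("gender") or "").strip()
        if age and gender:
            rows.append((row["path"], age, gender))
    return rows


def label_counts(rows):
    # Class distributions, useful for spotting imbalance before training
    ages = Counter(age for _, age, _ in rows)
    genders = Counter(gender for _, _, gender in rows)
    return ages, genders


# Hypothetical usage:
# with open("cv-corpus/ja/validated.tsv", encoding="utf-8", newline="") as f:
#     clips = load_labeled_clips(f)
# print(label_counts(clips))
```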

Clone this repository

git clone "https://github.com/nhut-ngnn/Voice-Based-Age-and-Gender-Recogniton.git"

Create a Conda Environment and Install Requirements

conda create -n Voice-Based-Age-and-Gender-Recogniton python=3.10 -y
conda activate Voice-Based-Age-and-Gender-Recogniton
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
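After installation, a quick check that the pinned PyTorch packages resolved and that a CUDA device is visible can save debugging later. This is a hypothetical helper, not part of the repository. It reads installed versions through importlib.metadata, which assumes the packages expose standard distribution metadata, and compares them with the versions pinned in the conda command above (ignoring local suffixes such as "+cu118").

```python
import importlib
import importlib.metadata as md

# Versions pinned in the conda install command
EXPECTED = {"torch": "2.0.1", "torchvision": "0.15.2", "torchaudio": "2.0.2"}


def check_environment(expected=EXPECTED):
    """Map each package to (installed_version_or_None, matches_pin)."""
    report = {}
    for pkg, want in expected.items():
        try:
            have = md.version(pkg)
        except md.PackageNotFoundError:
            have = None
        # Strip local build tags like "+cu118" before comparing
        report[pkg] = (have, have is not None and have.split("+")[0] == want)
    return report


def cuda_available():
    # Returns False if torch is missing rather than raising
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


# Hypothetical usage:
# print(check_environment(), cuda_available())
```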

References

@INPROCEEDINGS{nguyen2024age-gender,
  author={Nguyen, Nhut Minh and Nguyen, Thanh Trung and Nguyen, Hua Hiep and Tran, Phuong-Nam and Dang, Duc Ngoc Minh},
  booktitle={2024 15th International Conference on Information and Communication Technology Convergence (ICTC)}, 
  title={Voice-Based Age and Gender Recognition: A Comparative Study of LSTM, RezoNet and Hybrid CNNs-BiLSTM Architecture}, 
  year={2024},
  volume={},
  number={},
  pages={191-196},
  keywords={Long short term memory;Voice-Based Age and Gender Recognition;RezoNet;Convolutional Neural Network;Long Short-Term Memory;Bidirectional Long-Term Memory;Deep Learning},
  doi={10.1109/ICTC62082.2024.10827387}}

Contact

For further information, please contact the main author:

Nhut Minh Nguyen at FPT University, Vietnam
Email: minhnhut.ngnn@gmail.com
GitHub: https://github.com/nhut-ngnn
ORCID: https://orcid.org/0009-0003-1281-5346