This repository introduces our model for age and gender prediction based on wav2vec 2.0. The model is available from doi:10.5281/zenodo.7761387 and released under CC BY-NC-SA 4.0. It was created by fine-tuning the pre-trained wav2vec2-large-robust model on aGender, Mozilla Common Voice, TIMIT and VoxCeleb 2. We provide two variants of the model: one with all 24 transformer layers and a stripped-down version with six transformer layers. Both variants were exported to ONNX. The original Torch models are hosted on Hugging Face (6 layers and 24 layers). Further details are given in the associated paper and notebook.
The model may be used for non-commercial purposes under the terms of CC BY-NC-SA 4.0. For commercial usage, a license for devAIce must be obtained. The source code in this GitHub repository is released under the following license.
Create and activate a Python virtual environment, then install audonnx.
$ pip install audonnx
Load the model with six layers and test it on a random signal.
import audeer
import audonnx
import numpy as np
url = 'https://zenodo.org/record/7761387/files/w2v2-L-robust-6-age-gender.25c844af-1.1.1.zip'
cache_root = audeer.mkdir('cache')
model_root = audeer.mkdir('model')
archive_path = audeer.download_url(url, cache_root, verbose=True)
audeer.extract_archive(archive_path, model_root)
model = audonnx.load(model_root)
sampling_rate = 16000
signal = np.random.normal(size=sampling_rate).astype(np.float32)
model(signal, sampling_rate)
{'hidden_states': array([[ 0.02783544, 0.01402022, 0.03839185, ..., 0.00786646,
-0.09332313, 0.0915948 ]], dtype=float32),
'logits_age': array([[0.3961048]], dtype=float32),
'logits_gender': array([[ 0.32810774, -0.56528044, 0.0317882 ]], dtype=float32)}
The 'hidden_states' are the pooled states of the last transformer layer, 'logits_age' gives the predicted age as a score in the range of approximately 0 to 1, where 1 corresponds to 100 years, and 'logits_gender' expresses the confidence for the speaker being female, male or child.
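The raw outputs described above can be turned into more readable values. The following is a minimal sketch, not part of the released code: the helper name `interpret_output` is hypothetical, the age is scaled by 100 as described above, and applying a softmax to 'logits_gender' to obtain probabilities is an assumption about how the logits are best interpreted. The label order (female, male, child) follows the description above.

```python
import numpy as np

# Label order as stated in the description of 'logits_gender'.
GENDER_LABELS = ('female', 'male', 'child')


def interpret_output(output):
    """Convert the model's output dictionary to age in years and gender scores.

    Hypothetical helper: expects the keys 'logits_age' with shape (1, 1)
    and 'logits_gender' with shape (1, 3), as in the example output.
    """
    # The age score lies roughly in 0...1, where 1 corresponds to 100 years.
    age = np.asarray(output['logits_age'], dtype=np.float64)
    age_years = float(age[0, 0]) * 100.0

    # Assumption: a softmax maps the gender logits to probabilities.
    logits = np.asarray(output['logits_gender'], dtype=np.float64)[0]
    exp = np.exp(logits - logits.max())  # subtract the max for numerical stability
    probs = exp / exp.sum()

    return {
        'age_years': age_years,
        'gender': GENDER_LABELS[int(np.argmax(probs))],
        'gender_probs': {
            label: float(p) for label, p in zip(GENDER_LABELS, probs)
        },
    }


# Values copied from the example output for the random signal.
example = {
    'logits_age': np.array([[0.3961048]], dtype=np.float32),
    'logits_gender': np.array(
        [[0.32810774, -0.56528044, 0.0317882]], dtype=np.float32
    ),
}
result = interpret_output(example)
print(result['age_years'], result['gender'])
```

With a loaded model, the same helper could be applied as `interpret_output(model(signal, sampling_rate))`. For the random signal above the prediction carries no meaning; it only illustrates the format.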
For a detailed introduction, please check out the notebook.
$ pip install -r requirements.txt
$ jupyter notebook notebook.ipynb
If you use our model in your own work, please cite the following paper:
@inproceedings{,
author = {Felix Burkhardt and Johannes Wagner and Hagen Wierstorf and Florian Eyben and Björn Schuller},
editor = {Peter Jax and Sebastian Möller},
booktitle = {15th ITG conference on Speech Communication},
title = {Speech-based Age and Gender Prediction with Transformers},
year = {2023},
}