We release the Expressive Anechoic Recordings of Speech (EARS) dataset.
If you use the dataset or any derivative of it, please cite our paper:
@inproceedings{richter2024ears,
title={{EARS}: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation},
author={Richter, Julius and Wu, Yi-Chiao and Krenn, Steven and Welker, Simon and Lay, Bunlong and Watanabe, Shinji and Richard, Alexander and Gerkmann, Timo},
booktitle={Interspeech},
year={2024}
}
For audio samples, visit the project page.
- 100 h of speech data from 107 speakers
- high-quality recordings at 48 kHz in an anechoic chamber
- high speaker diversity, with speakers of different ethnicities and ages ranging from 18 to 75 years
- full dynamic range of human speech, ranging from whispering to yelling
- 18 minutes of freeform monologues per speaker
- sentence reading in 7 different reading styles (regular, loud, whisper, high pitch, low pitch, fast, slow)
- emotional reading and freeform tasks covering 22 different emotions for each speaker
for X in $(seq -w 001 107); do
  curl -L https://github.com/facebookresearch/ears_dataset/releases/download/dataset/p${X}.zip -o p${X}.zip
  unzip p${X}.zip
  rm p${X}.zip
done
Alternatively, run the EARS download script:
python download_ears.py
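After downloading, it can be useful to confirm that every speaker archive was fetched and extracted. The following is a minimal sketch, not part of the released scripts: the helper names `speaker_ids`, `archive_url`, and `missing_speakers` are hypothetical, and the check assumes that each `pXXX.zip` extracts into a directory named `pXXX` under the download root.

```python
from pathlib import Path

# Release URL prefix taken from the shell loop above.
BASE_URL = "https://github.com/facebookresearch/ears_dataset/releases/download/dataset"


def speaker_ids(first=1, last=107):
    """Return zero-padded speaker IDs p001 ... p107, matching `seq -w 001 107`."""
    return [f"p{i:03d}" for i in range(first, last + 1)]


def archive_url(speaker_id):
    """Build the download URL for one speaker archive."""
    return f"{BASE_URL}/{speaker_id}.zip"


def missing_speakers(root):
    """List speaker IDs with no directory under `root`.

    Assumption: each archive unpacks into a directory named after the speaker ID.
    """
    root = Path(root)
    return [s for s in speaker_ids() if not (root / s).is_dir()]
```

Calling `missing_speakers(".")` from the download directory returns an empty list when all 107 speakers are present. Any IDs it does return can be passed to `archive_url` to retry only those downloads.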
curl -L https://github.com/facebookresearch/ears_dataset/releases/download/blind_testset/blind_testset.zip -o blind_testset.zip
mkdir blind_testset
unzip blind_testset.zip -d blind_testset
rm blind_testset.zip
Alternatively, run the blind test set download script:
python download_blind_testset.py
The speaker statistics (age, ethnicity, gender, weight, height, native language) for the 107 speakers are collected in speaker_statistics.json.
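As an illustration of how the speaker statistics might be summarized, here is a minimal sketch. It assumes, without confirmation from this README, that `speaker_statistics.json` is a mapping from speaker ID to a dictionary of fields such as `gender` or `age`. The helpers `load_json` and `count_field` are hypothetical names, so adjust the field access if the actual file layout differs.

```python
import json
from collections import Counter


def load_json(path):
    """Read a JSON file as UTF-8 and return the parsed object."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_field(stats, field):
    """Count how often each value of `field` occurs across speakers.

    Assumption: `stats` maps speaker IDs to dictionaries of per-speaker fields.
    Speakers without the field are counted under "unknown".
    """
    return Counter(entry.get(field, "unknown") for entry in stats.values())
```

For example, `count_field(load_json("speaker_statistics.json"), "gender")` would return a `Counter` of gender labels, provided the file follows the assumed layout.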
Transcripts of the reading portions of the dataset are available in transcripts.json.
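The transcripts can be searched in a similar way. This sketch assumes that `transcripts.json` is a flat mapping from an utterance key to its text, which is not stated in this README. The function name `find_transcripts` is hypothetical.

```python
import json


def find_transcripts(transcripts, keyword):
    """Return the entries whose text contains `keyword`, ignoring case.

    Assumption: `transcripts` maps utterance keys to transcript strings.
    Non-string values are skipped.
    """
    needle = keyword.lower()
    return {
        key: text
        for key, text in transcripts.items()
        if isinstance(text, str) and needle in text.lower()
    }


def load_transcripts(path="transcripts.json"):
    """Read the transcript file as UTF-8 JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

With the file in the working directory, `find_transcripts(load_transcripts(), "weather")` would list every reading prompt that mentions the word, if the assumed layout holds.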
The code and dataset are released under the CC BY-NC 4.0 International license.