1012-Hours-Indian-English-Speech-Data-by-Mobile-Phone

Description

Indian English speech data recorded on mobile phones: 1,012 hours in total, from 2,100 native Indian speakers. The recording text was designed by linguistic experts and covers generic, human-machine interaction, in-car, smart home, and other categories. The text has been manually proofread for high accuracy. This data set can be used for automatic speech recognition, machine translation, and voiceprint recognition.

For more details, please refer to the link: https://www.nexdata.ai/datasets/speechrecog/940?source=Github

Format

16 kHz, 16-bit, uncompressed WAV, mono channel
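
As a minimal sketch of how a downloaded file could be checked against the format listed above, the snippet below uses Python's standard `wave` module. The function names `matches_dataset_format` and `estimated_size_bytes` are hypothetical helpers introduced for illustration and are not part of the dataset or any Nexdata tooling.

```python
import wave

# Expected parameters, taken from the Format section of this README.
EXPECTED_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples
EXPECTED_CHANNELS = 1  # mono


def matches_dataset_format(path):
    """Return True if the WAV file at `path` is 16 kHz, 16-bit, mono, uncompressed."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getnchannels() == EXPECTED_CHANNELS
            and wav.getcomptype() == "NONE"  # "NONE" means uncompressed PCM
        )


def estimated_size_bytes(hours):
    """Raw PCM payload size for `hours` of audio, ignoring WAV header overhead."""
    bytes_per_second = (
        EXPECTED_RATE_HZ * EXPECTED_SAMPLE_WIDTH_BYTES * EXPECTED_CHANNELS
    )
    return int(hours * 3600 * bytes_per_second)
```

At 32,000 bytes per second, `estimated_size_bytes(1012)` gives 116,582,400,000 bytes, so the full 1,012 hours should occupy roughly 117 GB of raw audio, assuming every file follows the stated format.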

Recording environment

quiet indoor environment, low background noise, without echo

Recording content (read speech)

generic category; human-machine interaction category; smart home command and control category; in-car command and control category; numbers

Demographics

2,100 speakers in total, 52% male and 48% female; 81% of speakers are aged 18-25, 18% are aged 26-45, and 1% are aged 46-60
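
The percentages above can be turned into approximate head counts with a few lines of arithmetic. This is only a sketch: the published percentages are presumably rounded, so the resulting counts are estimates, and the helper name `approximate_counts` is hypothetical.

```python
TOTAL_SPEAKERS = 2100
TOTAL_HOURS = 1012

# Shares in percent, as listed in the Demographics section.
GENDER_SHARE = {"male": 52, "female": 48}
AGE_SHARE = {"18-25": 81, "26-45": 18, "46-60": 1}


def approximate_counts(shares_percent, total):
    """Convert percentage shares into rounded head counts."""
    return {
        group: round(total * pct / 100)
        for group, pct in shares_percent.items()
    }


gender_counts = approximate_counts(GENDER_SHARE, TOTAL_SPEAKERS)
age_counts = approximate_counts(AGE_SHARE, TOTAL_SPEAKERS)

# Average recording time per speaker, in minutes.
avg_minutes_per_speaker = TOTAL_HOURS * 60 / TOTAL_SPEAKERS
```

Under these figures the 46-60 group comes to only about 21 speakers, and the average speaker contributes a little under 29 minutes of audio, which may matter when planning speaker-balanced train and test splits.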

Device

Android mobile phone, iPhone

Language

Indian English

Application scenario

speech recognition, voiceprint recognition

Licensing Information

Commercial License
