[CONTRIBUTION] Speech Dataset Generator
davidmartinrius opened this issue · 0 comments
Hi everyone!
My name is David Martin Rius and I have just published this project on GitHub: https://github.com/davidmartinrius/speech-dataset-generator/
Now you can create datasets automatically from any audio file or list of audio files.
I hope you find it useful.
Here are the key functionalities of the project:
- Dataset Generation: Creates multilingual datasets with Mean Opinion Score (MOS).
- Silence Removal: It removes silences from audio files, enhancing the overall quality.
- Sound Quality Improvement: It improves the quality of the audio when needed.
- Audio Segmentation: It can segment audio files within a specified range of seconds.
- Transcription: It transcribes the segmented audio, providing a textual representation.
- Gender Identification: It identifies the gender of each speaker in the audio.
- Pyannote Embeddings: It uses pyannote embeddings for speaker detection across multiple audio files.
- Automatic Speaker Naming: It automatically assigns names to speakers detected across multiple audio files.
- Multiple Speaker Detection: It can detect multiple speakers within each audio file.
- Speaker Embedding Storage: Detected speakers are stored in a Chroma database, so you do not need to assign speaker names yourself.
- Syllabic and words-per-minute metrics.
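To give an idea of what the syllabic and words-per-minute metrics mean, here is a minimal sketch that computes them from a transcript and a segment duration. This is only an illustration and not necessarily how the project calculates them: the function names `count_syllables` and `speech_rate` are hypothetical, and the syllable count is a rough vowel-group heuristic that only makes sense for English-like text.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels.

    Assumption: a simple heuristic for English-like words, not a
    linguistically accurate syllabifier.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def speech_rate(transcript: str, duration_seconds: float) -> dict:
    """Return words per minute and syllables per minute for one segment."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    # Treat runs of letters and apostrophes as words.
    words = re.findall(r"[A-Za-z']+", transcript)
    minutes = duration_seconds / 60.0
    syllables = sum(count_syllables(w) for w in words)
    return {
        "words_per_minute": len(words) / minutes,
        "syllables_per_minute": syllables / minutes,
    }


# Example: a 3-second segment with its transcription.
metrics = speech_rate("Now you can create datasets automatically", 3.0)
print(metrics)
```

Metrics like these can help filter out segments where the speech is unusually fast or slow before using the dataset for training.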
Feel free to explore the project at https://github.com/davidmartinrius/speech-dataset-generator