ClearerVoice-Studio: A Python repository from Qoboty

ClearerVoice-Studio is an open-source, AI-powered speech processing toolkit designed for researchers, developers, and end-users. It supports speech enhancement, speech separation, target speaker extraction, and more. The toolkit includes state-of-the-art pre-trained models along with training and inference scripts, all accessible from this repository.

👉🏻ClearVoice Demo👈🏻 | 👉🏻SpeechScore Demo👈🏻

Please support our community project 💖 by starring it on GitHub ⭐ 🙏

News 🔥

[2024.11] FRCRN speech denoiser has been used over 2.8 million times on ModelScope
[2024.11] MossFormer speech separator has been used over 2.5 million times on ModelScope
[2024.11] Release of this repository
Upcoming: More tasks will be added to ClearVoice.

🌟 Why Choose ClearerVoice-Studio?

Pre-Trained Models: Includes cutting-edge pre-trained models, fine-tuned on extensive, high-quality datasets. No need to start from scratch!
Ease of Use: Designed for seamless integration with your projects, offering a simple yet flexible interface for inference and training.
Comprehensive Features: Combines advanced algorithms for multiple speech processing tasks in one platform.
Community-Driven: Built for researchers, developers, and enthusiasts to collaborate and innovate together.

Contents of this repository

This repository is organized into three main components: ClearVoice, Train, and SpeechScore.

1. ClearVoice

ClearVoice offers a user-friendly solution for speech processing tasks such as speech denoising, separation, audio-visual target speaker extraction, and more. It is designed as a unified inference platform that leverages pre-trained models (e.g., FRCRN, MossFormer), all trained on extensive datasets. If you're looking for a tool to improve speech quality, ClearVoice is the perfect choice. Simply click on ClearVoice and follow our detailed instructions to get started.

2. Train

For advanced researchers and developers, we provide fine-tuning and training scripts for all the tasks offered in ClearVoice, and more:

Task 1: Speech enhancement (16kHz & 48kHz)
Task 2: Speech separation (8kHz & 16kHz)
Task 3: Target speaker extraction (16kHz)
- Sub-Task 1: Audio-only Speaker Extraction Conditioned on a Reference Speech
- Sub-Task 2: Audio-visual Speaker Extraction Conditioned on Face (Lip) Recording
- Sub-Task 3: Audio-visual Speaker Extraction Conditioned on Body Gestures
- Sub-Task 4: Neuro-steered Speaker Extraction Conditioned on EEG Signals
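
As a quick reference, the task and sampling-rate combinations listed above can be captured in a small lookup table. This is only an illustrative sketch: the task keys and the `is_supported` helper are hypothetical names chosen for this example and are not part of the repository's scripts.

```python
# Sampling rates (in Hz) per training task, taken from the list above.
# The dictionary keys are hypothetical labels, not identifiers used by the repository.
SUPPORTED_SAMPLE_RATES_HZ = {
    "speech_enhancement": (16000, 48000),
    "speech_separation": (8000, 16000),
    "target_speaker_extraction": (16000,),
}


def is_supported(task, sample_rate_hz):
    """Return True if the task is listed with the given sampling rate."""
    if task not in SUPPORTED_SAMPLE_RATES_HZ:
        raise ValueError(f"unknown task: {task!r}")
    return sample_rate_hz in SUPPORTED_SAMPLE_RATES_HZ[task]


if __name__ == "__main__":
    print(is_supported("speech_enhancement", 48000))
    print(is_supported("speech_separation", 48000))
```

A check like this can be run before training to catch audio that needs resampling for the chosen task.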

Contributors are welcome to add more model architectures and tasks!

3. SpeechScore

SpeechScore is a speech quality assessment toolkit. We include it here to evaluate and compare the performance of different models. SpeechScore includes many popular speech metrics:

Signal-to-Noise Ratio (SNR)
Perceptual Evaluation of Speech Quality (PESQ)
Short-Time Objective Intelligibility (STOI)
Deep Noise Suppression Mean Opinion Score (DNSMOS)
Scale-Invariant Signal-to-Distortion Ratio (SI-SDR)
and many more quality benchmarks
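
To give a feel for what two of the simpler metrics above measure, here is a minimal NumPy sketch of SNR and SI-SDR using their standard textbook definitions. It is not SpeechScore's implementation or API; the function names `snr_db` and `si_sdr_db` and the synthetic test signal are assumptions made for illustration.

```python
import numpy as np

_EPS = np.finfo(np.float64).eps  # guards against division by zero and log(0)


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB, treating (estimate - reference) as noise."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    noise = estimate - reference
    return 10.0 * np.log10((np.sum(reference**2) + _EPS) / (np.sum(noise**2) + _EPS))


def si_sdr_db(reference, estimate):
    """Scale-invariant signal-to-distortion ratio in dB."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    # Remove the mean so a DC offset does not affect the score.
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to find the optimally scaled target.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + _EPS)
    target = alpha * reference
    residual = estimate - target
    return 10.0 * np.log10((np.sum(target**2) + _EPS) / (np.sum(residual**2) + _EPS))


# Synthetic example: a 440 Hz tone at 16 kHz with additive Gaussian noise.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2.0 * np.pi * 440.0 * t)
noisy = clean + 0.1 * rng.standard_normal(clean.shape)

if __name__ == "__main__":
    print(f"SNR:    {snr_db(clean, noisy):.2f} dB")
    print(f"SI-SDR: {si_sdr_db(clean, noisy):.2f} dB")
    # Rescaling the estimate changes SNR but leaves SI-SDR unchanged.
    print(f"SNR (estimate halved):    {snr_db(clean, 0.5 * noisy):.2f} dB")
    print(f"SI-SDR (estimate halved): {si_sdr_db(clean, 0.5 * noisy):.2f} dB")
```

The scale invariance is the main practical difference: SI-SDR does not penalize a model for producing output at a different gain than the reference, whereas plain SNR does.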

Contact

If you have any comments or questions about ClearerVoice-Studio, feel free to raise an issue in this repository or contact us directly at:

email: {shengkui.zhao, zexu.pan}@alibaba-inc.com

Alternatively, you are welcome to join our DingTalk and WeChat groups to discuss algorithms and technology and to share feedback on your user experience. You may scan the following QR codes to join our official chat groups.