Pinned Repositories
cLDM-DCL
ClearerVoice-Studio
ClearVoice
CosyVoice
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
D2Former
This repository contains the audio samples for "D2Former: A Fully Complex Dual-Path Dual-Decoder Conformer Network using Joint Complex Masking and Complex Spectral Mapping for Monaural Speech Enhancement", which was submitted to ICASSP 2023.
fig_resources
FRCRN
GatedFormer
This is the repository for the speech enhancement model SyncFormer
MossFormer
This repo provides the processed samples of the manuscript "MossFormer: Pushing the Performance Limit of Monaural Speech Separation using Gated Single-head Transformer with Convolution-augmented Joint Self-Attentions", which was submitted to ICASSP 2023.
MossFormer2
This is the audio sample repository for the speech separation model "MossFormer2".
ClearerVoice-Studio
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
alibabasglab's Repositories
alibabasglab/FRCRN
alibabasglab/MossFormer2
This is the audio sample repository for the speech separation model "MossFormer2".
alibabasglab/MossFormer
This repo provides the processed samples of the manuscript "MossFormer: Pushing the Performance Limit of Monaural Speech Separation using Gated Single-head Transformer with Convolution-augmented Joint Self-Attentions", which was submitted to ICASSP 2023.
alibabasglab/D2Former
This repository contains the audio samples for "D2Former: A Fully Complex Dual-Path Dual-Decoder Conformer Network using Joint Complex Masking and Complex Spectral Mapping for Monaural Speech Enhancement", which was submitted to ICASSP 2023.
alibabasglab/GatedFormer
This is the repository for the speech enhancement model SyncFormer
alibabasglab/cLDM-DCL
alibabasglab/fig_resources
alibabasglab/ClearerVoice-Studio
ClearVoice
alibabasglab/CosyVoice
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
alibabasglab/FLASH-pytorch
Implementation of the Transformer variant proposed in "Transformer Quality in Linear Time"
alibabasglab/MS-SNSD
The Microsoft Scalable Noisy Speech Dataset (MS-SNSD) is a noisy speech dataset that can scale to arbitrary sizes depending on the number of speakers, noise types, and Speech to Noise Ratio (SNR) levels desired.
alibabasglab/speechbrain
A PyTorch-based Speech Toolkit
alibabasglab/TAC
Transform-average-concatenate (TAC) method for end-to-end microphone permutation and number invariant ad-hoc beamforming.
alibabasglab/tts
Bilingual and Code-Switching Speech Synthesis
alibabasglab/vc
Cross-lingual voice conversion