wsstriving's Stars
ggerganov/whisper.cpp
Port of OpenAI's Whisper model in C/C++
tatsu-lab/stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
mli/paper-reading
Paragraph-by-paragraph close readings of classic and recent deep learning papers
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
facebookresearch/encodec
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
spotify/basic-pitch
A lightweight yet powerful audio-to-MIDI converter with pitch bend detection
openai/improved-diffusion
Release for Improved Denoising Diffusion Probabilistic Models
lucidrains/musiclm-pytorch
Implementation of MusicLM, Google's new SOTA model for music generation using attention networks, in PyTorch
microsoft/torchscale
Foundation Architecture for (M)LLMs
enhuiz/vall-e
An unofficial PyTorch implementation of the audio LM VALL-E
haoheliu/AudioLDM
AudioLDM: Generate speech, sound effects, music and beyond, with text.
lucidrains/audiolm-pytorch
Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in PyTorch
lucidrains/lion-pytorch
🦁 Lion, a new optimizer discovered by Google Brain using genetic algorithms that is purportedly better than Adam(W), in PyTorch
WenzheLiu-Speech/awesome-speech-enhancement
Speech enhancement / speech separation / sound source localization
microsoft/tutel
Tutel MoE: An Optimized Mixture-of-Experts Implementation
magenta/ddsp-vst
Realtime DDSP Neural Synthesizer and Effect
FuxiVirtualHuman/styletalk
MasayaKawamura/MB-iSTFT-VITS
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform
mpariente/pystoi
Python implementation of the Short-Time Objective Intelligibility (STOI) measure
zhangyongmao/VISinger2
VISinger 2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer
haoheliu/audioldm_eval
This toolbox aims to unify audio generation model evaluation for easier comparison.
interactiveaudiolab/penn
Pitch Estimating Neural Networks (PENN)
adobe-research/convmelspec
Convmelspec: Convertible Melspectrograms via 1D Convolutions
tango4j/Auto-Tuning-Spectral-Clustering
This repo is for the IEEE Signal Processing Letters (SPL) paper "Auto-Tuning Spectral Clustering for Speaker Diarization Using Normalized Maximum Eigengap"
desh2608/gss
A simple package for Guided source separation (GSS)
BUTSpeechFIT/EEND
fss1t/CausalStarGANv2-VC
Nathan-Roll1/PSST
Prosodic Speech Segmentation with Transformers