wsstriving's Stars
2noise/ChatTTS
A generative speech model for daily dialogue.
kenjihiranabe/The-Art-of-Linear-Algebra
Graphic notes on Gilbert Strang's "Linear Algebra for Everyone"
openai/tiktoken
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
facebookresearch/seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
chenzomi12/AISystem
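AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks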
SJTU-IPADS/PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
facebookresearch/DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
pyannote/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Zejun-Yang/AniPortrait
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
wenet-e2e/wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
Camb-ai/MARS5-TTS
MARS5 speech model (TTS) from CAMB.AI
bshall/knn-vc
Voice Conversion With Just Nearest Neighbors
karaokenerds/python-audio-separator
Easy to use vocal separation from CLI or as a python package, using a variety of amazing models (primarily trained by @Anjok07 as part of UVR)
KdaiP/StableTTS
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
huangwb8/ChineseResearchLaTeX
A collection of LaTeX templates commonly used in scientific research
quickvc/QuickVC-VoiceConversion
QuickVC: Any-to-many Voice Conversion Using Inverse Short-time Fourier Transform for Faster Conversion
wavmark/wavmark
AI-based Audio Watermarking Tool
zhenye234/CoMoSpeech
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
Grace9994/CoMoSVC
CoMoSVC: One-Step Consistency Model Based Singing Voice Conversion & Singing Voice Clone
Vincent-ZHQ/CA-MSER
Code for Speech Emotion Recognition with Co-Attention based Multi-level Acoustic Information
thuhcsi/SECap
line/LibriTTS-P
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
yukara-ikemiya/friendly-stable-audio-tools
Refactored / updated version of `stable-audio-tools`, the open-source code for audio/music generative models originally by Stability AI.
flinkerlab/neural_speech_decoding
liyunlongaaa/NSD-MS2S
CHiME-7 diarization champion system: neural speaker diarization using memory-aware multi-speaker embedding with sequence-to-sequence architecture
DigitalPhonetics/speaker-anonymization
Speaker anonymization pipeline for hiding the identity of the speaker of a recording by changing the voice in it.
RicherMans/Dasheng
Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification"
ARDiT-TTS/ardit-tts.github.io
xjchenGit/SingGraph
Official repository for the paper Singing Voice Graph Modeling for SingFake Detection (Interspeech 2024).
npuichigo/snake
Data loading with combined async Rust stream and Python