wsstriving's Stars
Vision-CAIR/MiniGPT-4
Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
facebookresearch/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
openai/tiktoken
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
SJTU-IPADS/PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
pyannote/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
XuehaiPan/nvitop
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
togethercomputer/RedPajama-Data
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
wenet-e2e/wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
Jittor/jittor
Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.
haoheliu/AudioLDM2
Text-to-Audio/Music Generation
lucidrains/naturalspeech2-pytorch
Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in Pytorch
descriptinc/descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
gemelo-ai/vocos
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
wenet-e2e/wespeaker
Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
DmitryRyumin/INTERSPEECH-2023-24-Papers
INTERSPEECH 2023-2024 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023-24 conference. Explore the latest advances in speech and language processing. Code included. Star the repository to support the advancement of speech technology!
LAION-AI/audio-dataset
Audio Dataset for training CLAP and other models
ZhangXInFD/SpeechTokenizer
This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on
Zain-Jiang/Speech-Editing-Toolkit
It's a repository for implementations of neural speech editing algorithms.
zhenye234/CoMoSpeech
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
mct10/RepCodec
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
Grace9994/CoMoSVC
CoMoSVC: One-Step Consistency Model Based Singing Voice Conversion & Singing Voice Clone
X-LANCE/UniCATS-CTX-vec2wav
[AAAI 2024] Code for CTX-vec2wav in UniCATS
X-LANCE/UniCATS-CTX-txt2vec
[AAAI 2024] CTX-txt2vec, the acoustic model in UniCATS
DigitalPhonetics/speaker-anonymization
Speaker anonymization pipeline for hiding the identity of the speaker of a recording by changing the voice in it.
pengzhendong/pyannote-onnx
ONNX Inference of Pyannote Segmentation
TomJwYu/WenetSpeechSpeakerCluster
yuguochencuc/SF-Net
The implementation of "Optimizing Shoulder to Shoulder: A Coordinated Sub-Band Fusion Model for Real-Time Full-Band Speech Enhancement"
HuangZiliAndy/SSL_for_multitalker
ADAPTING SELF-SUPERVISED MODELS TO MULTI-TALKER SPEECH RECOGNITION USING SPEAKER EMBEDDINGS
nan-bean/dcnn-sv
An implementation of dynamic convolution algorithm based system for speaker verification.
npuichigo/snake
Data loading with combined async Rust stream and Python