zshy1205's Stars
NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
UFund-Me/Qbot
[🔥updating ...] AI automated quantitative trading bot (fully local deployment). AI-powered Quantitative Investment Research Platform. 📃 online docs: https://ufund-me.github.io/Qbot ✨ :news: qbot-mini: https://github.com/Charmve/iQuant
ShiqiYu/libfacedetection
An open source library for face detection in images. The face detection speed can reach 1000 FPS.
facebookresearch/sam2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
facebookresearch/segment-anything
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2
modelscope/ms-swift
Use PEFT or full-parameter training to fine-tune 400+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek3, ...) and 150+ MLLMs (Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, ...).
snakers4/silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
lancedb/lance-deeplearning-recipes
Deep Learning how-to's using the Lance file format
MoonInTheRiver/DiffSinger
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
ina-foss/inaSpeechSegmenter
CNN-based audio segmentation toolkit. Detects speech, music, noise, and speaker gender. Designed for large-scale gender equality studies based on speech time per gender.
jrgillick/laughter-detection
lucidrains/vector-quantize-pytorch
Vector (and Scalar) Quantization, in Pytorch
TMElyralab/MuseTalk
MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
wenet-e2e/wespeaker
Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
huggingface/dataspeech
speechbrain/speechbrain
A PyTorch-based Speech Toolkit
clovaai/voxceleb_trainer
In defence of metric learning for speaker recognition
fishaudio/fish-diffusion
An easy to understand TTS / SVS / SVC framework
neonbjb/tts-scores
Scripts for computing the Intelligibility and CLVP scores for evaluating TTS models
152334H/tortoise-tts-fast
Fast TorToiSe inference (5x or your money back!)
adelacvg/ttts
Train the next generation of TTS systems.
coqui-ai/TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
hugofloresgarcia/vampnet
music generation with masked transformers!
DLLXW/baby-llama2-chinese
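A repository for pretraining from scratch and then SFT-ing a small-parameter Chinese LLaMa2; a single 24 GB GPU is enough to produce a chat-llama2 with basic Chinese question-answering ability.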
pseeth/argbind
Simple package for binding functions to CLI or config files.
lucidrains/audiolm-pytorch
Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch
descriptinc/descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
innnky/ar-vits
text to speech using autoregressive transformer and VITS