zshy1205's Stars

NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Language: Python · 12.9k stars · 2.6k forks
UFund-Me/Qbot
[🔥updating ...] AI automated quantitative trading bot (fully local deployment). AI-powered Quantitative Investment Research Platform. 📃 Online docs: https://ufund-me.github.io/Qbot ✨ qbot-mini: https://github.com/Charmve/iQuant
Language: Jupyter Notebook · 8.9k stars · 1.2k forks
ShiqiYu/libfacedetection
An open-source library for face detection in images. Face detection speed can reach 1000 FPS.
Language: C++ · 12.4k stars · 3.1k forks
facebookresearch/sam2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Language: Jupyter Notebook · 13.6k stars · 1.3k forks
facebookresearch/segment-anything
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Language: Jupyter Notebook · 48.5k stars · 5.7k forks
SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2
Language: Python · 13.5k stars · 1.1k forks
modelscope/ms-swift
Use PEFT or full-parameter training to fine-tune 400+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek3, ...) and 150+ MLLMs (Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, ...).
Language: Python · 5.1k stars · 442 forks
snakers4/silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Language: Python · 4.8k stars · 461 forks
lancedb/lance-deeplearning-recipes
Deep learning how-tos using the Lance file format
Language: Python · 15 stars · 5 forks
MoonInTheRiver/DiffSinger
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
Language: Python · 4.4k stars · 724 forks
ina-foss/inaSpeechSegmenter
CNN-based audio segmentation toolkit. Detects speech, music, noise, and speaker gender. Designed for large-scale gender equality studies based on speech time per gender.
Language: Python · 774 stars · 133 forks
jrgillick/laughter-detection
Language: Python · 244 stars · 50 forks
lucidrains/vector-quantize-pytorch
Vector (and Scalar) Quantization, in PyTorch
Language: Python · 2.8k stars · 231 forks
TMElyralab/MuseTalk
MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
Language: Python · 3.3k stars · 418 forks
wenet-e2e/wespeaker
Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
Language: Python · 799 stars · 126 forks
huggingface/dataspeech
Language: Python · 325 stars · 52 forks
speechbrain/speechbrain
A PyTorch-based Speech Toolkit
Language: Python · 9.2k stars · 1.4k forks
clovaai/voxceleb_trainer
In defence of metric learning for speaker recognition
Language: Python · 1.1k stars · 276 forks
fishaudio/fish-diffusion
An easy to understand TTS / SVS / SVC framework
Language: Python · 678 stars · 86 forks
neonbjb/tts-scores
Scripts for computing the intelligibility and CLVP scores for evaluating TTS models
Language: Python · 147 stars · 15 forks
152334H/tortoise-tts-fast
Fast TorToiSe inference (5x or your money back!)
Language: Jupyter Notebook · 799 stars · 179 forks
adelacvg/ttts
Train the next generation of TTS systems.
Language: Python · 162 stars · 17 forks
coqui-ai/TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Language: Python · 36.9k stars · 4.6k forks
hugofloresgarcia/vampnet
music generation with masked transformers!
Language: Python · 316 stars · 37 forks
DLLXW/baby-llama2-chinese
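A repository for pretraining a small-parameter Chinese LLaMa2 from scratch and then applying SFT; a single 24 GB GPU is enough to produce a chat-llama2 with basic Chinese question-answering ability.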
Language: Python · 2.6k stars · 322 forks
pseeth/argbind
Simple package for binding functions to CLI or config files.
Language: Python · 43 stars · 4 forks
lucidrains/audiolm-pytorch
Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in PyTorch
Language: Python · 2.5k stars · 269 forks
descriptinc/descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
Language: Python · 1.3k stars · 120 forks
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
Language: Python · 15.1k stars · 1.4k forks
innnky/ar-vits
Text-to-speech using an autoregressive transformer and VITS
Language: Python · 234 stars · 17 forks