xiangkanghuang's Stars
Text-to-Audio/AudioLCM
PyTorch implementation of AudioLCM (ACM-MM'24): efficient and high-quality text-to-audio generation with a latent consistency model.
1Panel-dev/MaxKB
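🚀 MaxKB is an open-source knowledge base question-answering system built on large language models and RAG, widely used in intelligent customer service, internal enterprise knowledge bases, academic research, and education.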
lucidrains/PEER-pytorch
PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind
Kwai-Kolors/Kolors
Kolors Team
KwaiVGI/LivePortrait
Bring portraits to life!
lucidrains/e2-tts-pytorch
Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in PyTorch
ictnlp/NAST-S2x
A fast speech-to-speech & speech-to-text translation model that supports simultaneous decoding and offers 28× speedup.
tencent-ailab/persona-hub
Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas"
ldzhangyx/instruct-MusicGen
The official implementation of our paper "Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning".
rany2/edge-tts
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
sanderwood/melodyt5
MelodyT5: A Unified Score-to-Score Transformer for Symbolic Music Processing [ISMIR 2024]
maxrmorrison/promonet
Prosody and Pronunciation Modification Network
magpie-align/magpie
Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality synthetic data generation pipeline!
lzhangbj/ASVA
[ECCV 2024 Oral] Audio-Synchronized Visual Animation
open-mmlab/FoleyCrafter
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley artist that adds vivid, synchronized sound effects to your silent videos 😝
FunAudioLLM/FunAudioLLM-APP
FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model
FunAudioLLM/CosyVoice
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
ictnlp/ComSpeech
Code for ACL 2024 main conference paper "Can We Achieve High-quality Direct Speech-to-Speech Translation Without Parallel Speech Data?".
Labbeti/aac-datasets
Audio Captioning datasets for PyTorch.
kyegomez/AudioFlamingo
Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities"
modelscope/FunClip
Open-source, accurate, and easy-to-use video speech recognition and clipping tool with integrated LLM-based AI clipping.
dengcunqin/noise-reduction
noise reduction
openvpi/audio-slicer
Python script that slices audio with silence detection
BytedanceSpeech/seed-tts-eval
fixie-ai/ultravox
A fast multimodal LLM for real-time voice
facebookresearch/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
fpaissan/tinyCLAP
Implementation of tinyCLAP.
SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2
facebookresearch/ears_dataset
Expressive Anechoic Recordings of Speech (EARS)