xiangkanghuang

xiangkanghuang's Stars

Text-to-Audio/AudioLCM
PyTorch implementation of AudioLCM (ACM-MM'24): an efficient and high-quality text-to-audio generation model based on latent consistency models.
Language: Python · Stars: 1.1k · Forks: 179
1Panel-dev/MaxKB
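🚀 MaxKB is an open-source knowledge-base question-answering system built on large language models and RAG, widely used in scenarios such as intelligent customer service, internal enterprise knowledge bases, and academic research and education.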
Language: Python · Stars: 11.4k · Forks: 1.5k
lucidrains/PEER-pytorch
PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind
Language:Python1113
Kwai-Kolors/Kolors
Kolors Team
Language: Python · Stars: 3.8k · Forks: 264
KwaiVGI/LivePortrait
Bring portraits to life!
Language: Python · Stars: 12.9k · Forks: 1.4k
lucidrains/e2-tts-pytorch
Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in PyTorch
Language: Python · Stars: 340 · Forks: 33
ictnlp/NAST-S2x
A fast speech-to-speech & speech-to-text translation model that supports simultaneous decoding and offers 28× speedup.
Language:Python604
tencent-ailab/persona-hub
Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas"
Language: Python · Stars: 875 · Forks: 61
ldzhangyx/instruct-MusicGen
The official implementation of our paper "Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning".
Language:Python713
rany2/edge-tts
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
Language: Python · Stars: 6.2k · Forks: 617
sanderwood/melodyt5
MelodyT5: A Unified Score-to-Score Transformer for Symbolic Music Processing [ISMIR 2024]
Language:Python371
maxrmorrison/promonet
Prosody and Pronunciation Modification Network
Language:Python436
magpie-align/magpie
Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality synthetic data generation pipeline!
Language: Python · Stars: 480 · Forks: 53
lzhangbj/ASVA
[ECCV 2024 Oral] Audio-Synchronized Visual Animation
Language:Python34
open-mmlab/FoleyCrafter
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley artist that adds vivid, synchronized sound effects to your silent videos 😝
Language: Python · Stars: 460 · Forks: 39
FunAudioLLM/FunAudioLLM-APP
Language: Python · Stars: 286 · Forks: 51
FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model
Language: Python · Stars: 3.4k · Forks: 308
FunAudioLLM/CosyVoice
Multilingual large voice generation model, providing full-stack capabilities for inference, training, and deployment.
Language: Python · Stars: 6.2k · Forks: 665
ictnlp/ComSpeech
Code for ACL 2024 main conference paper "Can We Achieve High-quality Direct Speech-to-Speech Translation Without Parallel Speech Data?".
Language:Python236
Labbeti/aac-datasets
Audio Captioning datasets for PyTorch.
Language:Python1066
kyegomez/AudioFlamingo
Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities"
Language:Python391
modelscope/FunClip
Open-source, accurate, and easy-to-use video speech recognition and clipping tool, with integrated LLM-based AI clipping.
Language: Python · Stars: 3.6k · Forks: 395
dengcunqin/noise-reduction
noise reduction
Language:Python173
openvpi/audio-slicer
Python script that slices audio with silence detection
Language: Python · Stars: 774 · Forks: 270
BytedanceSpeech/seed-tts-eval
Language: Python · Stars: 1k · Forks: 104
fixie-ai/ultravox
A fast multimodal LLM for real-time voice
Language: Python · Stars: 980 · Forks: 58
facebookresearch/xformers
Hackable and optimized Transformer building blocks that support composable construction.
Language: Python · Stars: 8.6k · Forks: 614
fpaissan/tinyCLAP
Implementation of tinyCLAP.
Language:Python231
SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2
Language: Python · Stars: 12.4k · Forks: 1k
facebookresearch/ears_dataset
Expressive Anechoic Recordings of Speech (EARS)
Language:Python1307