xiangkanghuang's Stars
Lightning-AI/LitServe
Lightning-fast serving engine for AI models. Flexible. Easy. Enterprise-scale.
mush42/optispeech
A lightweight end-to-end text-to-speech model
yqzhishen/HarmonicNoiseSeparationGUI
A simple WebUI for harmonic-noise separation of vocals, using ONNXRuntime for inference.
RS2002/PianoBart
Official Repository for The Paper, PianoBART: Symbolic Piano Music Understanding and Generating with Large-Scale Pre-Training
mush42/istft-onnx
Export an ONNX graph that performs ISTFT. Designed for TTS models.
facebookresearch/MobileLLM
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.
gwh22/LAFMA
Text-to-Audio/AudioLCM
PyTorch Implementation of AudioLCM (ACM-MM'24): an efficient and high-quality text-to-audio generation with latent consistency model.
1Panel-dev/MaxKB
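🚀 A knowledge base question-answering system based on large language models and RAG. Works out of the box, is model-agnostic, supports flexible orchestration, and can be quickly embedded into third-party business systems.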
lucidrains/PEER-pytorch
PyTorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at DeepMind
Kwai-Kolors/Kolors
Kolors Team
KwaiVGI/LivePortrait
Bring portraits to life!
lucidrains/e2-tts-pytorch
Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in Pytorch
ictnlp/NAST-S2x
A fast speech-to-any translation model that supports simultaneous decoding and offers 28× speedup.
tencent-ailab/persona-hub
Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas"
ldzhangyx/instruct-MusicGen
The official implementation of our paper "Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning".
rany2/edge-tts
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
sanderwood/melodyt5
MelodyT5: A Unified Score-to-Score Transformer for Symbolic Music Processing [ISMIR 2024]
maxrmorrison/promonet
Prosody and Pronunciation Modification Network
magpie-align/magpie
Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality synthetic data generation pipeline!
lzhangbj/ASVA
[ECCV 2024 Oral] Audio-Synchronized Visual Animation
open-mmlab/FoleyCrafter
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝
FunAudioLLM/FunAudioLLM-APP
FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model
FunAudioLLM/CosyVoice
Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
ictnlp/ComSpeech
Code for ACL 2024 main conference paper "Can We Achieve High-quality Direct Speech-to-Speech Translation Without Parallel Speech Data?".
Labbeti/aac-datasets
Audio Captioning datasets for PyTorch.
kyegomez/AudioFlamingo
Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities"
modelscope/FunClip
Open-source, accurate and easy-to-use video speech recognition & clipping tool, with LLM-based AI clipping integrated.
dengcunqin/noise-reduction
noise reduction