xiangkanghuang's Stars
openvpi/audio-slicer
Python script that slices audio with silence detection
BytedanceSpeech/seed-tts-eval
fixie-ai/ultravox
A fast multimodal LLM for real-time voice
facebookresearch/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
fpaissan/tinyCLAP
Implementation of tinyCLAP.
SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2
facebookresearch/ears_dataset
Expressive Anechoic Recordings of Speech (EARS)
xincanfeng/vitsGPT
NVIDIA/audio-flamingo
PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.
Hypotheses-Paradise/Hypo2Trans
Single-blind supplementary materials for NeurIPS 2023 submission
ex3ndr/supervoice-vall-e-2
VALL-E 2 reproduction
winddori2002/DEX-TTS
DEX-TTS: Diffusion-based EXpressive TTS with Style Modeling on Time Variability
adelacvg/detail_tts
All generative models in one for a better TTS model
ICDM-UESTC/DOSE
DOSE: Diffusion Dropout with Adaptive Prior for Speech Enhancement, Conference on Neural Information Processing Systems (NeurIPS), 2023
Nettech15/Daisy-Seed-7-Voice-VA-Synthesizer
Daisy Seed 7-Voice Synthesizer with USB-MIDI interface and Ladder LPF.
giulbia/baby_cry_detection
Recognition of baby cry audio signals
TUT-ARG/sed_vis
Visualization toolbox for Sound Event Detection
cwang621/blsp-emo
BLSP-Emo: Towards Empathetic Large Speech-Language Models
ZuodaoTech/everyone-can-use-english
Everyone can use English
Rongjiehuang/TranSpeech
PyTorch Implementation of TranSpeech (ICLR'23): Textless NAR Speech-to-Speech Translation with Bilateral Perturbation
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
modelscope/DiffSynth-Studio
Enjoy the magic of Diffusion models!
Yuan-ManX/ai-audio-datasets
AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications.
CanCLID/ToJyutping
Cantonese Pronunciation Automatic Labeling Tool
MusicLang/maidi
Work with symbolic music gen AI easily, based on midi manipulation.
groupmm/libsoni
libsoni: A Python Toolbox for Sonifying Music Annotations and Feature Representations
aik2mlj/polyffusion
Polyffusion: A Diffusion Model for Polyphonic Score Generation with Internal and External Controls
jishengpeng/ControlSpeech
ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec
theodorblackbird/lina-speech
lina-speech: linear-attention-based text-to-speech
lucidrains/multimodal-dit-pytorch
Implementation of a multimodal diffusion transformer in PyTorch