xiangkanghuang's Stars

openvpi/audio-slicer
Python script that slices audio with silence detection
Language:Python754265
BytedanceSpeech/seed-tts-eval
Language:Python94497
fixie-ai/ultravox
A fast multimodal LLM for real-time voice
Language:Python85846
facebookresearch/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
Language: Python, 8.4k stars, 593 forks
fpaissan/tinyCLAP
Implementation of tinyCLAP.
Language:Python211
SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2
Language: Python, 11.5k stars, 953 forks
facebookresearch/ears_dataset
Expressive Anechoic Recordings of Speech (EARS)
Language:Python1257
xincanfeng/vitsGPT
Language:Python395
NVIDIA/audio-flamingo
PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.
Language:Python1709
Hypotheses-Paradise/Hypo2Trans
Single-blind supplementary materials for NeurIPS 2023 submission
Language:Python564
ex3ndr/supervoice-vall-e-2
VALL-E 2 reproduction
Language:Jupyter Notebook7211
winddori2002/DEX-TTS
DEX-TTS: Diffusion-based EXpressive TTS with Style Modeling on Time Variability
Language:Python796
adelacvg/detail_tts
All generative model in one for better TTS model
Language:Python648
ICDM-UESTC/DOSE
DOSE: Diffusion Dropout with Adaptive Prior for Speech Enhancement, Conference on Neural Information Processing Systems (NeurIPS), 2023
Language:Python39
Nettech15/Daisy-Seed-7-Voice-VA-Synthesizer
Daisy Seed 7-Voice Synthesizer with USB-MIDI interface and Ladder LPF.
Language:C++436
giulbia/baby_cry_detection
Recognition of baby cry audio signal
Language:Python250116
TUT-ARG/sed_vis
Visualization toolbox for Sound Event Detection
Language:Python11329
cwang621/blsp-emo
BLSP-Emo: Towards Empathetic Large Speech-Language Models
Language:Python332
ZuodaoTech/everyone-can-use-english
Everyone can use English
Language: TypeScript, 24.4k stars, 3.7k forks
Rongjiehuang/TranSpeech
PyTorch Implementation of TranSpeech (ICLR'23): Textless NAR Speech-to-Speech Translation with Bilateral Perturbation
Language:Python16923
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Language: Python, 34.8k stars, 4.1k forks
modelscope/DiffSynth-Studio
Enjoy the magic of Diffusion models!
Language: Python, 6.4k stars, 572 forks
Yuan-ManX/ai-audio-datasets
AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications.
46333
CanCLID/ToJyutping
Cantonese Pronunciation Automatic Labeling Tool
Language:Python576
MusicLang/maidi
Work with symbolic music gen AI easily, based on midi manipulation.
Language:Python18
groupmm/libsoni
libsoni: A Python Toolbox for Sonifying Music Annotations and Feature Representations
Language:Jupyter Notebook173
aik2mlj/polyffusion
Polyffusion: A Diffusion Model for Polyphonic Score Generation with Internal and External Controls
Language:Python728
jishengpeng/ControlSpeech
ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec
Language:Python1816
theodorblackbird/lina-speech
lina-speech : linear attention based text-to-speech
Language:Jupyter Notebook1159
lucidrains/multimodal-dit-pytorch
Implementation of a multimodal diffusion transformer in Pytorch
92