Pinned Repositories
AIR-Bench
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension
allosaurus
Allosaurus is a pretrained universal phone recognizer for more than 2000 languages
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
arctic_shift
Making Reddit data accessible to researchers, moderators and everyone else. Interact with the data through large dumps, an API or web interface.
asr2k
asr2k
audio-dataset
Audio Dataset for training CLAP and other models
audio-diffusion-pytorch
Audio generation using diffusion models, in PyTorch.
audio-flamingo
PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.
audio-slicer
A simple GUI application that slices audio with silence detection
AudioGPT
GuangkeChen's Repositories
GuangkeChen/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
GuangkeChen/arctic_shift
Making Reddit data accessible to researchers, moderators and everyone else. Interact with the data through large dumps, an API or web interface.
GuangkeChen/audio-flamingo
PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.
GuangkeChen/bc-omni
Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊
GuangkeChen/ChatTTS
A generative speech model for daily dialogue.
GuangkeChen/CleanS2S
High-quality and streaming Speech-to-Speech interactive agent in a single file. A streaming, full-duplex voice interaction prototype agent implemented in just one file!
GuangkeChen/DDSP-SVC
Real-time end-to-end singing voice conversion system based on DDSP (Differentiable Digital Signal Processing)
GuangkeChen/edge-tts
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
GuangkeChen/EzAudio
High-quality Text-to-Audio Generation with Efficient Diffusion Transformer
GuangkeChen/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
GuangkeChen/Freeze-Omni
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
GuangkeChen/GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English spoken dialogue model
GuangkeChen/GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
GuangkeChen/GTSinger
Dataset and code of GTSinger (NeurIPS 2024 Spotlight): A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks
GuangkeChen/GuangkeChen.github.io
The resources of my website
GuangkeChen/hertz-dev
First base model for full-duplex conversational audio
GuangkeChen/ichigo
Llama3.1 learns to Listen
GuangkeChen/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
GuangkeChen/mini-omni2
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
GuangkeChen/MNP-SVC
Real-time end-to-end singing voice conversion
GuangkeChen/Montreal-Forced-Aligner
Command line utility for forced alignment using Kaldi
GuangkeChen/nanoGCG
A fast + lightweight implementation of the GCG algorithm in PyTorch
GuangkeChen/openvino_notebooks
📚 Jupyter notebook tutorials for OpenVINO™
GuangkeChen/Pangea
This is the repo for the paper "PANGEA: A FULLY OPEN MULTILINGUAL MULTIMODAL LLM FOR 39 LANGUAGES"
GuangkeChen/Parrot-TTS
Official Code for ParrotTTS
GuangkeChen/Parselmouth
Praat in Python, the Pythonic way
GuangkeChen/praat
Praat: Doing Phonetics By Computer
GuangkeChen/so-vits-svc-fork
so-vits-svc fork with realtime support, improved interface and more features.
GuangkeChen/spiritlm
Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".
GuangkeChen/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities