sunxh16's Stars
ictnlp/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
xingchensong/S3Tokenizer
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
dukGuo/valle-audiodec
Inference code for Audiodec-Valle-Wenetspeech4TTS
hhguo/SoCodec
X-LANCE/PaperReading
A curated collection of classic papers across research directions
cantabile-kwok/vec2wav2.0
Code for vec2wav 2.0, a speech token vocoder for VC. Paper: https://arxiv.org/abs/2409.01995
thuhcsi/SpeechCraft
The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.
zhenye234/xcodec
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
Aria-K-Alethia/BigCodec
OpenT2S/LlamaVoice
LlamaVoice is a llama-based large voice generation model, providing inference and training ability.
AbrahamSanders/codec-bpe
Implementation of Acoustic BPE (Shen et al., 2024), extended for RVQ-based Neural Audio Codecs
bytedance/SALMONN
SALMONN: Speech Audio Language Music Open Neural Network
xai-org/grok-1
Grok open release
gpt-omni/mini-omni
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
jishengpeng/WavTokenizer
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
lucidrains/transfusion-pytorch
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI
X-LANCE/SLAM-LLM
Speech, Language, Audio, Music Processing with Large Language Model
LlamaFamily/Llama-Chinese
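Llama Chinese community. The Llama3 online demo and fine-tuned models are now available, the latest Llama3 learning resources are collected in real time, and all code has been updated for Llama3. Building the best Chinese Llama large model, fully open source and available for commercial use.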
meta-llama/llama
Inference code for Llama models
huggingface/speech-to-speech
Speech To Speech: an effort for an open-sourced and modular GPT4-o
OpenMOSS/AnyGPT
Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"
YoMio-Tech-Inc/GPT-SoVITS2
GPT-SoVITS2
black-forest-labs/flux
Official inference repo for FLUX.1 models
DigitalPhonetics/IMS-Toucan
Multilingual and Controllable Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart.
rasbt/LLMs-from-scratch
Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step
meta-llama/llama-models
Utilities intended for use with Llama models.
sarulab-speech/UTMOSv2
UTokyo-SaruLab MOS Prediction System
lucidrains/autoregressive-diffusion-pytorch
Implementation of Autoregressive Diffusion in PyTorch
FoundationVision/LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
AudioLLMs/AudioBench
AudioBench: A Universal Benchmark for Audio Large Language Models