sunxh16's Stars

ictnlp/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Language:Python80952
xingchensong/S3Tokenizer
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
Language:Python557
dukGuo/valle-audiodec
Inference code for Audiodec-Valle-Wenetspeech4TTS
Language:Python432
hhguo/SoCodec
13
X-LANCE/PaperReading
A curated collection of classic papers across research areas
9
cantabile-kwok/vec2wav2.0
Code for vec2wav 2.0, a speech token vocoder for VC. Paper: https://arxiv.org/abs/2409.01995
16
thuhcsi/SpeechCraft
The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.
22
zhenye234/xcodec
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
Language:Python743
Aria-K-Alethia/BigCodec
Language:Python362
OpenT2S/LlamaVoice
LlamaVoice is a Llama-based large voice generation model that provides both inference and training capabilities.
Language:Python16810
AbrahamSanders/codec-bpe
Implementation of Acoustic BPE (Shen et al., 2024), extended for RVQ-based Neural Audio Codecs
Language:Python333
bytedance/SALMONN
SALMONN: Speech Audio Language Music Open Neural Network
Language:Python98577
xai-org/grok-1
Grok open release
Language: Python · 49.4k stars · 8.3k forks
gpt-omni/mini-omni
An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
Language: Python · 2.4k stars · 240 forks
jishengpeng/WavTokenizer
State-of-the-art discrete acoustic codec models that use 40 tokens per second of audio, designed for audio language modeling
Language:Python65134
lucidrains/transfusion-pytorch
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI
Language:Python51617
X-LANCE/SLAM-LLM
Speech, Language, Audio, Music Processing with Large Language Model
Language:Python49339
LlamaFamily/Llama-Chinese
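Llama Chinese Community: an online Llama 3 demo and fine-tuned models are now available, the latest Llama 3 learning resources are aggregated in real time, and all code has been updated for Llama 3. The goal is to build the best Chinese Llama model, fully open source and licensed for commercial use.
Language: Python · 13.6k stars · 1.2k forks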
meta-llama/llama
Inference code for Llama models
Language: Python · 55.5k stars · 9.5k forks
huggingface/speech-to-speech
Speech To Speech: an effort toward an open-source, modular GPT-4o
Language: Python · 3k stars · 314 forks
OpenMOSS/AnyGPT
Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"
Language:Python74257
YoMio-Tech-Inc/GPT-SoVITS2
GPT-SoVITS2
Language:Python16610
black-forest-labs/flux
Official inference repo for FLUX.1 models
Language: Python · 13.6k stars · 961 forks
DigitalPhonetics/IMS-Toucan
Multilingual and Controllable Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart.
Language: Python · 1.4k stars · 154 forks
rasbt/LLMs-from-scratch
Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step
Language: Jupyter Notebook · 26.6k stars · 2.9k forks
meta-llama/llama-models
Utilities intended for use with Llama models.
Language: Python · 3.8k stars · 681 forks
sarulab-speech/UTMOSv2
UTokyo-SaruLab MOS Prediction System
Language:Python485
lucidrains/autoregressive-diffusion-pytorch
Implementation of Autoregressive Diffusion in PyTorch
Language:Python2463
FoundationVision/LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
Language: Python · 1.2k stars · 46 forks
AudioLLMs/AudioBench
AudioBench: A Universal Benchmark for Audio Large Language Models
Language:Python611