Zth9730's Stars
2noise/ChatTTS
A generative speech model for daily dialogue.
meta-llama/llama3
The official Meta Llama 3 GitHub site
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
kyutai-labs/moshi
InternLM/xtuner
An efficient, flexible and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model
qjfoidnh/BaiduPCS-Go
Built on the original iikira/BaiduPCS-Go, adding support for saving share links and rapid-upload links to your own drive
google-research/big_vision
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
microsoft/Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
fixie-ai/ultravox
A fast multimodal LLM for real-time voice
VITA-MLLM/VITA
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
bigscience-workshop/Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
yyyujintang/Awesome-Mamba-Papers
Awesome Papers related to Mamba.
multimodal-art-projection/MAP-NEO
X-LANCE/SLAM-LLM
Speech, Language, Audio, Music Processing with Large Language Model
yangdongchao/AcademiCodec
AcademiCodec: An Open Source Audio Codec Model for Academic Research
liguodongiot/llm-resource
A curated collection of high-quality full-stack LLM resources
facebookresearch/speech-resynthesis
An official reimplementation of the method described in the INTERSPEECH 2021 paper - Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.
FacePerceiver/FaRL
FaRL for Facial Representation Learning [Official, CVPR 2022]
showlab/videollm-online
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
voidful/Codec-SUPERB
Audio Codec Speech processing Universal PERformance Benchmark
xingchensong/S3Tokenizer
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
mct10/RepCodec
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
Takaaki-Saeki/DiscreteSpeechMetrics
Reference-aware automatic speech evaluation toolkit
ylacombe/finetune-hf-vits
Finetune VITS and MMS using HuggingFace's tools
young-geng/scalax
A simple library for scaling up JAX programs
yangdongchao/RSTnet
Real-time Speech-Text Foundation Model Toolkit (wip)
asappresearch/wav2seq
Official code for Wav2Seq
XiaoMi/dasheng
Official PyTorch code for Deep Audio-Signal Holistic Embeddings
my-yy/vfal_papers
Voice Face Association Learning Paper List