jishengpeng's Stars
hiyouga/LLaMA-Factory
Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
kyutai-labs/moshi
median-research-group/LibMTL
A PyTorch Library for Multi-Task Learning
YuanGongND/ltu
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
ictnlp/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
showlab/Show-o
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
gpt-omni/mini-omni
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
jishengpeng/WavTokenizer
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
huggingface/speech-to-speech
Speech To Speech: an effort for an open-sourced and modular GPT-4o
meta-llama/llama3
The official Meta Llama 3 GitHub site
Alpha-VLLM/Lumina-mGPT
Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"
MorenoLaQuatra/ARCH
ARCH: Audio Representations benCHmark
CrossmodalGroup/DynamicVectorQuantization
Official PyTorch Implementation of Our CVPR 2023 Paper: "Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization"
black-forest-labs/flux
Official inference repo for FLUX.1 models
chenpk00/IS2024_stream_decoder_only_asr
OpenT2S/LlamaVoice
LlamaVoice is a llama-based large voice generation model, providing inference and training ability.
LTH14/mar
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
mct10/RepCodec
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
libsndfile/libsndfile
A C library for reading and writing sound files containing sampled audio data.
FunAudioLLM/CosyVoice
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model
FunAudioLLM/FunAudioLLM-APP
Orange-OpenSource/Cool-Chic
Low-complexity neural image & video codec.
tarepan/SpeechMOS
Easy-to-Use Speech MOS predictors
bytedance/1d-tokenizer
This repo contains the code for our paper "An Image is Worth 32 Tokens for Reconstruction and Generation"
rese1f/Awesome-VQVAE
A collection of resources and papers on Vector Quantized Variational Autoencoder (VQ-VAE) and its application
FoundationVision/OmniTokenizer
OmniTokenizer: one model and one weight for image-video joint tokenization.
TencentARC/Open-MAGVIT2
Open-MAGVIT2: Democratizing Autoregressive Visual Generation
aigc-apps/EasyAnimate
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
fayuge/CLAQ
Code for paper CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs