jishengpeng

speech (text-to-speech, codec, speech language model)

Zhejiang University, Nantong

jishengpeng's Stars

hiyouga/LLaMA-Factory
Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
Language: Python, 31.9k stars, 3.9k forks
kyutai-labs/moshi
Language: Python, 6k stars, 445 forks
median-research-group/LibMTL
A PyTorch Library for Multi-Task Learning
Language: Python, 2k stars, 182 forks
YuanGongND/ltu
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
Language:Python36733
ictnlp/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Language: Python, 2.1k stars, 126 forks
showlab/Show-o
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
Language:Python88239
gpt-omni/mini-omni
open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming audio output conversational capabilities.
Language: Python, 2.7k stars, 253 forks
jishengpeng/WavTokenizer
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
Language:Python68539
huggingface/speech-to-speech
Speech To Speech: an effort for an open-sourced and modular GPT4-o
Language: Python, 3.2k stars, 335 forks
meta-llama/llama3
The official Meta Llama 3 GitHub site
Language: Python, 26.5k stars, 3k forks
Alpha-VLLM/Lumina-mGPT
Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"
Language:Python47319
MorenoLaQuatra/ARCH
ARCH: Audio Representations benCHmark
Language:Python262
CrossmodalGroup/DynamicVectorQuantization
Official Pytorch Implementation of Our CVPR2023 Paper: "Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization"
Language:Python1516
black-forest-labs/flux
Official inference repo for FLUX.1 models
Language: Python, 14.4k stars, 1k forks
chenpk00/IS2024_stream_decoder_only_asr
82
OpenT2S/LlamaVoice
LlamaVoice is a llama-based large voice generation model, providing inference and training ability.
Language:Python21411
LTH14/mar
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
Language:Python82641
mct10/RepCodec
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
Language:Python14710
libsndfile/libsndfile
A C library for reading and writing sound files containing sampled audio data.
Language: C, 1.4k stars, 381 forks
FunAudioLLM/CosyVoice
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Language: Python, 5.2k stars, 539 forks
FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model
Language: Python, 2.8k stars, 267 forks
FunAudioLLM/FunAudioLLM-APP
Language:Python26749
Orange-OpenSource/Cool-Chic
Low-complexity neural image & video codec.
Language:Python955
tarepan/SpeechMOS
Easy-to-Use Speech MOS predictors
Language:Python21516
bytedance/1d-tokenizer
This repo contains the code for our paper An Image is Worth 32 Tokens for Reconstruction and Generation
Language:Jupyter Notebook40716
rese1f/Awesome-VQVAE
A collection of resources and papers on Vector Quantized Variational Autoencoder (VQ-VAE) and its application
2057
FoundationVision/OmniTokenizer
OmniTokenizer: one model and one weight for image-video joint tokenization.
Language:Python2335
TencentARC/Open-MAGVIT2
Open-MAGVIT2: Democratizing Autoregressive Visual Generation
Language:Python63124
aigc-apps/EasyAnimate
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
Language: Python, 1.2k stars, 92 forks
fayuge/CLAQ
Code for paper CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs
Language: Python, 8 stars
