youngsheen's Stars
youngsheen/GPST
[ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer
souzatharsis/podcastfy
An Open Source Python alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI
facebookresearch/lingua
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
youngsheen/SimVQ
SimVQ: Addressing Representation Collapse in Vector Quantized Models with One Linear Layer
facebookresearch/spiritlm
Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".
DAMO-NLP-SG/DiGIT
[NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective
kyutai-labs/moshi
gpt-omni/mini-omni
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
homebrewltd/ichigo
Local realtime voice AI
triton-lang/triton
Development repository for the Triton language and compiler
pytorch/torchtitan
A native PyTorch Library for large model training
facebookresearch/chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
Haoqiu-Yan/PerceptiveAgent
Code for "Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction" (ACL24)
TencentARC/Open-MAGVIT2
Open-MAGVIT2: Democratizing Autoregressive Visual Generation
TinyLLaVA/TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
minyoungg/platonic-rep
openai/tiktoken
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
pytorch/torchtune
PyTorch native finetuning library
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
facebookresearch/jepa
PyTorch code and models for V-JEPA self-supervised learning from video.
dropreg/efficient_alpaca
This repository aims to use LLaMA to reproduce and enhance Stanford Alpaca.
google-deepmind/alphageometry
haoliuhl/language-quantized-autoencoders
Language Quantized AutoEncoders
ytongbai/LVM
ml-explore/mlx
MLX: An array framework for Apple silicon
atong01/conditional-flow-matching
TorchCFM: a Conditional Flow Matching library
The-Run-Philosophy-Organization/run
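The official designated GitHub of global "Run Philosophy" (润学): compiles its aims, program, theory, and assorted real-world examples of "running" (emigrating); addresses the three questions of why to run, where to run, and how to run; and aims to become the core religion and core belief of the new ** people.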
kakaobrain/rq-vae-transformer
The official implementation of Autoregressive Image Generation using Residual Quantization (CVPR '22)
tickstep/aliyunpan
Command-line client for Aliyun Drive (阿里云盘), with support for JavaScript plugins and sync/backup.
facebookresearch/encodec
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.