xiangliu886's Stars
state-spaces/mamba
Mamba SSM architecture
PKU-YuanGroup/Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
facebookresearch/segment-anything-2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
lucidrains/denoising-diffusion-pytorch
Implementation of Denoising Diffusion Probabilistic Model in PyTorch
timothybrooks/instruct-pix2pix
v2fly/fhs-install-v2ray
Bash script for installing V2Ray in operating systems such as Debian / CentOS / Fedora / openSUSE that support systemd
facebookresearch/DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
OpenRunner/clash-freenode
Subscription links🚀 Free sharing♻️ Regular updates✨ Bypass internet restrictions🌈 Do not abuse🚫 One-click subscription📪 SSR/CLASH/V2RAY
FoundationVision/VAR
[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
Alpha-VLLM/Lumina-T2X
Lumina-T2X is a unified framework for Text to Any Modality Generation
neulab/prompt2model
prompt2model - Generate Deployable Models from Natural Language Instructions
facebookresearch/chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
Picsart-AI-Research/StreamingT2V
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
FoundationVision/LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
hymie122/RAG-Survey
Collecting awesome papers of RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".
TencentQQGYLab/ELLA
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
chuanyangjin/fast-DiT
Fast Diffusion Models with Transformers
GAIR-NLP/anole
Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation
jy0205/LaVIT
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
Alpha-VLLM/Lumina-mGPT
Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"
microsoft/DCVC
Deep Contextual Video Compression
UnicomAI/Unichat-llama3-Chinese
ShihaoZhaoZSH/LaVi-Bridge
[ECCV 2024] Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
Jiawei-Yang/Denoising-ViT
This is the official code release for our work, Denoising Vision Transformers.
lixirui142/VidToMe
Official PyTorch Implementation for "VidToMe: Video Token Merging for Zero-Shot Video Editing" (CVPR 2024)
zjysteven/lmms-finetune
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, qwen-vl, phi3-v etc.
mutonix/Vript
litwellchi/M2Chat
TiankaiHang/blog
For self learning