xiangliu886

xiangliu886's Stars

state-spaces/mamba
Mamba SSM architecture
Language:Python12.7k 101 5091.1k
PKU-YuanGroup/Open-Sora-Plan
This project aim to reproduce Sora (Open AI T2V model), we wish the open source community contribute to this project.
Language:Python11.3k 160 3001k
facebookresearch/segment-anything-2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Language:Jupyter Notebook11.1k 64 256934
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
Language:Jupyter Notebook9.7k 97 653949
lucidrains/denoising-diffusion-pytorch
Implementation of Denoising Diffusion Probabilistic Model in Pytorch
Language:Python8k 33 2921k
timothybrooks/instruct-pix2pix
Language:Python6.3k 70 121530
v2fly/fhs-install-v2ray
Bash script for installing V2Ray in operating systems such as Debian / CentOS / Fedora / openSUSE that support systemd
Language:Shell6.1k 98 1881.4k
facebookresearch/DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Language:Python6k 45 80535
OpenRunner/clash-freenode
订阅地址🚀 免费共享♻️ 定期更新✨ 科学上网🌈 请勿滥用🚫一键订阅📪SSR/CLASH/V2RAY
Language:Python4.5k 95 0392
FoundationVision/VAR
[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
Language:Python4k 116 81301
Alpha-VLLM/Lumina-T2X
Lumina-T2X is a unified framework for Text to Any Modality Generation
Language:Python2k 31 8485
neulab/prompt2model
prompt2model - Generate Deployable Models from Natural Language Instructions
Language:Python1.9k 25 168171
facebookresearch/chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
Language:Python1.8k 26 46108
Picsart-AI-Research/StreamingT2V
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
Language:Python1.4k 42 56141
FoundationVision/LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
Language:Python1.2k 21 5448
hymie122/RAG-Survey
Collecting awesome papers of RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".
1.2k 24 383
TencentQQGYLab/ELLA
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Language:Python1.1k 42 4354
chuanyangjin/fast-DiT
Fast Diffusion Models with Transformers
Language:Python676 7 1491
GAIR-NLP/anole
Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation
Language:Python650 8 3936
jy0205/LaVIT
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
Language:Jupyter Notebook500 15 3526
Alpha-VLLM/Lumina-mGPT
Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"
Language:Python468 6 2419
microsoft/DCVC
Deep Contextual Video Compression
Language:Python383 12 4660
UnicomAI/Unichat-llama3-Chinese
Language:Python341 9 1134
ShihaoZhaoZSH/LaVi-Bridge
[ECCV 2024] Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
Language:Python307 16 1620
Jiawei-Yang/Denoising-ViT
This is the official code release for our work, Denoising Vision Transformers.
Language:Python282 17 108
lixirui142/VidToMe
Official Pytorch Implementation for "VidToMe: Video Token Merging for Zero-Shot Video Editing" (CVPR 2024)
Language:Python162 9 49
zjysteven/lmms-finetune
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, qwen-vl, phi3-v etc.
Language:Python133 4 2811
mutonix/Vript
Language:Python113 1 83
litwellchi/M2Chat
Language:Python30 4 31
TiankaiHang/blog
For self learning
3 1 70