ZhuangPeiyu's Stars
ucaslcl/Fox
Official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"
quqxui/Awesome-LLM4IE-Papers
Awesome papers about generative Information Extraction (IE) using Large Language Models (LLMs)
VITA-MLLM/VITA
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
rshaojimmy/MultiModal-DeepFake
[TPAMI 2024 & CVPR 2023] PyTorch code for DGM4: Detecting and Grounding Multi-Modal Media Manipulation and beyond
scu-zjz/IMDLBenCo
A comprehensive benchmark & codebase for Image manipulation detection/localization.
qcf-568/MIML
[CVPR2024] Towards Modern Image Manipulation Localization: A Large-Scale Dataset and Novel Methods
OpenGVLab/MM-NIAH
This is the official implementation of the paper "Needle In A Multimodal Haystack"
greatzh/Papers
Image Forgery Detection and Localization (and related) Papers List
Ekko-zn/AIGCDetectBenchmark
baaivision/Emu
Emu Series: Generative Multimodal Models from BAAI
facebookresearch/DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Coobiw/MPP-LLaVA
Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train your own 8B/14B LLaVA-training-like MLLM on RTX3090/4090 24GB.
LianjiaTech/BELLE
BELLE: Be Everyone's Large Language model Engine (open-source Chinese conversational large language model)
X-PLUG/mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
OSU-NLP-Group/TableLlama
[NAACL'24] Dataset, code and models for "TableLlama: Towards Open Large Generalist Models for Tables".
Ucas-HaoranWei/Vary
[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
IDEA-Research/GroundingDINO
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
csuhan/OneLLM
[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language
Alpha-VLLM/LLaMA2-Accessory
An Open-source Toolkit for LLM Development
IDEA-CCNL/Fengshenbang-LM
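Fengshenbang-LM (封神榜大模型) is an open-source large-model ecosystem led by the Cognitive Computing and Natural Language Research Center of the IDEA Research Institute, serving as infrastructure for Chinese AIGC and cognitive intelligence.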
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
YuchenLiu98/COMM
PyTorch code for the paper "From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models"
modelscope/modelscope-agent
ModelScope-Agent: An agent framework connecting models in ModelScope with the world
clpeng/Awesome-Face-Forgery-Generation-and-Detection
A curated list of articles and codes related to face forgery generation and detection.
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨Latest Advances on Multimodal Large Language Models
GenImage-Dataset/GenImage
Stability-AI/stablediffusion
High-Resolution Image Synthesis with Latent Diffusion Models
ZhendongWang6/DIRE
[ICCV 2023] Official implementation of the paper: "DIRE for Diffusion-Generated Image Detection"
lllyasviel/ControlNet
Let us control diffusion models!