zzhanghub's Stars
open-compass/VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
graphic-design-ai/graphist
Official Repo of Graphist
clovaai/donut
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
FreedomIntelligence/ALLaVA
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
h-zhao1997/cobra
Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference
X2FD/LVIS-INSTRUCT4V
X-PLUG/mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
kakaobrain/coyo-dataset
COYO-700M: Large-scale Image-Text Pair Dataset
dvlab-research/MGM
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
apple/ml-mgie
BAAI-DCAI/Bunny
A family of lightweight multimodal models.
BAAI-DCAI/DataOptim
A collection of visual instruction tuning datasets.
xyproto/png2svg
🔀 Convert small PNG images to SVG Tiny 1.2
ximinng/PyTorch-SVGRender
SVG Differentiable Rendering: Generating vector graphics using neural networks. Supports text-to-SVG, image-to-SVG, and SVG editing.
tsujuifu/pytorch_mgie
A Gradio demo of MGIE
Kiteretsu77/APISR
APISR: Anime Production Inspired Real-World Anime Super-Resolution (CVPR 2024)
sail-sg/MDT
Masked Diffusion Transformer is the SOTA for image synthesis (ICCV 2023).
deepseek-ai/DeepSeek-VL
DeepSeek-VL: Towards Real-World Vision-Language Understanding
TencentQQGYLab/ELLA
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
OpenGVLab/MM-Interleaved
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
TencentARC/PhotoMaker
PhotoMaker [CVPR 2024]
lllyasviel/sd-forge-layerdiffuse
[WIP] Layer Diffusion for WebUI (via Forge)
lllyasviel/LayerDiffuse
Transparent Image Layer Diffusion using Latent Transparency
XPixelGroup/HAT
Activating More Pixels in Image Super-Resolution Transformer (CVPR 2023); arXiv follow-up: HAT: Hybrid Attention Transformer for Image Restoration
yangcaoai/CoDA_NeurIPS2023
Official code for the NeurIPS 2023 paper "CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection"
uchidalab/book-dataset
This dataset contains 207,572 books from the Amazon.com, Inc. marketplace.
xdxie/WordArt
The official code of CornerTransformer (ECCV 2022, Oral), built on top of MMOCR.
llava-rlhf/LLaVA-RLHF
Aligning LMMs with Factually Augmented RLHF
vlf-silkie/VLFeedback