zzhanghub

zzhanghub's Stars

open-compass/VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
Language: Python · 1.5k stars · 219 forks
MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Language:Python37230
graphic-design-ai/graphist
Official Repo of Graphist
1032
clovaai/donut
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
Language: Python · 5.9k stars · 479 forks
FreedomIntelligence/ALLaVA
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
Language:Python2509
h-zhao1997/cobra
Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference
Language:Python2658
X2FD/LVIS-INSTRUCT4V
132
X-PLUG/mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Language: Python · 2k stars · 115 forks
kakaobrain/coyo-dataset
COYO-700M: Large-scale Image-Text Pair Dataset
Language: Python · 1.2k stars · 38 forks
dvlab-research/MGM
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
Language: Python · 3.2k stars · 281 forks
apple/ml-mgie
Language: Python · 3.9k stars · 253 forks
BAAI-DCAI/Bunny
A family of lightweight multimodal models.
Language:Python96472
BAAI-DCAI/DataOptim
A collection of visual instruction tuning datasets.
Language:Python763
xyproto/png2svg
🔀 Convert small PNG images to SVG Tiny 1.2
Language:Go32539
ximinng/PyTorch-SVGRender
SVG Differentiable Rendering: generating vector graphics using neural networks. Supports text-to-SVG, image-to-SVG, and SVG editing.
Language:Python1349
tsujuifu/pytorch_mgie
A Gradio demo of MGIE
Language:Python34530
Kiteretsu77/APISR
APISR: Anime Production Inspired Real-World Anime Super-Resolution (CVPR 2024)
Language:Python91661
sail-sg/MDT
Masked Diffusion Transformer (MDT), reported as state of the art for image synthesis (ICCV 2023).
Language:Python53440
deepseek-ai/DeepSeek-VL
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Language: Python · 2.2k stars · 202 forks
TencentQQGYLab/ELLA
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Language: Python · 1.1k stars · 58 forks
OpenGVLab/MM-Interleaved
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
Language:Python20611
TencentARC/PhotoMaker
PhotoMaker [CVPR 2024]
Language: Jupyter Notebook · 9.7k stars · 770 forks
lllyasviel/sd-forge-layerdiffuse
[WIP] Layer Diffusion for WebUI (via Forge)
Language: Python · 3.9k stars · 335 forks
lllyasviel/LayerDiffuse
Transparent Image Layer Diffusion using Latent Transparency
2k stars · 28 forks
XPixelGroup/HAT
Activating More Pixels in Image Super-Resolution Transformer (CVPR 2023), extended on arXiv as HAT: Hybrid Attention Transformer for Image Restoration
Language: Python · 1.3k stars · 158 forks
yangcaoai/CoDA_NeurIPS2023
Official code for the NeurIPS 2023 paper "CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection"
Language:Jupyter Notebook18616
uchidalab/book-dataset
This dataset contains 207,572 books from the Amazon.com, Inc. marketplace.
Language:Python24453
xdxie/WordArt
Official code of CornerTransformer (ECCV 2022, Oral), built on top of MMOCR.
Language:Python13815
llava-rlhf/LLaVA-RLHF
Aligning LMMs with Factually Augmented RLHF
Language:Python33224
vlf-silkie/VLFeedback
Language:Python912