zzhanghub's Stars
binary-husky/gpt_academic
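A practical interactive interface for LLMs such as GPT and GLM, specially optimized for reading, polishing, and writing papers. Modular design with support for custom shortcut buttons and function plugins; project analysis and self-explanation for Python, C++, and other codebases; PDF/LaTeX paper translation and summarization; parallel queries to multiple LLMs; and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, moss, and others.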
clovaai/donut
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
Doubiiu/ToonCrafter
[SIGGRAPH Asia 2024, Journal Track] ToonCrafter: Generative Cartoon Interpolation
Tencent/HunyuanDiT
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
NVlabs/VILA
VILA - a multi-image visual language model with training, inference, and evaluation recipes, deployable from cloud to edge (Jetson Orin and laptops)
open-compass/VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
kakaobrain/rq-vae-transformer
The official implementation of Autoregressive Image Generation using Residual Quantization (CVPR '22)
SunzeY/AlphaCLIP
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
TencentARC/Open-MAGVIT2
Open-MAGVIT2: Democratizing Autoregressive Visual Generation
bytedance/1d-tokenizer
This repo contains the code for 1D tokenizer and generator
ssundaram21/dreamsim
DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data (NeurIPS 2023 Spotlight) / When Does Perceptual Alignment Benefit Vision Representations? (NeurIPS 2024)
AILab-CVC/SEED-X
Multimodal Models in Real World
MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
CasualGANPapers/Make-A-Scene
PyTorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
design-edit/DesignEdit
DesignEdit: Unify Spatial-Aware Image Editing via Training-free Inpainting with a Multi-Layered Latent Diffusion Framework
buptlihang/CDLA
CDLA: A Chinese document layout analysis (CDLA) dataset
NJU-PCALab/OpenVid-1M
JosephKJ/Awesome-Layout-Generators
An awesome list of layout generation papers
CrossmodalGroup/DynamicVectorQuantization
Official PyTorch implementation of the CVPR 2023 paper "Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization"
MMStar-Benchmark/MMStar
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
YifanXu74/Libra
Simple PyTorch implementation of "Libra: Building Decoupled Vision System on Large Language Models" (accepted by ICML 2024)
OpenGVLab/Hulk
An official implementation of "Hulk: A Universal Knowledge Translator for Human-Centric Tasks"
graphic-design-ai/graphist
Official Repo of Graphist
OpenGVLab/LCL
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
XiaoduoAILab/XmodelVLM
OPPO-Mente-Lab/FaceScore
Official repo for "FaceScore: Benchmarking and Enhancing Face Quality in Human Generation"
OPPO-Mente-Lab/GlyphDraw2
GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models
Chriskuei/awesome-generative-retrieval-models
A curated list of awesome papers related to generative retrieval models.
DCDmllm/MorphTokens
graphic-design-ai/awesome-graphic-design
Awesome List for Graphic Design