zhiyuanyou's Stars
Stability-AI/stablediffusion
High-Resolution Image Synthesis with Latent Diffusion Models
opendatalab/MinerU
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON.
ShiArthur03/ShiArthur03
THUDM/CogVideo
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
opendatalab/PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction
richzhang/PerceptualSimilarity
LPIPS metric. pip install lpips
THUDM/CogVLM2
GPT4V-level open-source multi-modal model based on Llama3-8B
dvlab-research/ControlNeXt
Controllable video and image generation, SVD, Animate Anyone, ControlNet, ControlNeXt, LoRA
lucidrains/transfusion-pytorch
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI
AILab-CVC/SEED
Official implementation of SEED-LLaMA (ICLR 2024).
xmed-lab/CLIP_Surgery
CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks
Q-Future/Q-Align
③[ICML2024] [IQA, IAA, VQA] All-in-one Foundation Model for visual scoring. Can be efficiently fine-tuned on downstream datasets.
yuweihao/MM-Vet
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)
zhangzjn/ADer
ADer (https://arxiv.org/abs/2406.03262) is an open source visual anomaly detection toolbox based on PyTorch, which supports multiple popular AD datasets and approaches.
zwx8981/LIQE
[CVPR2023] Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective
OpenImagingLab/HDRFlow
[CVPR 2024] Real-Time HDR Video Reconstruction
google-research-datasets/richhf-18k
RichHF-18K dataset contains rich human feedback labels we collected for our CVPR'24 paper: https://arxiv.org/pdf/2312.10240, along with the file name of the associated labeled images (no urls or images are included in this dataset).
kamwoh/partcraft
[ECCV2024] PartCraft: Crafting Creative Objects by Parts
HaomingCai/PIPAL-dataset
[ Official ] - PIPAL Dataset and Training Codebase. ECCV-2020, NTIRE-21/22.
lcysyzxdxc/AGIQA-3k-Database
[IEEE TCSVT2023] A Fine-grained Subjective Perception & Alignment Database for AI Generated Image Quality Assessment
OpenImagingLab/emm
[ECCV2024] Event-Based Motion Magnification
lingli1996/GeoReasoner
[ICML 2024] GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model
SUPER-TADORY/1p-frac
OpenImagingLab/DualDn
[ECCV2024] DualDn: Dual-domain Denoising via Differentiable ISP. SOTA model for real-image denoising, in terms of both denoising performance and generalization ability.
Kaiwen-Zhu/AgenticIR
An Intelligent Agentic System for Complex Image Restoration Problems
OpenImagingLab/LenslessFace
LenslessFace: An End-to-End Optimized Lensless System for Privacy-Preserving Face Verification
huTao1030/DiffHDR-pytorch
This is the official PyTorch implementation for DiffHDR: Towards High-quality HDR Deghosting with Conditional Diffusion Models (TCSVT'2023)
OpenImagingLab/PhoCoLens
PhoCoLens: Photorealistic and Consistent Reconstruction in Lensless Imaging
PhoCoLens/PhoCoLens.github.io
zhiyuanyou/DepictQA
DepictQA: Depicted Image Quality Assessment with Vision Language Models