Kizna1ver's Stars
xinntao/Real-ESRGAN
Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.
HqWu-HITCS/Awesome-Chinese-LLM
A curated collection of open-source Chinese large language models, focusing on smaller-scale models that can be privately deployed at low training cost, covering base models, vertical-domain fine-tuning and applications, datasets, tutorials, and more.
IDEA-Research/Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
BradyFU/Awesome-Multimodal-Large-Language-Models
:sparkles::sparkles: Latest Advances on Multimodal Large Language Models
stas00/ml-engineering
Machine Learning Engineering Open Book
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
chaiNNer-org/chaiNNer
A node-based image processing GUI aimed at making chaining image processing tasks easy and customizable. Born as an AI upscaling application, chaiNNer has grown into an extremely flexible and powerful programmatic image processing application.
rom1504/img2dataset
Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.
jingyi0000/VLM_survey
Collection of AWESOME vision-language models for vision tasks
ChaofWang/Awesome-Super-Resolution
Collect super-resolution related papers, data, repositories
EvolvingLMMs-Lab/lmms-eval
Accelerating the development of large multimodal models (LMMs) with the one-click evaluation module lmms-eval.
apple/ml-aim
This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.
eric-ai-lab/MiniGPT-5
Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"
DirtyHarryLYL/LLM-in-Vision
Recent LLM-based CV and related works. Welcome to comment/contribute!
CircleRadon/Osprey
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
showlab/Awesome-MLLM-Hallucination
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
wl-zhao/VPD
[ICCV 2023] VPD is a framework that leverages the high-level and low-level knowledge of a pre-trained text-to-image diffusion model for downstream visual perception tasks.
UCSC-VLAA/CLIPA
[NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training"
apple/ml-veclip
The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions"
LightDXY/FT-CLIP
CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet
hsouri/Battle-of-the-Backbones
XiaoxiaoGuo/fashion-iq
wendashi/Cool-GenAI-Fashion-Papers
🧢🕶️🥼👖👟🧳 A curated list of cool resources about GenAI-Fashion, including 📝papers, 👀workshops, 🚀companies & products, ...
OliverRensu/D-iGPT
[ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Learners"
xiaolul2/MGMap
[CVPR2024] The code for "MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction"
xuewyang/Fashion_Captioning
ECCV2020 paper: Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards. Code and Data.
LiWentomng/Point2Mask
The code for "Point2Mask: Point-supervised Panoptic Segmentation via Optimal Transport", ICCV2023
CircleRadon/APro
The code for "Label-efficient Segmentation via Affinity Propagation". [NeurIPS2023]
RotsteinNoam/FuseCap
FuseCap: Large Language Model for Visual Data Fusion in Enriched Caption Generation
zijinxuxu/PDFNet
RGB-D fusion for two-hand reconstruction