RollingWang's Stars
feizc/FluxMusic
Text-to-Music Generation with Rectified Flow Transformers
InternLM/xtuner
An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
mlfoundations/open_clip
An open source implementation of CLIP.
QwenLM/Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.
sunanhe/MKT
Official implementation of "Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer".
feizc/Diffusion-RWKV
Scaling RWKV-Like Architectures for Diffusion Models
yformer/EfficientSAM
EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything
Q-Future/Q-Align
③[ICML2024] [IQA, IAA, VQA] All-in-one Foundation Model for visual scoring. Can be efficiently fine-tuned on downstream datasets.
zwx8981/LIQE
[CVPR2023] Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective
cloneofsimo/lora
Using Low-rank adaptation to quickly fine-tune diffusion models.
microsoft/LoRA
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
lllyasviel/ControlNet
Let us control diffusion models!
CompVis/latent-diffusion
High-Resolution Image Synthesis with Latent Diffusion Models
tgxs002/HPSv2
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
stanford-crfm/helm
Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in Holistic Evaluation of Text-to-Image Models (HEIM) (https://arxiv.org/abs/2311.04287).
yuvalkirstain/PickScore
tgxs002/align_sd
Better Aligning Text-to-Image Models with Human Preference. ICCV 2023
THUDM/ImageReward
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
xinyu1205/recognize-anything
Open-source and strong foundation image recognition models.
AUTOMATIC1111/stable-diffusion-webui
Stable Diffusion web UI
alibaba/MNN
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
UX-Decoder/Segment-Everything-Everywhere-All-At-Once
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
Eurus-Holmes/Awesome-Multimodal-Research
A curated list of Multimodal Related Research.
willard-yuan/awesome-cbir-papers
📝Awesome and classical image retrieval papers
amusi/awesome-ai-awesomeness
A curated list of awesome awesomeness about artificial intelligence
amusi/CV-Company-List
A list of companies offering computer vision (CV) algorithm positions; feel free to submit issues to add more.
IDEA-Research/Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
woshidandan/Image-Aesthetics-and-Quality-Assessment
[ACMMM 2023, Official Code] for paper "EAT: An Enhancer for Aesthetics-Oriented Transformers". Official Weights and Demos provided. Currently one of the strongest open-source aesthetics assessment models.
DarrenPan/Awesome-CVPR2024-Low-Level-Vision
A Collection of Papers and Codes in CVPR2023/2022 about low level vision