lcxrocks's Stars
microsoft/LLM2CLIP
LLM2CLIP makes the SOTA pretrained CLIP model even stronger.
MLNLP-World/Paper-Writing-Tips
A repository maintained by the MLNLP community to help authors avoid small mistakes in paper submissions. Paper Writing Tips
locuslab/llava-token-compression
x-cls/superclass
[NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training
Stanford-AIMI/RaVL
[NeurIPS 2024] RaVL: Discovering and Mitigating Spurious Correlations in Fine-Tuned Vision-Language Models
zejiangh/Semi-FSL
PyTorch implementation of the paper "Semi-Supervised Few-Shot Learning via Dependency Maximization and Instance Discriminant Analysis", available at https://link.springer.com/content/pdf/10.1007/s11265-022-01796-x.pdf
zhuhsingyuu/Frolic
Our implementation of Frolic. More details will be provided later.
thunlp/LLaVA-UHD
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
MCG-NJU/AWT
[NeurIPS 2024] AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation
zhmiao/OpenLongTailRecognition-OLTR
PyTorch implementation of "Large-Scale Long-Tailed Recognition in an Open World" (CVPR 2019 Oral)
Imbalance-VLM/Imbalance-VLM
hzwer/WritingAIPaper
Writing AI Conference Papers: A Handbook for Beginners
QwenLM/Qwen2-VL
Qwen2-VL is a multimodal large language model series developed by the Qwen team at Alibaba Cloud.
lixinustc/GraphAdapter
An efficient tuning method for VLMs
apachecn/ml-mastery-zh
:book: Chinese translations of MachineLearningMastery blog posts
Huage001/LinFusion
Official PyTorch and Diffusers Implementation of "LinFusion: 1 GPU, 1 Minute, 16K Image"
kongds/E5-V
E5-V: Universal Embeddings with Multimodal Large Language Models
Zeyi-Lin/HivisionIDPhotos
⚡️HivisionIDPhotos: a lightweight and efficient AI ID photo tool.
YueYANG1996/LaBo
CVPR 2023: Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification
iamxym/Deep-Fourier-based-Arbitrary-scale-Super-resolution-for-Real-time-Rendering
SIGGRAPH 2024 Conference Paper: Deep Fourier-based Arbitrary-scale Super-resolution for Real-time Rendering
bfshi/scaling_on_scales
When do we not need larger vision models?
zengwang430521/TCFormer
Code for TCFormer from the paper "Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer"
THUDM/CogVideo
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
LALBJ/PAI
[ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs
facebookresearch/sam2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
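As a quick orientation, below is a minimal image-segmentation sketch following the usage pattern shown in the SAM 2 README; the config and checkpoint filenames, the example image path, and the single point prompt are assumptions tied to one released model variant, so adjust them to the checkpoint you actually download.

```python
import numpy as np
import torch
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Assumed filenames for the "large" variant; match these to the
# checkpoint/config you downloaded from the repo.
predictor = SAM2ImagePredictor(
    build_sam2("sam2_hiera_l.yaml", "./checkpoints/sam2_hiera_large.pt")
)

image = np.array(Image.open("example.jpg").convert("RGB"))  # hypothetical image
with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(image)
    # One positive click at (x, y); label 1 marks foreground.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[512, 512]]),
        point_labels=np.array([1]),
    )

print(masks.shape, scores)  # masks: (num_masks, H, W)
```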
Vill-Lab/2024-AAAI-HPT
Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models (AAAI 2024)
zhuhsingyuu/SSP
Our implementation of SSP
zhengli97/Awesome-Prompt-Adapter-Learning-for-VLMs
A curated list of awesome prompt/adapter learning methods for vision-language models like CLIP.
bytedance/tarsier
Tarsier: a family of large-scale video-language models designed to generate high-quality video descriptions, with strong general video understanding capability.
fishaudio/fish-speech
A brand-new TTS solution