LinMu7177's Stars
Meituan-AutoML/MobileVLM
Strong and Open Vision Language Assistant for Mobile Devices
DaiShiResearch/TransNeXt
[CVPR 2024] Code release for TransNeXt model
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal dialogue model approaching GPT-4V performance.
snap-research/Panda-70M
[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
modelscope/modelscope
ModelScope: bring the notion of Model-as-a-Service to life.
NUS-HPC-AI-Lab/Neural-Network-Parameter-Diffusion
We introduce a novel approach for parameter generation, named neural network parameter diffusion (p-diff), which employs a standard latent diffusion model to synthesize a new set of parameters
LargeWorldModel/LWM
google-research/syn-rep-learn
Learning from synthetic data - code and models
lxtGH/OMG-Seg
OMG-LLaVA and OMG-Seg codebase
UX-Decoder/FIND
facebookresearch/dinov2
PyTorch code and models for the DINOv2 self-supervised learning method.
jacobgil/pytorch-grad-cam
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
casper9429-kth/Siamese-Masked-Autoencoders---Learning-and-Exploration
Course project for DD2412 Advanced Deep Learning at KTH, by Casper, Magnus, and Friso. Focus: self-supervised learning and computer vision with SiamMAE, replicating core results and exploring potential research extensions.
SHI-Labs/VCoder
VCoder: Versatile Vision Encoders for Multimodal Large Language Models, arXiv 2023 / CVPR 2024
lzw-lzw/GroundingGPT
[ACL 2024] GroundingGPT: Language-Enhanced Multi-modal Grounding Model
PKU-YuanGroup/LanguageBind
[ICLR 2024] Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
CircleRadon/Osprey
[CVPR 2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
poloclub/cnn-explainer
Learning Convolutional Neural Networks with Interactive Visualization.
yumingj/Text2Human
Code for Text2Human (SIGGRAPH 2022). Paper: Text2Human: Text-Driven Controllable Human Image Generation
zalandoresearch/pytorch-vq-vae
PyTorch implementation of VQ-VAE by Aäron van den Oord et al.
FutureXiang/soda
Unofficial implementation of "SODA: Bottleneck Diffusion Models for Representation Learning"
InternLM/InternLM-XComposer
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
rosinality/vq-vae-2-pytorch
Implementation of Generating Diverse High-Fidelity Images with VQ-VAE-2 in PyTorch
eric-ai-lab/MiniGPT-5
Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"
X2FD/LVIS-INSTRUCT4V
openai/consistencydecoder
Consistency Distilled Diff VAE
CompVis/stable-diffusion
A latent text-to-image diffusion model
fudan-zvg/Semantic-Segment-Anything
Automated dense category annotation engine that serves as the initial semantic labeling for the Segment Anything dataset (SA-1B).
UX-Decoder/Segment-Everything-Everywhere-All-At-Once
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
microsoft/X-Decoder
[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language