LuoXinjiee's Stars
CrystalSixone/DSRG
Code for "A Dual Semantic-Aware Recurrent Global-Adaptive Network for Vision-and-Language Navigation"
YicongHong/Recurrent-VLN-BERT
Code for the CVPR 2021 Oral paper "A Recurrent Vision-and-Language BERT for Navigation"
CircleRadon/TokenPacker
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".
agnJason/PianoMotion10M
Code release for PianoMotion10M
cshizhe/VLN-DUET
Official implementation of "Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation" (CVPR'22 Oral).
songw-zju/HASSC
The official implementation of "Not All Voxels Are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation" (CVPR 2024)
peteanderson80/Matterport3DSimulator
AI Research Platform for Reinforcement Learning from Real Panoramic Images.
sinahmr/NACLIP
PyTorch Implementation of NACLIP in "Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation"
ctjacobs/sudoku-genetic-algorithm
Solves a Sudoku puzzle using a genetic algorithm.
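Not the repository's code, but a minimal hedged sketch of the same idea: each candidate keeps every row a permutation of 1-9, fitness counts column and box conflicts, and mutation swaps two free cells within a row. The puzzle grid, population size, and loop bounds below are illustrative assumptions.

```python
# Hedged sketch of a genetic-algorithm Sudoku solver (not ctjacobs' implementation).
import random

GIVENS = [[0] * 9 for _ in range(9)]  # hypothetical puzzle: 0 marks an empty cell

def random_candidate():
    # Fill each row's empty cells with a shuffle of its missing digits,
    # so every row is always a permutation of 1..9.
    rows = []
    for r in range(9):
        missing = [v for v in range(1, 10) if v not in GIVENS[r]]
        random.shuffle(missing)
        it = iter(missing)
        rows.append([GIVENS[r][c] or next(it) for c in range(9)])
    return rows

def conflicts(board):
    # Fitness: duplicate values in columns and 3x3 boxes (rows are conflict-free).
    bad = 0
    for i in range(9):
        bad += 9 - len({board[r][i] for r in range(9)})
    for br in range(0, 9, 3):
        for bc in range(0, 9, 3):
            box = [board[br + r][bc + c] for r in range(3) for c in range(3)]
            bad += 9 - len(set(box))
    return bad

def mutate(board):
    # Swap two non-given cells within one row, preserving the row permutation.
    child = [row[:] for row in board]
    r = random.randrange(9)
    free = [c for c in range(9) if GIVENS[r][c] == 0]
    if len(free) >= 2:
        a, b = random.sample(free, 2)
        child[r][a], child[r][b] = child[r][b], child[r][a]
    return child

population = [random_candidate() for _ in range(200)]
for gen in range(5000):
    population.sort(key=conflicts)
    if conflicts(population[0]) == 0:
        break  # solved
    survivors = population[:100]  # keep the best half, refill with mutated survivors
    population = survivors + [mutate(random.choice(survivors)) for _ in range(100)]
```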
LiWentomng/gradio-osprey-demo
Gradio demo used in our paper "Osprey: Pixel Understanding with Visual Instruction Tuning".
xiaolul2/MGMap
[CVPR 2024] The code for "MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction"
mhamilton723/FeatUp
Official code for "FeatUp: A Model-Agnostic Framework for Features at Any Resolution" (ICLR 2024)
bytedance/fc-clip
[NeurIPS 2023] This repo contains the code for our paper "Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP"
facebookresearch/ConvNeXt
Code release for the ConvNeXt model
mlfoundations/open_clip
An open source implementation of CLIP.
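A minimal zero-shot classification sketch following the usage pattern published in open_clip's README; the model/pretrained pair ('ViT-B-32' / 'laion2b_s34b_b79k') and the image path are illustrative assumptions, not recommendations from this list.

```python
# Hedged open_clip usage sketch; model tag and image path are illustrative.
import torch
from PIL import Image
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-B-32', pretrained='laion2b_s34b_b79k')
model.eval()
tokenizer = open_clip.get_tokenizer('ViT-B-32')

image = preprocess(Image.open('example.jpg')).unsqueeze(0)  # hypothetical image
text = tokenizer(['a diagram', 'a dog', 'a cat'])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # L2-normalize, then take softmax over scaled cosine similarities
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)  # per-text match probabilities for the image
```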
linyq2117/TagCLIP
Qinying-Liu/Awesome-Open-Vocabulary-Semantic-Segmentation
A curated publication list on open-vocabulary semantic segmentation and related areas (e.g., zero-shot semantic segmentation).
open-mmlab/mmsegmentation
OpenMMLab Semantic Segmentation Toolbox and Benchmark.
wkentaro/labelme
Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation).
wangf3014/SCLIP
Official implementation of SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
xmed-lab/CLIP_Surgery
CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks
wysoczanska/clip_dinoiser
Official implementation of the paper 'CLIP-DINOiser: Teaching CLIP a few DINO tricks'.
CircleRadon/Osprey
[CVPR 2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
facebookresearch/dinov2
PyTorch code and models for the DINOv2 self-supervised learning method.
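A hedged sketch of loading a DINOv2 backbone through the repo's published torch.hub entrypoints; the variant name ('dinov2_vits14') and input size are illustrative choices.

```python
# Hedged DINOv2 feature-extraction sketch via torch.hub.
import torch

model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
model.eval()

# ViT-S/14 uses 14x14 patches, so input sides should be multiples of 14
# (224 = 16 * 14); the dummy tensor stands in for a preprocessed image.
dummy = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    features = model(dummy)  # global image embedding (384-dim for ViT-S/14)

print(features.shape)
```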
chongzhou96/MaskCLIP
Official PyTorch implementation of "Extract Free Dense Labels from CLIP" (ECCV 2022 Oral)
PVIT-official/PVIT
Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models
inuwamobarak/KOSMOS-2
KOSMOS-2 is designed to handle text and images simultaneously, redefining how we perceive and interact with multimodal data. It is built on a Transformer-based causal language model architecture, similar to other well-known models such as LLaMA-2 and Mistral 7B.
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
lllyasviel/ControlNet
Let us control diffusion models!
shikras/shikra