banxiyan's Stars
facebookresearch/dinov2
PyTorch code and models for the DINOv2 self-supervised learning method.
OFA-Sys/Chinese-CLIP
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
open-mmlab/mmpretrain
OpenMMLab Pre-training Toolbox and Benchmark
X-PLUG/mPLUG-Owl
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
google-research/robotics_transformer
facebookresearch/home-robot
Mobile manipulation research tools for roboticists
OpenDriveLab/DriveLM
[ECCV 2024 Oral] DriveLM: Driving with Graph Visual Question Answering
google-deepmind/open_x_embodiment
Yangyi-Chen/Multimodal-AND-Large-Language-Models
A paper list on multimodal and large language models, used only to record papers I read in the daily arXiv listings for personal reference.
luogen1996/LaVIN
[NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models"
HaoMood/bilinear-cnn
PyTorch implementation of bilinear CNN for fine-grained image recognition
ShirAmir/dino-vit-features
Official implementation for the paper "Deep ViT Features as Dense Visual Descriptors".
OrdnanceSurvey/GeoDataViz-Toolkit
The GeoDataViz Toolkit is a set of resources that will help you communicate your data effectively through the design of compelling visuals. In this repository we are sharing resources, assets and other useful links.
Zoeyyao27/CoT-Igniting-Agent
This repository contains the paper list for "Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents"
jun0wanan/awesome-large-multimodal-agents
ZrrSkywalker/Point-M2AE
[NeurIPS 2022] Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training
kyegomez/Vit-RGTS
Open source implementation of "Vision Transformers Need Registers"
HKUST-LongGroup/Awesome-Open-Vocabulary-Detection-and-Segmentation
Awesome OVD-OVS - A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future
jxbbb/TOD3Cap
[ECCV 2024] TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes
r-three/phatgoose
Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization"
jnhwkim/cbp
Multimodal Compact Bilinear Pooling for Torch7
UCDvision/NOLA
Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis"
GaoShuang98/DINO-Mix
DINO-Mix: Enhancing Visual Place Recognition with Foundational Vision Model and Feature Mixing
Shubodh/lidar-image-pretrain-VPR
LiDAR Image Pretraining for Visual Place Recognition
ghm0819/ERPoT
Effective and Reliable Pose Tracking for Mobile Robots Based on Lightweight and Compact Polygon Maps
IemProg/MiMi
🔥 🔥 [WACV2024] Mini but Mighty: Finetuning ViTs with Mini Adapters
savadikarc/gift
GIFT: Generative Interpretable Fine-Tuning
sijieaaa/DistilVPR
[AAAI 2024] DistilVPR: Cross-Modal Knowledge Distillation for Visual Place Recognition
mingzeG/Moment-Probing
A much more powerful probing method for tuning your model, with promising performance at the training cost of linear probing!
mingzeG/DropCov
Implementation of DropCov as described in "DropCov: A Simple yet Effective Method for Improving Deep Architectures"