banxiyan

banxiyan's Stars

facebookresearch/dinov2
PyTorch code and models for the DINOv2 self-supervised learning method.
Language: Jupyter Notebook
9k 95 395790
OFA-Sys/Chinese-CLIP
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Language: Python
4.4k 35 330456
open-mmlab/mmpretrain
OpenMMLab Pre-training Toolbox and Benchmark
Language: Python
3.4k 30 7801.1k
X-PLUG/mPLUG-Owl
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
Language: Python
2.3k 30 229172
google-research/robotics_transformer
Language: Python
1.3k 25 25152
facebookresearch/home-robot
Mobile manipulation research tools for roboticists
Language: Python
895 30 161124
OpenDriveLab/DriveLM
[ECCV 2024 Oral] DriveLM: Driving with Graph Visual Question Answering
Language: HTML
823 19 8752
google-deepmind/open_x_embodiment
Language: Jupyter Notebook
809 18 7356
Yangyi-Chen/Multimodal-AND-Large-Language-Models
Paper list on multimodal and large language models, used only to record papers I read in the daily arXiv listings for personal needs.
521 20 432
luogen1996/LaVIN
[NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models"
Language: Python
502 6 4137
HaoMood/bilinear-cnn
PyTorch implementation of bilinear CNN for fine-grained image recognition
Language: Python
390 6 2186
ShirAmir/dino-vit-features
Official implementation for the paper "Deep ViT Features as Dense Visual Descriptors".
Language: Python
383 4 2144
OrdnanceSurvey/GeoDataViz-Toolkit
The GeoDataViz Toolkit is a set of resources that will help you communicate your data effectively through the design of compelling visuals. In this repository we are sharing resources, assets and other useful links.
377 35 660
Zoeyyao27/CoT-Igniting-Agent
This repository contains the paper list for the paper "Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents"
333 5 026
jun0wanan/awesome-large-multimodal-agents
312 5 219
ZrrSkywalker/Point-M2AE
[NeurIPS 2022] Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training
Language: Python
201 11 1723
kyegomez/Vit-RGTS
Open source implementation of "Vision Transformers Need Registers"
Language: Python
136 5 613
HKUST-LongGroup/Awesome-Open-Vocabulary-Detection-and-Segmentation
Awesome OVD-OVS - A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future
101 2 06
jxbbb/TOD3Cap
[ECCV 2024] TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes
Language: Python
100 5 75
r-three/phatgoose
Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization"
Language: Python
78 0 54
jnhwkim/cbp
Multimodal Compact Bilinear Pooling for Torch7
Language: Lua
68 9 923
UCDvision/NOLA
Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis"
Language: Python
47 3 12
GaoShuang98/DINO-Mix
DINO-Mix: Enhancing Visual Place Recognition with Foundational Vision Model and Feature Mixing
Language: Python
42 3 31
Shubodh/lidar-image-pretrain-VPR
LiDAR Image Pretraining for Visual Place Recognition
Language: Python
28 2 31
ghm0819/ERPoT
Effective and Reliable Pose Tracking for Mobile Robots Based on Lightweight and Compact Polygon Maps
Language: C++
27
IemProg/MiMi
🔥 🔥 [WACV2024] Mini but Mighty: Finetuning ViTs with Mini Adapters
Language: Python
17 4 00
savadikarc/gift
GIFT: Generative Interpretable Fine-Tuning
Language: Python
17 3 22
sijieaaa/DistilVPR
(AAAI 2024) DistilVPR: Cross-Modal Knowledge Distillation for Visual Place Recognition
Language: Python
17 1 01
mingzeG/Moment-Probing
A much more powerful probing method for tuning your model, with promising performance at the training cost of linear probing!
Language: Python
15 1 10
mingzeG/DropCov
Implementation of DropCov as described in "DropCov: A Simple yet Effective Method for Improving Deep Architectures"
Language: Python
10