farlit's Stars
LiheYoung/Depth-Anything
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
isl-org/DPT
Dense Prediction Transformers
isl-org/lang-seg
Language-Driven Semantic Segmentation
lucidrains/magvit2-pytorch
Implementation of the MagViT2 tokenizer in PyTorch
clip-vil/CLIP-ViL
[ICLR 2022] Code for "How Much Can CLIP Benefit Vision-and-Language Tasks?" https://arxiv.org/abs/2107.06383
UMass-Foundation-Model/3D-VLA
[ICML 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model
zdou0830/METER
METER: A Multimodal End-to-end TransformER Framework
chrischoy/SpatioTemporalSegmentation
4D Spatio-Temporal Semantic Segmentation on a 3D video (a sequence of 3D scans)
feizc/DiS
Scalable Diffusion Models with State Space Backbone
airsplay/R2R-EnvDrop
PyTorch code for the NAACL 2019 paper "Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout"
amazon-science/alexa-arena
jialuli-luka/PanoGen
Code and data for the paper "PanoGen: Text-Conditioned Panoramic Environment Generation for Vision-and-Language Navigation"
linyq2117/TagCLIP
MrZihan/HNR-VLN
Official implementation of Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation (CVPR'24 Highlight).
whcpumpkin/Demand-driven-navigation
Find What You Want: Learning Demand-conditioned Object Attribute Space for Demand-driven Navigation
GuoPingPan/Habitat-Sim-Usage-Chinese
A Chinese-language tutorial on using Habitat-Sim
expectorlin/NavCoT
Code for the paper "NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning"
Gabesarch/HELPER
ydhongHIT/PlainSeg
Simple and efficient baselines for practical semantic segmentation with plain ViTs
joeyy5588/planning-as-inpainting
Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty
jasonseu/SALGL
Official codebase of "Scene-Aware Label Graph Learning for Multi-Label Image Classification" (ICCV 2023)
expectorlin/DR-Attacker
Code for the paper "Adversarial Reinforced Instruction Attacker for Robust Vision-Language Navigation" (TPAMI 2021)
expectorlin/ADAPT
Code for the paper "ADAPT: Vision-Language Navigation with Modality-Aligned Action Prompts" (CVPR 2022)
intelligolabs/R2RIE-CE
Official repository of "Mind the Error! Detection and Localization of Instruction Errors in Vision-and-Language Navigation". We present R2R-IE-CE, the first dataset for benchmarking instruction errors in VLN, and propose a method, IEDL.
weixi-feng/ULN
Code and data for EMNLP 2022 paper "ULN: Towards Underspecified Vision-and-Language Navigation"
RavenKiller/TAC
farlit/ACDS
Code for the ICASSP 2023 paper: Ancient Chinese Word Segmentation and Part-of-Speech Tagging Using Distant Supervision (Word Alignment)
PKU-EPIC/NaVid
HLR/NavHint
[EACL 2024] PyTorch code for NavHint: Vision and Language Navigation Agent with a Hint Generator
lingjunzhao/coop_instruction