YuehaoYin's Stars
devendrachaplot/Object-Goal-Navigation
PyTorch code for the NeurIPS 2020 paper "Object Goal Navigation using Goal-Oriented Semantic Exploration"
facebookresearch/home-robot
Mobile manipulation research tools for roboticists
rllab-snu/TopologicalSemanticGraphMemory
Topological Semantic Graph Memory for Image Goal Navigation (CoRL 2022 oral)
jdf-prog/LLM-Engines
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
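vLLM's offline batch API is compact enough to sketch in a few lines. A minimal example, assuming a small Hugging Face model id as a placeholder; the model name and sampling settings are illustrative, not prescriptive:

```python
from vllm import LLM, SamplingParams

# Placeholder model id; any Hugging Face causal LM that vLLM supports works here.
llm = LLM(model="facebook/opt-125m")
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# generate() batches the prompts and returns one RequestOutput per prompt.
outputs = llm.generate(["What is object-goal navigation?"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```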
BAAI-Agents/Cradle
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle empowers agents to ace any computer task through strong reasoning, self-improvement, and skill curation, all within a standardized, general environment with minimal requirements.
EvolvingLMMs-Lab/lmms-eval
Accelerating the development of large multimodal models (LMMs) with lmms-eval
facebookresearch/SlowFast
PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
apple/ml-slowfast-llava
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
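For reference, SGLang's frontend DSL composes prompts as decorated Python functions. A minimal sketch, assuming an SGLang server is already running locally on port 30000; the question text is a placeholder:

```python
import sglang as sgl

@sgl.function
def qa(s, question):
    # Append a user turn, then let the model fill in the assistant turn.
    s += sgl.user(question)
    s += sgl.assistant(sgl.gen("answer", max_tokens=64))

# Point the frontend at a running SGLang server (assumed at localhost:30000).
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

state = qa.run(question="What is vision-and-language navigation?")
print(state["answer"])
```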
LLaVA-VL/LLaVA-NeXT
google-research-datasets/RxR
Room-across-Room (RxR) is a large-scale, multilingual dataset for Vision-and-Language Navigation (VLN) in Matterport3D environments. It contains 126k navigation instructions in English, Hindi, and Telugu, and 126k navigation following demonstrations. Both annotation types include dense spatiotemporal alignments between the text and the annotators' visual perceptions.
StanfordVL/GibsonEnv
Gibson Environments: Real-World Perception for Embodied Agents
dvlab-research/LLaMA-VID
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
jzhzhang/3DAwareNav
[CVPR 2023] We propose a framework for the challenging task of 3D-aware ObjectNav, built on two straightforward sub-policies. The two sub-policies, a corner-guided exploration policy and a category-aware identification policy, operate simultaneously, using online-fused 3D points as observations.
jacobkrantz/Sim2Sim-VLNCE
Official implementation of the ECCV 2022 Oral paper: Sim-2-Sim Transfer for Vision-and-Language Navigation in Continuous Environments
jacobkrantz/IVLN-CE
Official Implementation of IVLN-CE: Iterative Vision-and-Language Navigation in Continuous Environments
MarSaKi/ETPNav
[TPAMI 2024] Official repo of "ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments"
YicongHong/Discrete-Continuous-VLN
Code and Data of the CVPR 2022 paper: Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation
intelligolabs/R2RIE-CE
Official repository of "Mind the Error! Detection and Localization of Instruction Errors in Vision-and-Language Navigation". We present R2R-IE-CE, the first dataset for benchmarking instruction errors in VLN, and propose a detection method, IEDL.
eric-ai-lab/awesome-vision-language-navigation
A curated list for vision-and-language navigation. ACL 2022 paper "Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions"
YicongHong/Recurrent-VLN-BERT
Code of the CVPR 2021 Oral paper: A Recurrent Vision-and-Language BERT for Navigation
jacobkrantz/VLN-CE
Vision-and-Language Navigation in Continuous Environments using Habitat
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o's performance.
openvla/openvla
OpenVLA: An open-source vision-language-action model for robotic manipulation.
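OpenVLA checkpoints load through the Hugging Face transformers AutoClasses. A sketch following the usage pattern in the repo's README; the image source and instruction are placeholders, and `unnorm_key` selects which dataset's action statistics to de-normalize with:

```python
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

# trust_remote_code pulls in OpenVLA's custom model class from the Hub.
processor = AutoProcessor.from_pretrained("openvla/openvla-7b", trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b", torch_dtype=torch.bfloat16, trust_remote_code=True
).to("cuda")

image = Image.open("frame.png")  # placeholder camera frame
prompt = "In: What action should the robot take to pick up the mug?\nOut:"

inputs = processor(prompt, image).to("cuda", dtype=torch.bfloat16)
# Returns a 7-DoF action de-normalized with the chosen dataset's statistics.
action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
```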
cshizhe/VLN-DUET
Official implementation of Think Global, Act Local: Dual-Scale Graph Transformer for Vision-and-Language Navigation (CVPR'22 Oral).
peteanderson80/Matterport3DSimulator
AI Research Platform for Reinforcement Learning from Real Panoramic Images.
zd11024/NaviLLM
[CVPR 2024] The code for paper 'Towards Learning a Generalist Model for Embodied Navigation'
allenai/spoc-robot-training
SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World
facebookresearch/sam2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links to the trained model checkpoints, and example notebooks showing how to use the model.
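The image-prediction path in the SAM 2 repo follows the pattern below. A sketch assuming a downloaded checkpoint and its matching config; the image path and point prompt are placeholders:

```python
import numpy as np
import torch
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Assumes the large checkpoint and its config were downloaded per the README.
checkpoint = "./checkpoints/sam2_hiera_large.pt"
model_cfg = "sam2_hiera_l.yaml"
predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

image = np.array(Image.open("scene.jpg").convert("RGB"))  # placeholder image

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(image)
    # A single foreground point prompt (label 1) at pixel (500, 375).
    masks, scores, logits = predictor.predict(
        point_coords=np.array([[500, 375]]),
        point_labels=np.array([1]),
    )
```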