YuehaoYin's Stars
devendrachaplot/Object-Goal-Navigation
PyTorch code for the NeurIPS 2020 paper "Object Goal Navigation using Goal-Oriented Semantic Exploration"
facebookresearch/home-robot
Mobile manipulation research tools for roboticists
rllab-snu/TopologicalSemanticGraphMemory
Topological Semantic Graph Memory for Image Goal Navigation (CoRL 2022 oral)
jdf-prog/LLM-Engines
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
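vLLM's offline batch API is compact enough to sketch in a few lines. A minimal example, assuming a small Hugging Face model id as a placeholder; the model name and sampling settings are illustrative, not prescriptive:

```python
from vllm import LLM, SamplingParams

# Placeholder model id; any Hugging Face causal LM that vLLM supports works here.
llm = LLM(model="facebook/opt-125m")
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# generate() batches the prompts and returns one RequestOutput per prompt.
outputs = llm.generate(["What is object-goal navigation?"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```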
BAAI-Agents/Cradle
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle empowers agents to ace any computer task through strong reasoning, self-improvement, and skill curation, all within a standardized, general environment with minimal requirements.
EvolvingLMMs-Lab/lmms-eval
Accelerating the development of large multimodal models (LMMs) with lmms-eval
facebookresearch/SlowFast
PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
apple/ml-slowfast-llava
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
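For reference, SGLang's frontend DSL composes prompts as decorated Python functions. A minimal sketch, assuming an SGLang server is already running locally on port 30000; the question text is a placeholder:

```python
import sglang as sgl

@sgl.function
def qa(s, question):
    # Append a user turn, then let the model fill in the assistant turn.
    s += sgl.user(question)
    s += sgl.assistant(sgl.gen("answer", max_tokens=64))

# Point the frontend at a running SGLang server (assumed at localhost:30000).
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

state = qa.run(question="What is vision-and-language navigation?")
print(state["answer"])
```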
LLaVA-VL/LLaVA-NeXT
google-research-datasets/RxR
Room-across-Room (RxR) is a large-scale, multilingual dataset for Vision-and-Language Navigation (VLN) in Matterport3D environments. It contains 126k navigation instructions in English, Hindi, and Telugu, and 126k navigation following demonstrations. Both annotation types include dense spatiotemporal alignments between the text and the annotators' visual perceptions.
StanfordVL/GibsonEnv
Gibson Environments: Real-World Perception for Embodied Agents
dvlab-research/LLaMA-VID
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
jzhzhang/3DAwareNav
[CVPR 2023] We propose a framework for the challenging task of 3D-aware ObjectNav, built on two straightforward sub-policies. The two sub-policies, a corner-guided exploration policy and a category-aware identification policy, operate simultaneously, using online-fused 3D points as observations.
jacobkrantz/Sim2Sim-VLNCE
Official implementation of the ECCV 2022 Oral paper: Sim-2-Sim Transfer for Vision-and-Language Navigation in Continuous Environments
jacobkrantz/IVLN-CE
Official Implementation of IVLN-CE: Iterative Vision-and-Language Navigation in Continuous Environments
MarSaKi/ETPNav
[TPAMI 2024] Official repo of "ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments"
YicongHong/Discrete-Continuous-VLN
Code and Data of the CVPR 2022 paper: Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation
intelligolabs/R2RIE-CE
Official repository of "Mind the Error! Detection and Localization of Instruction Errors in Vision-and-Language Navigation". We present R2R-IE-CE, the first dataset for benchmarking instruction errors in VLN, and propose a detection method, IEDL.
eric-ai-lab/awesome-vision-language-navigation
A curated list for vision-and-language navigation. ACL 2022 paper "Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions"
YicongHong/Recurrent-VLN-BERT
Code of the CVPR 2021 Oral paper: A Recurrent Vision-and-Language BERT for Navigation
jacobkrantz/VLN-CE
Vision-and-Language Navigation in Continuous Environments using Habitat
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o's performance.
openvla/openvla
OpenVLA: An open-source vision-language-action model for robotic manipulation.
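OpenVLA checkpoints load through the Hugging Face transformers AutoClasses. A sketch following the usage pattern in the repo's README; the image source and instruction are placeholders, and `unnorm_key` selects which dataset's action statistics to de-normalize with:

```python
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

# trust_remote_code pulls in OpenVLA's custom model class from the Hub.
processor = AutoProcessor.from_pretrained("openvla/openvla-7b", trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b", torch_dtype=torch.bfloat16, trust_remote_code=True
).to("cuda")

image = Image.open("frame.png")  # placeholder camera frame
prompt = "In: What action should the robot take to pick up the mug?\nOut:"

inputs = processor(prompt, image).to("cuda", dtype=torch.bfloat16)
# Returns a 7-DoF action de-normalized with the chosen dataset's statistics.
action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
```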
cshizhe/VLN-DUET
Official implementation of Think Global, Act Local: Dual-Scale Graph Transformer for Vision-and-Language Navigation (CVPR'22 Oral).
peteanderson80/Matterport3DSimulator
AI Research Platform for Reinforcement Learning from Real Panoramic Images.
zd11024/NaviLLM
[CVPR 2024] The code for paper 'Towards Learning a Generalist Model for Embodied Navigation'
allenai/spoc-robot-training
SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World
facebookresearch/sam2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links to the trained model checkpoints, and example notebooks showing how to use the model.
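The image-prediction path in the SAM 2 repo follows the pattern below. A sketch assuming a downloaded checkpoint and its matching config; the image path and point prompt are placeholders:

```python
import numpy as np
import torch
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Assumes the large checkpoint and its config were downloaded per the README.
checkpoint = "./checkpoints/sam2_hiera_large.pt"
model_cfg = "sam2_hiera_l.yaml"
predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

image = np.array(Image.open("scene.jpg").convert("RGB"))  # placeholder image

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(image)
    # A single foreground point prompt (label 1) at pixel (500, 375).
    masks, scores, logits = predictor.predict(
        point_coords=np.array([[500, 375]]),
        point_labels=np.array([1]),
    )
```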