DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments

This work has been accepted by IEEE Robotics and Automation Letters (RA-L).

About

We propose a Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments (DOZE). It comprises ten high-fidelity 3D scenes with over 18k tasks and aims to mimic complex, dynamic real-world scenarios. Specifically, DOZE scenes feature multiple moving humanoid obstacles, a wide array of open-vocabulary objects, diverse distinct-attribute objects, and valuable textual hints. Unlike existing datasets, which only provide collision checking between the agent and static obstacles, DOZE also detects collisions between the agent and moving obstacles.

Download DOZE dataset

You can download the dataset on 😴DOZE🐱

Install openxlab

# Install openxlab
pip install openxlab
# Upgrade openxlab
pip install -U openxlab
# Log in and enter your AK/SK (access key / secret key) when prompted
openxlab login

Download DOZE

# Dataset download
openxlab dataset get --dataset-repo JiMa25/DOZE

Decompress DOZE

The download contains four archives. scenes_static.tar.gz holds the scenes with static human obstacles, scenes_dynamic_fixed.tar.gz holds the scenes with human obstacles moving along fixed trajectories, and scenes_dynamic_random.tar.gz holds the scenes with human obstacles moving along random trajectories. episodes.tar.gz holds the episode data for the navigation tasks.

# Dataset decompress
tar -xzvf episodes.tar.gz
mkdir scenes
cd scenes
tar -xzvf ../scenes_static.tar.gz
tar -xzvf ../scenes_dynamic_fixed.tar.gz
tar -xzvf ../scenes_dynamic_random.tar.gz
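
For setups without a shell, the same steps can be sketched in Python with the standard tarfile module. This is a minimal sketch, not part of the repository: the function name extract_doze and its two arguments are hypothetical, and it assumes the four archives sit in one download folder under the names listed above.

```python
import tarfile
from pathlib import Path

# Archive names come from the description above.
SCENE_ARCHIVES = [
    "scenes_static.tar.gz",
    "scenes_dynamic_fixed.tar.gz",
    "scenes_dynamic_random.tar.gz",
]

def extract_doze(download_dir, target_dir):
    """Unpack episodes into target_dir and scene packages into target_dir/scenes.

    Hypothetical helper that mirrors the shell commands above.
    Returns the names of the archives that were actually extracted.
    """
    download_dir, target_dir = Path(download_dir), Path(target_dir)
    scenes_dir = target_dir / "scenes"
    scenes_dir.mkdir(parents=True, exist_ok=True)

    jobs = [(download_dir / "episodes.tar.gz", target_dir)]
    jobs += [(download_dir / name, scenes_dir) for name in SCENE_ARCHIVES]

    extracted = []
    for archive, dest in jobs:
        if not archive.is_file():
            print(f"skipping missing archive: {archive}")
            continue
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(dest)
        extracted.append(archive.name)
    return extracted
```

Calling extract_doze(".", "DOZE") from the download folder would then produce the same layout as the shell commands, skipping any archive that has not been downloaded.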

Filesystem Hierarchy

The final hierarchy should look as follows:

~/DOZE
  ├── episodes
  │     ├── Appearance
  │     │     ├── DOZE_0.json
  │     │     ├── DOZE_0.json.gz
  │     │     ├── DOZE_1.json
  │     │     ├── DOZE_1.json.gz
  │     │     ├── ...
  │     │     ├── DOZE_9.json
  │     │     └── DOZE_9.json.gz
  │     ├── Hint
  │     │     ├── DOZE_0.json
  │     │     ├── DOZE_0.json.gz
  │     │     ├── DOZE_1.json
  │     │     ├── DOZE_1.json.gz
  │     │     ├── ...
  │     │     ├── DOZE_9.json
  │     │     └── DOZE_9.json.gz
  │     ├── OV
  │     │     ├── DOZE_0.json
  │     │     ├── DOZE_0.json.gz
  │     │     ├── DOZE_1.json
  │     │     ├── DOZE_1.json.gz
  │     │     ├── ...
  │     │     ├── DOZE_9.json
  │     │     └── DOZE_9.json.gz
  │     └── Spacial
  │           ├── DOZE_0.json
  │           ├── DOZE_0.json.gz
  │           ├── DOZE_1.json
  │           ├── DOZE_1.json.gz
  │           ├── ...
  │           ├── DOZE_9.json
  │           └── DOZE_9.json.gz
  └── scenes
       ├── dynamic_fixed
       │     ├── DOZE_dynamic_fixed_0_Data
       │     ├── DOZE_dynamic_fixed_1_Data
       │     ├── ...
       │     ├── DOZE_dynamic_fixed_9_Data
       │     ├── DOZE_dynamic_fixed_0.x86_64
       │     ├── DOZE_dynamic_fixed_1.x86_64
       │     ├── ...
       │     ├── DOZE_dynamic_fixed_9.x86_64
       │     ├── UnityPlayer.so
       │     └── UnityPlayer_s.debug
       ├── dynamic_random
       │     ├── DOZE_dynamic_random_0_Data
       │     ├── DOZE_dynamic_random_1_Data
       │     ├── ...
       │     ├── DOZE_dynamic_random_9_Data
       │     ├── DOZE_dynamic_random_0.x86_64
       │     ├── DOZE_dynamic_random_1.x86_64
       │     ├── ...
       │     ├── DOZE_dynamic_random_9.x86_64
       │     ├── UnityPlayer.so
       │     └── UnityPlayer_s.debug
       └── static
             ├── DOZE_dynamic_random_0_Data
             ├── DOZE_dynamic_random_1_Data
             ├── ...
             ├── DOZE_dynamic_random_9_Data
             ├── DOZE_dynamic_random_0.x86_64
             ├── DOZE_dynamic_random_1.x86_64
             ├── ...
             ├── DOZE_dynamic_random_9.x86_64
             ├── UnityPlayer.so
             └── UnityPlayer_s.debug

The episodes folder contains four navigation tasks: Appearance, Spacial, OV (Open-Vocabulary), and Hint. The static folder contains ten 3D scenes with static humanoid obstacles, the dynamic_fixed folder contains ten 3D scenes with humanoid obstacles moving along fixed trajectories, and the dynamic_random folder contains ten 3D scenes with humanoid obstacles moving along random trajectories. In each scene folder, the DOZE_xxxxxx.x86_64 files are the executables.
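
A quick way to confirm that decompression produced this layout is to check for the expected files. The following is a minimal sketch under stated assumptions: missing_entries is a hypothetical helper, the task folder names and the count of ten scenes are taken from the hierarchy above, and scene folders are only checked for the number of .x86_64 executables rather than for exact file names.

```python
from pathlib import Path

TASKS = ["Appearance", "Hint", "OV", "Spacial"]  # folder names from the hierarchy
SCENE_KINDS = ["static", "dynamic_fixed", "dynamic_random"]
NUM_SCENES = 10

def missing_entries(root):
    """Return expected files or folders that are absent under the DOZE root."""
    root = Path(root)
    missing = []
    # Every task folder should hold DOZE_0.json ... DOZE_9.json.
    for task in TASKS:
        for i in range(NUM_SCENES):
            path = root / "episodes" / task / f"DOZE_{i}.json"
            if not path.is_file():
                missing.append(path)
    # Every scene folder should hold one executable per scene.
    for kind in SCENE_KINDS:
        folder = root / "scenes" / kind
        if len(list(folder.glob("*.x86_64"))) < NUM_SCENES:
            missing.append(folder)
    return missing
```

An empty list from missing_entries(Path.home() / "DOZE") suggests the layout is complete; any returned path points at the archive that needs to be extracted again.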

Episode Structure

Here is an example of a single episode in our dataset.

{
    "id": "Appearance_DOZE_0_274",
    "scene": "DOZE_0",
    "initial_horizon": 10,
    "initial_orientation": 90,
    "initial_position": {
        "x": 0.625999987,
        "y": 0.9,
        "z": 3.20000005
    },
    "goal_object": "a yellow wateringcan",
    "shortest_path": [
        {
            "x": 0.6259999871253967,
            "y": 0.9,
            "z": 3.200000047683716
        },
        {
            "x": -0.14999985694885254,
            "y": 0.9,
            "z": 2.549999952316284
        },
        {
            "x": -1.5299997329711914,
            "y": 0.9,
            "z": -1.499999761581421
        },
        {
            "x": -1.6199997663497925,
            "y": 0.9,
            "z": -1.5899999141693115
        },
        {
            "x": -1.874000072479248,
            "y": 0.9,
            "z": -1.7999999523162842
        }
    ],
    "shortest_path_length": 5.747767802180204
}

A DOZE_x.json file contains all the tasks in a DOZE_x scene. Key parameters include:

  • id: the unique identifier of the task.
  • scene: the scene in which the task takes place.
  • initial_horizon: the agent's initial horizon. The horizon changes the camera's rotation. Values are clamped to the range [-30, 30].
  • initial_orientation: the initial rotation of the agent.
  • initial_position: the initial position of the agent.
  • goal_object: the target object.
  • shortest_path: a list of waypoints (each with x, y, and z coordinates) along the shortest path from the starting point to the neighborhood of the target object.
  • shortest_path_length: the length of the shortest path from the starting point to the target.
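
The fields above can be read and sanity-checked with a few lines of Python. This is a sketch rather than the repository's own loader: the helper names are hypothetical, and load_episodes assumes that the top level of a DOZE_x.json file is a list of episode dictionaries, which the description above does not confirm. For the example episode, summing the straight-line distances between consecutive waypoints reproduces the stored shortest_path_length.

```python
import gzip
import json
import math
from pathlib import Path

def load_episodes(path):
    """Load DOZE_x.json or DOZE_x.json.gz.

    Assumption: the file's top level is a list of episode dicts.
    """
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as f:
        return json.load(f)

def path_length(waypoints):
    """Sum of straight-line distances between consecutive waypoints."""
    return sum(
        math.dist((a["x"], a["y"], a["z"]), (b["x"], b["y"], b["z"]))
        for a, b in zip(waypoints, waypoints[1:])
    )

def check_episode(ep, tol=1e-4):
    """Return a list of problems found in one episode dict."""
    problems = []
    if not -30 <= ep["initial_horizon"] <= 30:
        problems.append("initial_horizon outside [-30, 30]")
    if abs(path_length(ep["shortest_path"]) - ep["shortest_path_length"]) > tol:
        problems.append("shortest_path_length does not match shortest_path")
    return problems

# The example episode shown above.
EXAMPLE = {
    "id": "Appearance_DOZE_0_274",
    "scene": "DOZE_0",
    "initial_horizon": 10,
    "initial_orientation": 90,
    "initial_position": {"x": 0.625999987, "y": 0.9, "z": 3.20000005},
    "goal_object": "a yellow wateringcan",
    "shortest_path": [
        {"x": 0.6259999871253967, "y": 0.9, "z": 3.200000047683716},
        {"x": -0.14999985694885254, "y": 0.9, "z": 2.549999952316284},
        {"x": -1.5299997329711914, "y": 0.9, "z": -1.499999761581421},
        {"x": -1.6199997663497925, "y": 0.9, "z": -1.5899999141693115},
        {"x": -1.874000072479248, "y": 0.9, "z": -1.7999999523162842},
    ],
    "shortest_path_length": 5.747767802180204,
}

if __name__ == "__main__":
    print(check_episode(EXAMPLE))
```

Running check_episode over every entry returned by load_episodes is a cheap way to catch a truncated or corrupted download before starting an experiment.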

Quick Start

Environment Setup

To set up the environment, follow these steps:

pip install -r requirements.txt

Start example

cd scripts
python example.py

You can see the program running in the following window:

Watch the video

Dynamic Object

A rolling basketball

Examples of Experimental Results

C-L3MVN demo

Images from left to right: the image the agent sees, the same image in BGR format, and the map built during navigation.

Success & Failure Examples

Success Examples

Failure Examples

The visual model incorrectly identified the target object.

The agent is stuck in the scene.

Visual models are inadequate for item description recognition.

The agent reaches the maximum step limit.

User Feedback

WeChat Group

Feel free to contact us if you have any questions about this dataset, and you are welcome to join our users' WeChat group!

Discord Group

Join our Discord community for discussions and support: https://discord.gg/x2wP4vz8

BibTeX

@article{ma2024doze,
      title={DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments},
      author={Ma, Ji and Dai, Hongming and Mu, Yao and Wu, Pengying and Wang, Hao and Chi, Xiaowei and Fei, Yang and Zhang, Shanghang and Liu, Chang},
      journal={arXiv preprint arXiv:2402.19007},
      year={2024}
}