Robot manipulation task dataset for offline reinforcement learning (offline RL), in particular for the Hierarchical Object Manipulation System (HOMS).
HOMS is a system for performing multiple manipulation tasks in sequence.
Source code: https://github.com/SunInKim/Hierarchical-Obejct-Manipulation-System-HOMS
The dataset consists of a dataset for the high-level policy (task classifier) and a dataset for the low-level policy (robot controller).
The dataset was collected with the PyBullet simulator.
The scripted policy can be found in env/util/task_policy.py.
The dataset can be collected by executing:

python get_rollout.py
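
For reference, below is a minimal sketch of the kind of collection loop get_rollout.py presumably runs. The environment object, its gym-style reset/step interface, and the scripted_policy call are assumptions for illustration, not the repository's exact API.

# Hypothetical sketch of a scripted-policy collection loop; the env object,
# its gym-style reset/step interface, and scripted_policy are assumptions.
import pickle
import numpy as np

def collect_rollout(env, scripted_policy, num_episodes=100, path="rollout.pkl"):
    episodes = []
    for _ in range(num_episodes):
        ep = {"observations": [], "actions": [], "rewards": [],
              "next_observations": [], "terminals": []}
        obs = env.reset()
        done = False
        while not done:
            action = scripted_policy(obs)                  # scripted expert action
            next_obs, reward, done, info = env.step(action)
            ep["observations"].append(obs)
            ep["actions"].append(action)
            ep["rewards"].append(reward)
            ep["next_observations"].append(next_obs)
            ep["terminals"].append(done)
            obs = next_obs
        episodes.append({k: np.asarray(v) for k, v in ep.items()})
    with open(path, "wb") as f:                            # one rollout = 100 episodes
        pickle.dump(episodes, f)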
Each rollout contains 100 episodes of a unit task.

High-level policy dataset (task classifier):

rollout
--episode
----observations
----actions
----rewards
----next_observations
----terminals
----possible_actions
observations: # goal_image + current_image, shape (6, 240, 240)
actions: # one-hot task vector, shape (11,) where 11 = num_task
rewards: # 1 when possible_actions is None (no action remains), shape (1,)
next_observations: # goal_image + current_image, shape (6, 240, 240)
terminals: # 1 when possible_actions is None, shape (1,)
possible_actions: # shape (N,), where N = num_possible_action
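
A minimal sketch of loading and inspecting one high-level rollout. The pickle format, file name, and per-episode dict layout are assumptions based on the key list above, not a confirmed file specification.

# Hypothetical loader for a high-level rollout; the pickle format and
# per-episode dict layout are assumptions based on the key list above.
import pickle
import numpy as np

with open("rollout_high.pkl", "rb") as f:   # file name is illustrative
    episodes = pickle.load(f)               # list of 100 episode dicts

ep = episodes[0]
obs = np.asarray(ep["observations"])        # (T, 6, 240, 240): goal + current image
act = np.asarray(ep["actions"])             # (T, 11): one-hot task vector
rew = np.asarray(ep["rewards"])             # (T, 1): 1 once no action remains
done = np.asarray(ep["terminals"])          # (T, 1)
print(obs.shape, act.shape, rew.shape, done.shape)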
Low-level policy dataset (robot controller):

rollout
--episode
----observations
------images
------robot_state
----actions
----rewards
----next_observations
----terminals
----tasks
observations: # current_image (3, 240, 240) and robot_state (x, y, z, yaw, gripper_state) (6,)
actions: # robot action (x, y, z, yaw, gripper, task_terminal), shape (6,)
rewards: # 1 when the task is completed and the robot has returned to its initial position, shape (1,)
next_observations: # current_image (3, 240, 240), robot_state (6,), task_id (10,)
terminals: # 1 (done) when the task is completed and the robot has returned to its initial position, shape (1,)
tasks: # one-hot vector indicating the task type, shape (10,)
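
To feed these episodes into a standard offline RL replay buffer, the per-episode arrays can be flattened into individual transitions. Below is a sketch, again assuming the pickle layout and the nested observations dict described above; the file name is illustrative.

# Hypothetical flattening of low-level episodes into offline RL transitions;
# assumes the pickle layout and nested observations dict described above.
import pickle
import numpy as np

with open("rollout_low.pkl", "rb") as f:    # file name is illustrative
    episodes = pickle.load(f)

transitions = {"images": [], "robot_state": [], "actions": [],
               "rewards": [], "terminals": [], "tasks": []}
for ep in episodes:
    obs = ep["observations"]                # nested dict: images + robot_state
    transitions["images"].extend(obs["images"])
    transitions["robot_state"].extend(obs["robot_state"])
    for key in ("actions", "rewards", "terminals", "tasks"):
        transitions[key].extend(ep[key])
batch = {k: np.asarray(v) for k, v in transitions.items()}
print(batch["images"].shape)                # (num_transitions, 3, 240, 240)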