/ObjectFolder

ObjectFolder Dataset

Primary LanguagePythonCreative Commons Attribution 4.0 InternationalCC-BY-4.0

ObjectFolder 2.0

ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer (CVPR 2022)


ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer
Ruohan Gao*, Zilin Si*, Yen-Yu Chang*, Samuel Clarke, Jeannette Bohg, Li Fei-Fei, Wenzhen Yuan, Jiajun Wu
Stanford University, Carnegie Mellon University
In Conference on Computer Vision and Pattern Recognition (CVPR), 2022


If you find our dataset, code or project useful in your research, we appreciate it if you could cite:

@inproceedings{gao2022ObjectFolderV2,
  title = {ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer},
  author = {Gao, Ruohan and Si, Zilin and Chang, Yen-Yu and Clarke, Samuel and Bohg, Jeannette and Fei-Fei, Li and Yuan, Wenzhen and Wu, Jiajun},
  booktitle = {CVPR},
  year = {2022}
}

@inproceedings{gao2021ObjectFolder,
  title = {ObjectFolder: A Dataset of Objects with Implicit Visual, Auditory, and Tactile Representations},
  author = {Gao, Ruohan and Chang, Yen-Yu and Mall, Shivani and Fei-Fei, Li and Wu, Jiajun},
  booktitle = {CoRL},
  year = {2021}
}

About ObjectFolder Dataset

ObjectFolder 2.0 is a dataset of 1,000 objects in the form of implicit representations. It contains 1,000 Object Files each containing the complete multisensory profile for an object instance. Each Object File implicit neural representation network contains three sub-networks---VisionNet, AudioNet, and TouchNet, which through querying with the corresponding extrinsic parameters we can obtain the visual appearance of the object from different views and lighting conditions, impact sounds of the object at each position of specified force profile, and tactile sensing of the object at every surface location for varied rotation angels and pressing depth, respectively. The dataset contains common household objects of diverse categories such as wood desks, ceramic bowls, plastic toys, steel forks, glass mirrors, etc. See the paper for details.


Prerequisites

  • OS: Ubuntu 20.04.2 LTS
  • GPU: >= NVIDIA GTX 1080 Ti with >= 460.73.01 driver
  • Python package manager conda

Setup

git clone https://github.com/rhgao/ObjectFolder.git
cd ObjectFolder
export OBJECTFOLDER_HOME=$PWD
conda env create -f $OBJECTFOLDER_HOME/environment.yml
source activate ObjectFolder-env

Dataset Download and Preparation

Use the following command to download the first 100 objects:

wget https://download.cs.stanford.edu/viscam/ObjectFolder/ObjectFolder1-100.tar.gz
tar -xvf ObjectFolder1-100.tar.gz

Use the following command to download objects 101-1000:

wget https://download.cs.stanford.edu/viscam/ObjectFolder/ObjectFolder101-1000.tar.gz
tar -xvf ObjectFolder101-1000KiloOSF.tar.gz

Use the following command to download ObjectFiles with KiloOSF that supports real-time visual rendering at the expense of larger model size:

wget https://download.cs.stanford.edu/viscam/ObjectFolder/ObjectFolder1-100KiloOSF.tar.gz
wget https://download.cs.stanford.edu/viscam/ObjectFolder/ObjectFolder101-1000KiloOSF.tar.gz
tar -xvf ObjectFolder101-100KiloOSF.tar.gz
tar -xvf ObjectFolder101-1000KiloOSF.tar.gz

Rendering Visual, Auditory, and Tactile Sensory Data

Run the following command to render visual appearance of the object from a specified camera viewpoint and lighting direction:

  $ python OF_render.py --modality vision --object_file_path path_of_ObjectFile \
      --vision_test_file_path path_of_vision_test_file \
      --vision_results_dir path_of_vision_results_directory \

Run the following command to render impact sounds of the object at a specified surface location and impact force:

  $ python OF_render.py --modality audio --object_file_path path_of_ObjectFile \
      --audio_vertices_file_path path_of_audio_testing_vertices_file \
      --audio_forces_file_path path_of_forces_file \
      --audio_results_dir path_of_audio_results_directory \

Run the following command to render tactile RGB iamges of the object at a specified surface location, gel rotation angle, and deformation:

  $ python OF_render.py --modality touch --object_file_path path_of_ObjectFile \
      --touch_vertices_file_path path_of_touch_testing_vertices_file \
      --touch_gelinfo_file_path path_of_gelinfor_file \
      --touch_results_path path_of_touch_results_directory

The command-line arguments are described as follows:

  • --object_file_path: The path of ObjectFile.
  • --vision_test_file_path: The path of the testing file for vision, which should be a npy file.
  • --vision_results_dir: The path of the vision results directory to save rendered images.
  • --audio_vertices_file_path: The path of the testing vertices file for audio, which should be a npy file.
  • --audio_forces_file_path: The path of forces file for audio, which should be a npy file.
  • --audio_results_dir: The path of audio results directory to save rendered impact sounds as .wav files.
  • --touch_vertices_file_path: The path of the testing vertices file for touch, which should be a npy file.
  • --touch_gelinfo_file_path: The path of the gelinfo file for touch that speficifies the gel rotation angle and deformation depth, which should be a npy file.
  • --touch_results_dir: The path of the touch results directory to save rendered tactile RGB images.

Data Format

  • --vision_test_file_path: It is a npy file with shape of (N, 6), where N is the number of testing viewpoints. Each data point contains the coordinates of the camera and the light in the form of (camera_x, camera_y, camera_z, light_x, light_y, light_z).
  • --audio_vertices_file_path: It is a npy file with shape of (N, 3), where N is the number of testing vertices. Each data point represents a coordinate on the object in the form of (x, y, z).
  • --audio_forces_file_path: It is a npy file with shape of (N, 3), where N is the number of testing vertices. Each data point represents the force values for the corresponding impact in the form of (F_x, F_y, F_z).
  • --touch_vertices_file_path: It is a npy file with shape of (N, 3), where N is the number of testing vertices. Each data point contains a coordinate on the object in the form of (x, y, z).
  • --touch_gelinfo_file_path: It is a npy file with shape of (N, 3), where N is the number of testing vertices. Each data point contains the gel rotation angle and gel displacement in the form of (theta, phi, depth).

ObjectFile with KiloOSF for Real-time Visual Rendering

To use KiloOSF, please make a copy of cuda in the root directory of this repo and follow the steps in CUDA extension installation. Run the following command to render visual appearance of the object from a specified camera viewpoint and lighting direction:

  $ python OF_render.py --modality vision --KiloOSF --object_file_path path_of_KiloOSF_ObjectFile \
      --vision_test_file_path path_of_vision_test_file \
      --vision_results_dir path_of_vision_results_directory \

Demo

Below we show an example of rendering the visual, auditory, and tactile data from the ObjectFile implicit representation for one object:

  $ python OF_render.py --modality vision,audio,touch --object_file_path Objects/25/ObjectFile.pth \
      --vision_test_file_path demo/vision_demo.npy \
      --vision_results_dir demo/vision_results/ \
      --audio_vertices_file_path demo/audio_demo_vertices.npy \
      --audio_forces_file_path demo/audio_demo_forces.npy \
      --audio_results_dir demo/audio_results/ \
      --touch_vertices_file_path demo/touch_demo_vertices.npy \
      --touch_gelinfo_file_path demo/touch_demo_gelinfo.npy \
      --touch_results_dir demo/touch_results/

The rendered images, impact sounds, tactile images will be saved in demo/vision_results/, demo/audio_results/, and demo/touch_results/, respectively.

License

ObjectFolder is CC BY 4.0 licensed, as found in the LICENSE file. The 1,000 high quality 3D objects originally come from online repositories including: ABO dataset, 3D Model Haven, YCB dataset, and Google Scanned Objects. Please also refer to their original lisence file.