IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos

NeurIPS 2024 Datasets and Benchmarks

Yunong Liu¹, Cristobal Eyzaguirre¹, Manling Li¹, Shubh Khanna¹, Juan Carlos Niebles¹, Vineeth Ravi², Saumitra Mishra², Weiyu Liu¹*, Jiajun Wu¹*

¹Stanford University    ²J.P. Morgan AI Research
*Equal advising

[Project Website] [Paper] [Dataset Setup Guide] [Notebook]

Overview

The IKEA-Manuals-at-Work dataset provides detailed annotations that align 3D models, instruction manuals, and real-world assembly videos. It is the first dataset to provide 4D grounding of assembly instructions on Internet videos, with high-quality spatio-temporal alignments between the instructions, the 3D part models, and the videos.

Key Features

  • 🪑 36 furniture models from 6 categories
  • 🎥 98 assembly videos from the Internet
  • 🔄 Dense spatio-temporal alignments between instructions and videos
  • 📊 Rich annotations including part segmentation, 6D poses, and temporal alignments

Getting Started

Installation

# Create and activate conda environment
conda create -n IKEAVideo python=3.8
conda activate IKEAVideo

# Install dependencies
pip install -r requirements.txt

# Set PYTHONPATH (run from the repository root, since ./src is a relative path)
export PYTHONPATH="./src:$PYTHONPATH"

Dataset Structure

data/
├── data.json           # Main annotation file
├── parts/             # 3D model files
├── manual_img/        # Instruction manual images
├── pdfs/              # Original PDF manuals
└── videos/            # Assembly videos
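
Before running any code against the dataset, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the helper name `missing_entries` is hypothetical, and it assumes the `data/` folder sits in the current working directory.

```python
from pathlib import Path

# Entry names are taken from the directory tree above.
EXPECTED_ENTRIES = ["data.json", "parts", "manual_img", "pdfs", "videos"]


def missing_entries(data_root):
    """Return the expected entries that do not exist under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    # Assumption: the script is run from the repository root.
    missing = missing_entries("data")
    if missing:
        print("Missing under data/:", ", ".join(missing))
    else:
        print("All expected entries are present.")
```

The check only tests for existence; it does not validate file contents or counts.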

Dataset Contents

The dataset includes:

  • 3D Models: Detailed 3D models of furniture parts
  • Instruction Manuals: Step-by-step assembly instructions
  • Assembly Videos: Real-world assembly videos from the Internet
  • Rich Annotations:
    • ⏱️ Temporal step alignments
    • 🔄 Temporal substep alignments
    • 🎯 2D-3D part correspondences
    • 🎨 Part segmentations
    • 📐 Part 6D poses
    • 📷 Estimated camera parameters

For detailed information about the dataset, please refer to our datasheet.
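
As a generic illustration of what a 6D pose and camera parameters are used for, the sketch below applies a rigid transform (rotation plus translation) to a 3D point and projects it with a pinhole camera model. This is textbook geometry, not the dataset's loader: how poses and intrinsics are actually stored (matrix or quaternion, units, axis conventions) is not stated in this README, and the function names `apply_pose` and `project` are hypothetical.

```python
def apply_pose(rotation, translation, point):
    """Map a 3D point from part coordinates to camera coordinates: R * p + t.

    rotation is a 3x3 nested sequence, translation and point are length-3 sequences.
    """
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point to pixel coordinates."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point is not in front of the camera")
    return (fx * x / z + cx, fy * y / z + cy)


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # Illustrative values only; they do not come from the dataset.
    p_cam = apply_pose(identity, (0.0, 0.0, 2.0), (1.0, 0.0, 0.0))
    print(project(p_cam, fx=100.0, fy=100.0, cx=50.0, cy=40.0))
```

With real annotations, the same two steps would overlay a part's 3D model on a video frame, provided the stored conventions match the assumptions above.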

Dataset Setup

  1. Download Required Files:
     • Annotation file: data/data.json
     • Assembly videos: Stanford Digital Repository
     • Clone the repo to obtain other resources (e.g. 3D models, manual images)
     • Place downloads in their respective directories as shown in Dataset Structure
  2. Explore the Dataset: Check our tutorial notebook: notebooks/data_viz.ipynb
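
Once the files are in place, a quick look at the annotation file can complement the tutorial notebook. The sketch below makes no assumption about the schema of data/data.json: it only reports the types and sizes it finds, down to a fixed depth. The helper names `summarize` and `describe_annotation_file` are hypothetical and are not part of the repository.

```python
import json
from pathlib import Path


def summarize(node, depth=0, max_depth=2):
    """Return a short, schema-agnostic description of a parsed JSON value."""
    if isinstance(node, dict):
        if depth >= max_depth:
            return f"dict with {len(node)} keys"
        return {key: summarize(value, depth + 1, max_depth) for key, value in node.items()}
    if isinstance(node, list):
        if not node:
            return "empty list"
        if depth >= max_depth:
            return f"list of {len(node)} items"
        # Only the first item is described; later items may differ in shape.
        return [f"list of {len(node)} items; first item:", summarize(node[0], depth + 1, max_depth)]
    return type(node).__name__


def describe_annotation_file(path):
    """Load a JSON file and summarize its structure."""
    with Path(path).open("r", encoding="utf-8") as handle:
        return summarize(json.load(handle))


if __name__ == "__main__":
    annotation_path = Path("data/data.json")  # path from Dataset Structure
    if annotation_path.exists():
        print(json.dumps(describe_annotation_file(annotation_path), indent=2))
    else:
        print(f"{annotation_path} not found; download it first.")
```

For field-level meaning, the datasheet and notebooks/data_viz.ipynb remain the authoritative sources.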

Applications

The dataset supports various research directions:

  • 🔍 Assembly plan generation
  • 🎯 Part-conditioned segmentation
  • 📐 Part-conditioned pose estimation
  • 🎥 Video object segmentation
  • 🛠️ Shape assembly with instruction videos

License

This dataset is released under the CC-BY-4.0 license.

Citation

If you find this dataset useful for your research, please cite:

  @inproceedings{liu2024ikea,
    title={{IKEA} Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos},
    author={Yunong Liu and Cristobal Eyzaguirre and Manling Li and Shubh Khanna and Juan Carlos Niebles and Vineeth Ravi and Saumitra Mishra and Weiyu Liu and Jiajun Wu},
    booktitle={The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
    year={2024}
  }

Contact

For questions and feedback: