IKEA Manuals at Work

4D Grounding of Assembly Instructions on Internet Videos

NeurIPS 2024 Datasets and Benchmarks

Yunong Liu¹, Cristobal Eyzaguirre¹, Manling Li¹, Shubh Khanna¹, Juan Carlos Niebles¹, Vineeth Ravi², Saumitra Mishra², Weiyu Liu¹*, Jiajun Wu¹*

¹Stanford University ²J.P. Morgan AI Research
*Equal advising

[Project Website] [Paper] [Dataset Setup Guide] [Notebook]

Overview

The IKEA-Manuals-at-Work dataset provides detailed annotations that align 3D models, instruction manuals, and real-world assembly videos. It is the first dataset to provide 4D grounding of assembly instructions on Internet videos, with high-quality spatio-temporal alignments between the instructions, the 3D models, and the videos.

Key Features

🪑 36 furniture models from 6 categories
🎥 98 assembly videos from the Internet
🔄 Dense spatio-temporal alignments between instructions and videos
📊 Rich annotations including part segmentation, 6D poses, and temporal alignments

Getting Started

Installation

# Create and activate conda environment
conda create -n IKEAVideo python=3.8
conda activate IKEAVideo

# Install dependencies
pip install -r requirements.txt

# Set PYTHONPATH
export PYTHONPATH="./src:$PYTHONPATH"

Dataset Structure

data/
├── data.json          # Main annotation file
├── parts/             # 3D model files
├── manual_img/        # Instruction manual images
├── pdfs/              # Original PDF manuals
└── videos/            # Assembly videos
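
After downloading, it can help to confirm that the layout above is in place before running anything else. The following is a minimal sketch, not part of the repository: the helper name `missing_entries` is hypothetical, and it only checks that the five entries from the tree exist under a given root.

```python
from pathlib import Path

# Entries taken from the Dataset Structure tree above.
EXPECTED_ENTRIES = ["data.json", "parts", "manual_img", "pdfs", "videos"]


def missing_entries(data_root):
    """Return the expected entries that do not exist under data_root.

    Hypothetical helper for illustration; it checks existence only,
    not whether the contents are complete.
    """
    root = Path(data_root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

Calling `missing_entries("data")` from the repository root returns an empty list when every entry is present, and otherwise the names that still need to be downloaded or moved into place.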

Dataset Contents

The dataset includes:

3D Models: Detailed 3D models of furniture parts
Instruction Manuals: Step-by-step assembly instructions
Assembly Videos: Real-world assembly videos from the Internet
Rich Annotations:
- ⏱️ Temporal step alignments
- 🔄 Temporal substep alignments
- 🎯 2D-3D part correspondences
- 🎨 Part segmentations
- 📐 Part 6D poses
- 📷 Estimated camera parameters

For detailed information about the dataset, please refer to our datasheet.
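
To illustrate how a part's 6D pose and the camera parameters relate 3D geometry to the video frame, here is a minimal sketch of a pinhole projection. Everything in it is an assumption made for illustration: the function name `project_point`, the convention that the pose maps part coordinates to camera coordinates, and the distortion-free intrinsics `fx`, `fy`, `cx`, `cy`. How the dataset actually stores poses and camera parameters is described in the datasheet and the tutorial notebook, not here.

```python
def project_point(point, rotation, translation, fx, fy, cx, cy):
    """Project a 3D point from part coordinates to pixel coordinates.

    Assumed convention (not confirmed by the dataset): `rotation` is a 3x3
    matrix given as nested lists and `translation` a length-3 sequence, so
    that x_cam = rotation @ point + translation. The camera is an ideal
    pinhole with focal lengths (fx, fy) and principal point (cx, cy).
    """
    # Rigid transform: rotate the point, then translate it.
    x_cam = [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    if x_cam[2] <= 0:
        raise ValueError("point is not in front of the camera")
    # Perspective division followed by the intrinsic scaling and offset.
    u = fx * x_cam[0] / x_cam[2] + cx
    v = fy * x_cam[1] / x_cam[2] + cy
    return u, v


# Example with made-up values: identity rotation, part 2 units in front of the camera.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
u, v = project_point([0.1, -0.2, 0.0], identity, [0.0, 0.0, 2.0], 500.0, 500.0, 320.0, 240.0)
```

Projecting the vertices of a part's 3D model this way and comparing them with the part's segmentation mask is one simple way to sanity-check a pose, provided the convention assumed above matches the one used in the annotations.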

Dataset Setup

Download Required Files:

Annotation file: data/data.json
Assembly videos: Stanford Digital Repository
Clone the repo to obtain other resources (e.g. 3D models, manual images)
Place downloads in their respective directories as shown in Dataset Structure

Explore the Dataset: Check our tutorial notebook: notebooks/data_viz.ipynb
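
Before opening the notebook, a quick look at the top level of the annotation file can be useful. The sketch below deliberately assumes nothing about the schema of `data/data.json`: the helpers `load_annotations` and `summarize` are hypothetical and only report whether the file parses to a dictionary or a list and what its first keys or items look like.

```python
import json


def load_annotations(path="data/data.json"):
    """Parse the annotation file and return the resulting Python object."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def summarize(obj, max_keys=10):
    """Describe the top level of a parsed JSON value without assuming a schema."""
    if isinstance(obj, dict):
        keys = list(obj)[:max_keys]
        return f"dict with {len(obj)} keys, first keys: {keys}"
    if isinstance(obj, list):
        first = type(obj[0]).__name__ if obj else "n/a"
        return f"list with {len(obj)} items, first item type: {first}"
    return f"{type(obj).__name__}: {obj!r}"
```

Running `print(summarize(load_annotations()))` from the repository root prints a one-line description of the file's top-level structure. The tutorial notebook remains the authoritative guide to the individual fields.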

Applications

The dataset supports various research directions:

🔍 Assembly plan generation
🎯 Part-conditioned segmentation
📐 Part-conditioned pose estimation
🎥 Video object segmentation
🛠️ Shape assembly with instruction videos

License

This dataset is released under the CC-BY-4.0 license.

Citation

If you find this dataset useful for your research, please cite:

  @inproceedings{
  liu2024ikea,
  title={{IKEA} Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos},
  author={Yunong Liu and Cristobal Eyzaguirre and Manling Li and Shubh Khanna and Juan Carlos Niebles and Vineeth Ravi and Saumitra Mishra and Weiyu Liu and Jiajun Wu},
  booktitle={The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2024}
  }

Contact

For questions and feedback:

📮 Open an issue on this GitHub repository
📧 Email Yunong Liu
