NeurIPS 2024 Datasets and Benchmarks
Yunong Liu1, Cristobal Eyzaguirre1, Manling Li1, Shubh Khanna1, Juan Carlos Niebles1, Vineeth Ravi2, Saumitra Mishra2, Weiyu Liu1*, Jiajun Wu1*
1Stanford University 2J.P. Morgan AI Research
*Equal advising
[Project Website] [Paper] [Dataset Setup Guide] [Notebook]
The IKEA-Manuals-at-Work dataset provides detailed annotations that align 3D models, instruction manuals, and real-world assembly videos. It is the first dataset to provide 4D grounding of assembly instructions on Internet videos, with high-quality spatio-temporal alignments between the instructions, the 3D models, and the videos.
- 🪑 36 furniture models from 6 categories
- 🎥 98 assembly videos from the Internet
- 🔄 Dense spatio-temporal alignments between instructions and videos
- 📊 Rich annotations including part segmentation, 6D poses, and temporal alignments
# Create and activate conda environment
conda create -n IKEAVideo python=3.8
conda activate IKEAVideo
# Install dependencies
pip install -r requirements.txt
# Set PYTHONPATH
export PYTHONPATH="./src:$PYTHONPATH"
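Because the export above uses the relative path `./src`, it only resolves correctly when Python is started from the repository root. The following is a minimal sketch for checking this from Python. The helper names `pythonpath_entries` and `src_on_pythonpath` are hypothetical and not part of the repository.

```python
import os
from typing import List, Mapping, Optional


def pythonpath_entries(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Split PYTHONPATH into its non-empty entries (hypothetical helper)."""
    env = os.environ if env is None else env
    raw = env.get("PYTHONPATH", "")
    return [entry for entry in raw.split(os.pathsep) if entry]


def src_on_pythonpath(env: Optional[Mapping[str, str]] = None,
                      cwd: Optional[str] = None) -> bool:
    """Return True if some PYTHONPATH entry resolves to <cwd>/src."""
    cwd = os.getcwd() if cwd is None else cwd
    target = os.path.normpath(os.path.join(cwd, "src"))
    # Relative entries such as "./src" are resolved against cwd;
    # absolute entries are left unchanged by os.path.join.
    return any(
        os.path.normpath(os.path.join(cwd, entry)) == target
        for entry in pythonpath_entries(env)
    )


if __name__ == "__main__":
    if src_on_pythonpath():
        print("./src is on PYTHONPATH for this working directory")
    else:
        print("./src not found on PYTHONPATH; run the export from the repo root")
```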
data/
├── data.json # Main annotation file
├── parts/ # 3D model files
├── manual_img/ # Instruction manual images
├── pdfs/ # Original PDF manuals
└── videos/ # Assembly videos
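As a quick sanity check after downloading, the layout above can be verified with a few lines of Python. This is a minimal sketch. The entry names come from the tree shown above, while the function name `missing_entries` and the default root `data` are assumptions made for illustration.

```python
from pathlib import Path
from typing import List

# Entries taken from the layout above; a trailing "/" marks a directory.
EXPECTED_ENTRIES = ["data.json", "parts/", "manual_img/", "pdfs/", "videos/"]


def missing_entries(data_root: str = "data") -> List[str]:
    """Return the expected entries that are absent under data_root."""
    root = Path(data_root)
    missing = []
    for entry in EXPECTED_ENTRIES:
        path = root / entry.rstrip("/")
        # Directories must exist as directories, data.json as a regular file.
        present = path.is_dir() if entry.endswith("/") else path.is_file()
        if not present:
            missing.append(entry)
    return missing


if __name__ == "__main__":
    gaps = missing_entries()
    if gaps:
        print("Missing under data/: {}".format(", ".join(gaps)))
    else:
        print("All expected entries found")
```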
The dataset includes:
- 3D Models: Detailed 3D models of furniture parts
- Instruction Manuals: Step-by-step assembly instructions
- Assembly Videos: Real-world assembly videos from the Internet
- Rich Annotations:
  - ⏱️ Temporal step alignments
  - 🔄 Temporal substep alignments
  - 🎯 2D-3D part correspondences
  - 🎨 Part segmentations
  - 📐 Part 6D poses
  - 📷 Estimated camera parameters
For detailed information about the dataset, please refer to our datasheet.
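To make the 6D pose and camera annotations concrete: a 6D pose is a rigid transform (a 3D rotation plus a 3D translation), and together with camera intrinsics it maps a point on a part's 3D model to a pixel in a video frame. The sketch below illustrates that relationship in plain Python. The function names, the numeric values, and the simple pinhole camera model are assumptions for illustration. The way poses and camera parameters are stored in `data/data.json` is not described here and may differ.

```python
from typing import Sequence, Tuple


def transform_point(rotation: Sequence[Sequence[float]],
                    translation: Sequence[float],
                    point: Sequence[float]) -> Tuple[float, ...]:
    """Apply a rigid transform: rotate by a 3x3 matrix, then translate."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def project_point(point_cam: Sequence[float],
                  fx: float, fy: float,
                  cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame point to pixel coordinates."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point is at or behind the camera plane")
    return (fx * x / z + cx, fy * y / z + cy)


# Hypothetical values, chosen only to illustrate the computation.
rotation = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]      # identity rotation
translation = [0.0, 0.0, 2.0]    # part sits 2 units in front of the camera
model_point = [0.1, -0.2, 0.0]   # a point in the part's own frame

point_cam = transform_point(rotation, translation, model_point)
pixel = project_point(point_cam, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(pixel)  # approximately (345.0, 190.0)
```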
- Download Required Files:
  - Annotation file: data/data.json
  - Assembly videos: Stanford Digital Repository
  - Clone the repo to obtain other resources (e.g. 3D models, manual images)
  - Place downloads in their respective directories as shown in Dataset Structure
- Explore the Dataset:
  Check our tutorial notebook: notebooks/data_viz.ipynb
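For a first look at the annotation file outside the notebook, a schema-agnostic summary can help. The sketch below loads `data/data.json` and prints the shape of its contents without assuming any particular keys. The helper `describe` and its depth and key limits are hypothetical and not part of the repository.

```python
import json
from pathlib import Path
from typing import Any


def describe(value: Any, max_depth: int = 2, max_keys: int = 5) -> Any:
    """Return a compact, JSON-serializable summary of a value's shape."""
    if isinstance(value, dict):
        if max_depth == 0:
            return "dict({} keys)".format(len(value))
        # Only the first few keys are expanded to keep the output short.
        keys = list(value)[:max_keys]
        return {k: describe(value[k], max_depth - 1, max_keys) for k in keys}
    if isinstance(value, list):
        if not value:
            return "list(0 items)"
        if max_depth == 0:
            return "list({} items)".format(len(value))
        # Summarize a list by its length and the shape of its first element.
        return ["{} items, first:".format(len(value)),
                describe(value[0], max_depth - 1, max_keys)]
    return type(value).__name__


if __name__ == "__main__":
    annotation_path = Path("data/data.json")
    if annotation_path.exists():
        with annotation_path.open() as f:
            annotations = json.load(f)
        print(json.dumps(describe(annotations), indent=2))
    else:
        print("data/data.json not found; download it first")
```

Raising `max_depth` shows deeper nesting at the cost of longer output.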
The dataset supports various research directions:
- 🔍 Assembly plan generation
- 🎯 Part-conditioned segmentation
- 📐 Part-conditioned pose estimation
- 🎥 Video object segmentation
- 🛠️ Shape assembly with instruction videos
This dataset is released under the CC-BY-4.0 license.
If you find this dataset useful for your research, please cite:
@inproceedings{
liu2024ikea,
title={{IKEA} Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos},
author={Yunong Liu and Cristobal Eyzaguirre and Manling Li and Shubh Khanna and Juan Carlos Niebles and Vineeth Ravi and Saumitra Mishra and Weiyu Liu and Jiajun Wu},
booktitle={The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2024}
}
For questions and feedback:
- 📮 Open an issue on this GitHub repository
- 📧 Email Yunong Liu