This repository contains the dataset generation code for the ClevrTex benchmark from the paper ClevrTex: A Texture-Rich Benchmark for Unsupervised Multi-Object Segmentation. For the experiment code, see here.
Due to changes in the T&Cs of the vendor from which we previously obtained the materials, it may no longer be possible to obtain and use the original materials in AI-related applications. We have therefore compiled a new library of materials using textures from Polyhaven, ambientCG, and Sharetextures.com, all available under CC0 licenses.
The new material library is available here for the main ClevrTex dataset, and here for the OOD set. Simply replace the materials in the `data/materials` and `data/outd_materials` folders (or point the scripts to a different folder), respectively, and follow the instructions below to generate the dataset.
The following preparation steps are required to generate the dataset:
- Setting up blender
- Setting up python
- Setting up textures and materials
We used Blender 2.92.3 for rendering. Newer versions are untested but should work at least up to a minor version bump. You can download Blender from the Blender website and follow the installation process as normal, then skip to the final step. Alternatively, simply execute the following (it will set up Blender in /usr/local/blender):
```shell
mkdir /usr/local/blender && \
curl -SL "http://mirror.cs.umn.edu/blender.org/release/Blender2.92/blender-2.92.0-linux64.tar.xz" -o blender.tar.xz && \
tar -xvf blender.tar.xz -C /usr/local/blender --strip-components=1 && \
rm blender.tar.xz && ln -s /usr/local/blender/blender /usr/local/bin/blender
```
Since we use the system interpreter (see instructions below to set up a compatible one) for Blender's headless mode, remove the Python that comes pre-packaged:
```shell
rm -rf /usr/local/blender/2.92/python
```
You need to set up Python with the required libraries and the correct version. Blender 2.92 uses Python 3.7 (older or newer versions will not work). For simplicity, use conda:
```shell
conda env create -f env.yaml
```
When invoking Blender, use the following (assumes the appropriate env was named p37):
```shell
PYTHONPATH=~/miniconda3/envs/p37/bin/python \
PYTHONHOME=~/miniconda3/envs/p37 \
blender --background --python-use-system-env --python generate.py -- <args>
```
To ensure the textures are found and look good, consider trying a single texture first (to save time). To scan for errors and preview how the end result might look, use the `--test_scan` option in the generation script. In addition, the `--blendfiles` option saves the Blender scene after rendering for manual inspection.
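Combining these with the invocation pattern above, a quick scan-and-inspect run might look like the following; this is an illustration only, and any flags beyond `--test_scan` and `--blendfiles` in your checkout may differ, so check generate.py's argument list:

```shell
# Illustrative dry run: scan for texture/material errors and keep the
# .blend scene files for manual inspection afterwards.
PYTHONPATH=~/miniconda3/envs/p37/bin/python \
PYTHONHOME=~/miniconda3/envs/p37 \
blender --background --python-use-system-env --python generate.py -- \
    --test_scan --blendfiles
```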
To generate the dataset, run the following (it will produce a LOCAL_debug_000001.png example):
```shell
cd clevrtex-gen
./local_test.bash
```
Otherwise, please see the available arguments to customise the rendering. Dataset variants can be recreated using the appropriate `<variant>.json` files.
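As a sketch of how such a variant file could be inspected before launching a long render job, the snippet below parses a small JSON config. The field names here are invented for illustration and do not reflect the actual `<variant>.json` schema:

```python
import json

# Hypothetical variant config; "variant", "num_objects" and "materials_dir"
# are illustrative assumptions, not the real schema of <variant>.json.
variant_cfg = json.loads("""
{
    "variant": "camo",
    "num_objects": [3, 10],
    "materials_dir": "data/materials"
}
""")

# Sanity-check the per-scene object range before rendering.
lo, hi = variant_cfg["num_objects"]
assert lo <= hi, "min objects must not exceed max"
print(variant_cfg["variant"], lo, hi)  # → camo 3 10
```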
See the project page for CLEVRTEX download links.
The `clevrtex_eval.py` file contains data-loading logic for convenient access to CLEVRTEX data. Consider:
```python
import torch

from clevrtex_eval import CLEVRTEX, collate_fn

clevrtex = CLEVRTEX(
    'path-to-downloaded-data',  # Untar'ed
    dataset_variant='full',  # 'full' for main CLEVRTEX, 'outd' for OOD, 'pbg', 'vbg', 'grassbg', 'camo' for variants
    split='train',
    crop=True,
    resize=(128, 128),
    return_metadata=True  # Useful only for evaluation; wastes time on I/O otherwise
)

# Use collate_fn to handle metadata batching
dataloader = torch.utils.data.DataLoader(clevrtex, batch_size=BATCH, shuffle=True, collate_fn=collate_fn)
```
See `CLEVRTEX_Evaluator` in `clevrtex_eval.py`. It implements all the utilities needed.
This dataset builds upon CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning by Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Fei-Fei Li, Larry Zitnick, and Ross Girshick, presented at CVPR 2017; code available at https://github.com/facebookresearch/clevr-dataset-gen
In particular, we use CLEVR's method for computing cardinal directions. See the original licence included in the clevr_qa.py file.
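The idea behind CLEVR-style cardinal directions can be sketched as follows: each direction is a unit vector in the ground plane, and object A counts as, say, "left of" object B when the displacement from B to A projects positively onto the "left" vector. The snippet below is an illustrative reconstruction of that idea, not the code from clevr_qa.py:

```python
# Illustrative sketch of CLEVR-style cardinal-direction relationships.
# Each direction is a unit vector in the ground plane; A is in direction d
# of B when the displacement B->A has dot product with d above a margin.
# This is a reconstruction for exposition, not the code from clevr_qa.py.

DIRECTIONS = {
    "left":   (-1.0, 0.0),
    "right":  (1.0, 0.0),
    "front":  (0.0, -1.0),
    "behind": (0.0, 1.0),
}

EPS = 0.2  # small margin so near-ties do not count as a relationship


def relationships(pos_a, pos_b):
    """Return the set of directions in which pos_a lies relative to pos_b."""
    dx = pos_a[0] - pos_b[0]
    dy = pos_a[1] - pos_b[1]
    return {
        name
        for name, (vx, vy) in DIRECTIONS.items()
        if dx * vx + dy * vy > EPS
    }


rels = relationships((-2.0, 1.0), (0.0, 0.0))
print(sorted(rels))  # → ['behind', 'left']
```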
If you use the ClevrTex dataset or generation code, please consider citing:
```bibtex
@inproceedings{karazija2021clevrtex,
    author = {Laurynas Karazija and Iro Laina and Christian Rupprecht},
    booktitle = {Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
    title = {{C}levr{T}ex: {A} {T}exture-{R}ich {B}enchmark for {U}nsupervised {M}ulti-{O}bject {S}egmentation},
    year = {2021},
}
```