MAIR: Multi-view Attention Inverse Rendering with 3D Spatially-Varying Lighting Estimation

This is the official webpage for MAIR's OpenRooms FF dataset.

For full access to the dataset, please fill out the form in this file and send it to jhcho@kist.re.kr. We will then grant your email address access to the dataset.

For any questions, please email: happily@kist.re.kr.

Dataset Introduction

OpenRooms FF (Forward Facing) is a dataset that extends OpenRooms to a multi-view setting. Each image set consists of 9 images looking in the same direction. For a detailed description of how the dataset was created, please refer to the paper's supplementary material.

Dataset Overview

Since OpenRooms FF is based on images from OpenRooms, its scenes are also rendered in 6 versions: main_xml, main_xml1, mainDiffMat_xml, mainDiffMat_xml1, mainDiffLight_xml, and mainDiffLight_xml1. File names follow the format <img_ind>_<data_type>_<view_ind>.

<img_ind> is the index of the corresponding image in the original OpenRooms. For example, a file named main_xml1/scene0001_00/8_* was reproduced from the image (camera pose) of main_xml1/scene0001_00/*_8 in the original OpenRooms. <data_type> indicates the type of data (e.g., im, imnormal, immask), and <view_ind> indicates the multi-view index (1-9), as shown in the image below.
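
As an illustration of this naming convention, the following is a minimal sketch for assembling per-view file paths; openrooms_ff_path is a hypothetical helper, and the directory layout is assumed to match the scene folders described above.

    import os

    def openrooms_ff_path(root, scene_version, scene, img_ind, data_type, view_ind, ext):
        # e.g. -> <root>/main_xml1/scene0001_00/8_im_3.rgbe
        name = '%d_%s_%d.%s' % (img_ind, data_type, view_ind, ext)
        return os.path.join(root, scene_version, scene, name)

    # Hypothetical usage for view 3 of image set 8:
    imName = openrooms_ff_path('/path/to/OpenRoomsFF', 'main_xml1', 'scene0001_00', 8, 'im', 3, 'rgbe')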

The training/testing split of the scenes can be found at this link.

  1. Dataset split: This is the training/test split used in the MAIR experiments. However, since this split is not divided per scene, we strongly recommend using the original OpenRooms training/test split instead.

  2. Image: The 480 × 640 HDR images <img_ind>_im_<view_ind>.rgbe, which can be read with the following Python command.

    im = cv2.imread(imName, -1)[:, :, ::-1]
  3. Material: The 480 × 640 diffuse albedo maps <img_ind>_imbaseColor_<view_ind>.png and roughness maps <img_ind>_imroughness_<view_ind>.png. Note that the diffuse albedo map is saved in sRGB space. To load it into linear RGB space, we can use the following Python commands. The roughness map is saved in linear space and can be read directly.

    im = cv2.imread(imName)[:, :, ::-1]
    im = (im.astype(np.float32) / 255.0) ** 2.2
  4. Geometry: The 480 × 640 normal maps <img_ind>_imnormal_<view_ind>.png and depth maps <img_ind>_imdepth_<view_ind>.dat. The R, G, B channels of the normal map correspond to the right, up, and backward directions of the image plane (a sketch for decoding the normal map into unit vectors is given after this list). To load the depth map, we can use the following Python commands.

    with open(imName, 'rb') as fIn:
        # Read the height and width of the depth map
        hBuffer = fIn.read(4)
        height = struct.unpack('i', hBuffer)[0]
        wBuffer = fIn.read(4)
        width = struct.unpack('i', wBuffer)[0]
        # Read the depth values
        dBuffer = fIn.read(4 * width * height)
        depth = np.array(
            struct.unpack('f' * height * width, dBuffer),
            dtype=np.float32)
        depth = depth.reshape(height, width)
  5. Predicted Depth: We also provide predicted 480 × 640 depth maps <img_ind>_cdsdepthest_<view_ind>.dat and their confidence maps <img_ind>_cdsconf_<view_ind>.dat, obtained with CDS-MVSNet, which we used in our experiments. These files are read in the same way as the depth maps above.

  6. Mask: The 480 × 640 grayscale masks <img_ind>_immask_<view_ind>.png for light sources. A pixel value of 0 represents regions of environment maps, 0.5 represents regions of lamps, and all other pixels have the value 1.

  7. SVLighting: The (120 × 8) × (160 × 16) per-pixel environment maps <img_ind>_imenvlow_<view_ind>.hdr. The spatial resolution is 120 × 160, while the environment map resolution is 8 × 16. To read the per-pixel environment maps, we can use the following Python commands.

    # Read the envmap of resolution 960 x 2560 x 3 in RGB format
    env = cv2.imread(imName, -1)[:, :, ::-1]
    # Reshape and permute the per-pixel environment maps
    env = env.reshape(120, 8, 160, 16, 3)
    env = env.transpose(0, 2, 1, 3, 4)

    We recommend using Rclone to avoid slow or unstable downloads.

  8. SVLightingDirect: The (30 × 16) × (40 × 32) per-pixel environment maps with direct illumination only, <img_ind>_imenvDirect_<view_ind>.hdr. The spatial resolution is 30 × 40, while the environment map resolution is 16 × 32. The direct per-pixel environment maps can be loaded in the same way as the per-pixel environment maps above.

  9. Camera: The camera intrinsic and extrinsic parameters and the scene depth boundaries, stored as a 3 × 6 × 9 array in <img_ind>_cam_mats.npy. A sketch that combines these parameters with the depth map to back-project it into 3D points is given after this list.

    # Read all cameras (3 x 6 x 9).
    cam_mats = np.load(camName)
    # Read camera 5 (<view_ind> is 5)
    cam_5 = cam_mats[:, :, 4]
    
    # Read camera 1 (<view_ind> is 1)
    cam_1 = cam_mats[:, :, 0]
    
    # Read the camera-to-world matrix (3 x 4).
    # The camera coordinate axes are right, down, forward.
    camera_to_world = cam_1[:, :4]
    
    # image height, width, focal length
    h, w, f = cam_1[:, 4]
    
    # minimum depth, maximum depth.
    min_z, max_z, _ = cam_1[:, 5]
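
For the normal maps in item 4, the following is a minimal sketch for decoding the PNG into unit-length normal vectors. It assumes the common [0, 255] to [-1, 1] encoding; please verify this against your copy of the data.

    import cv2
    import numpy as np

    # normalName points to <img_ind>_imnormal_<view_ind>.png
    normal = cv2.imread(normalName)[:, :, ::-1]
    # Map the 8-bit channels from [0, 255] to [-1, 1] and renormalize to unit length.
    normal = normal.astype(np.float32) / 127.5 - 1.0
    normal = normal / np.maximum(np.linalg.norm(normal, axis=2, keepdims=True), 1e-6)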
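
For the camera parameters in item 9, the following is a minimal sketch that builds a pinhole projection from h, w, f and back-projects a depth map (loaded as in item 4) into 3D points. backproject_depth is a hypothetical helper; the centered principal point and the [R | t] layout of the camera-to-world matrix are assumptions to verify against the data.

    import numpy as np

    def backproject_depth(depth, h, w, f):
        # Pixel grid; assumes the principal point lies at the image center.
        u, v = np.meshgrid(np.arange(int(w)), np.arange(int(h)))
        x = (u - w / 2.0) / f * depth
        y = (v - h / 2.0) / f * depth   # camera axes: right, down, forward
        return np.stack([x, y, depth], axis=-1)  # (H, W, 3) camera-space points

    cam_mats = np.load(camName)                  # 3 x 6 x 9
    cam_1 = cam_mats[:, :, 0]                    # <view_ind> 1
    h, w, f = cam_1[:, 4]
    points_cam = backproject_depth(depth, h, w, f)   # 'depth' loaded as in item 4
    # Assuming camera_to_world = [R | t], map the points into world coordinates.
    R, t = cam_1[:, :3], cam_1[:, 3]
    points_world = points_cam @ R.T + t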

Related Datasets

The OpenRooms FF dataset is built on several prior works, as noted below.

  1. OpenRooms dataset: The original OpenRooms dataset.
  2. ScanNet dataset: The real 3D scans of indoor scenes.
  3. Scan2CAD dataset: The alignment of CAD models to the scanned point clouds.
  4. Laval outdoor lighting dataset: HDR outdoor environment maps.
  5. HDRI Haven lighting dataset: HDR outdoor environment maps.
  6. PartNet dataset: CAD models.
  7. Adobe Stock: High-quality microfacet SVBRDF texture maps. Please license the materials from Adobe Stock.

Citation

If you find our work useful, please consider citing:

@article{choi2023mair,
  title={MAIR: Multi-view Attention Inverse Rendering with 3D Spatially-Varying Lighting Estimation},
  author={Choi, JunYong and Lee, SeokYeong and Park, Haesol and Jung, Seung-Won and Kim, Ig-Jae and Cho, Junghyun},
  journal={arXiv preprint arXiv:2303.12368},
  year={2023}
}