apple/ml-hypersim

Render bounding box images

OrangeSodahub opened this issue · 11 comments

Hi, the command python ../../code/python/tools/scene_generate_images_bounding_box.py --scene_dir scenes/ai_001_001 --camera_name cam_00 --bounding_box_type object_aligned_2d --frame_id 0 --num_pixels_per_fragment 1 generates images with bounding box lines drawn over the scene, e.g.:

[image: example output with wireframe bounding boxes]

I wonder whether there are any tools/scripts to render color-filled bounding box images, i.e., output images that show only the semantic color of each bounding box, without any of the original scene colors.

Another question: I browsed https://github.com/apple/ml-hypersim/blob/6a727648a6c3a1401e769f7c3a5457880cfc9eec/code/python/tools/scene_generate_bounding_boxes.py, but I could not find where the semantic label for each bounding box is defined. Could you point it out? Thanks!

According to:

for sii in unique(mesh_objects_sii):
    if sii == -1:
        continue
    color_sii = semantic_instance_colors[sii]

The color of each bounding box seems to be determined arbitrarily (by the order of the boxes in the array).

Hi, great questions.

We don't provide code to generate solid bounding box renderings, but it would be straightforward to modify our code to do that. Our code implements a very basic line rasterizer, but you could easily hack it to implement a basic triangle rasterizer instead. We didn't bother to do this because we just wanted a quick-and-dirty script to generate one of the figures in our paper. Note that rasterizing triangles will generate a lot more fragments than lines, and it should not come as a surprise that rasterizing lots of fragments in a pure Python script will be slow.
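For reference, here is a minimal sketch of what such a triangle rasterizer could look like, written against hypothetical inputs (screen-space vertex positions, a depth buffer, and an RGB color). It is not the Hypersim code, just an illustration of the barycentric-coordinate approach, and for simplicity it uses a single depth value per triangle:

import numpy as np

def fill_triangle(image, depth_buffer, p0, p1, p2, z, color):
    # image: H x W x 3, depth_buffer: H x W; p0, p1, p2: 2D pixel coordinates;
    # z: one depth value for the whole triangle (a real implementation would
    # interpolate per-vertex depth using the barycentric weights below)
    h, w = depth_buffer.shape
    px, py = np.meshgrid(np.arange(w), np.arange(h))

    def edge(a, b, cx, cy):
        return (cx - a[0]) * (b[1] - a[1]) - (cy - a[1]) * (b[0] - a[0])

    area = edge(p0, p1, p2[0], p2[1])
    if area == 0:
        return
    # dividing by the signed area makes the inside test independent of winding
    w0 = edge(p1, p2, px, py) / area
    w1 = edge(p2, p0, px, py) / area
    w2 = edge(p0, p1, px, py) / area
    inside = (w0 >= 0) & (w1 >= 0) & (w2 >= 0)

    visible = inside & (z < depth_buffer)
    image[visible] = color
    depth_buffer[visible] = z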

A faster alternative would be to render the solid bounding boxes in a standalone OpenGL program, and then composite your OpenGL rendering with a Hypersim image, using your rendered depth image and a Hypersim depth image to control the compositing.
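As a rough sketch of that compositing step (all variable names here are hypothetical), assuming you have rendered the solid boxes with the same camera parameters as a Hypersim frame and have both depth images available:

import numpy as np

def composite(hypersim_color, hypersim_depth, box_color, box_depth, alpha=0.75):
    # a box fragment wins wherever it is closer to the camera than the scene;
    # note that Hypersim's depth_meters images store distance to the camera
    # center rather than planar depth, so one of the two depth images may
    # need to be converted before this comparison is meaningful
    box_in_front = box_depth < hypersim_depth
    out = hypersim_color.astype(np.float32).copy()
    out[box_in_front] = (alpha * box_color[box_in_front]
                         + (1.0 - alpha) * out[box_in_front])
    return out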

The color of each bounding box is determined by the metadata_semantic_instance_colors.hdf5 file for each scene. Look at the code in scene_generate_images_bounding_box.py more carefully to see how this works. If you wanted to render using the usual NYU40 semantic label colors, rather than our instance colors, you could use the metadata_semantic_colors.hdf5 instead. See #49 for a discussion of how to do this.
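For example, a minimal sketch of looking up a box color with h5py (the paths assume the usual scene_dir/_detail/mesh layout, so adjust as needed):

import h5py

mesh_dir = "ai_001_001/_detail/mesh"
with h5py.File(mesh_dir + "/metadata_semantic_instance_colors.hdf5", "r") as f:
    semantic_instance_colors = f["dataset"][:]    # one RGB color per semantic instance

sii = 10                                          # some semantic instance id
color_sii = semantic_instance_colors[sii]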

Thanks, let me clarify something. I want to get the label id (e.g., the NYU40 id) for each bounding box. I checked scene_generate_images_bounding_box.py, focusing on:

for sii in unique(mesh_objects_sii):
    if sii == -1:
        continue
    color_sii = semantic_instance_colors[sii]
    bounding_box_center_world = matrix(bounding_box_positions[sii]).A
    bounding_box_extent_world = matrix(bounding_box_extents[sii]).A

I'm confused about the relationship between mesh_objects_si.hdf5 and metadata_semantic_instance_bounding_box_object_aligned_2d_*.hdf5 for each scene. For example, in the scene ai_001_001, the mesh_objects_si file has 1391 values while the bounding box file has 56 values. From the example above, it seems that I should read the contents of mesh_objects_si.hdf5 first, filter out the -1 values, use the remaining values as indices to fetch the bounding boxes, and do the same to get the colors. So are the values in mesh_objects_si.hdf5 the labeled class ids?

--- update ---
I found that each scene also has a metadata_objects.csv file, and it has the same number of entries as mesh_objects_si.hdf5.

Please read the 3D Bounding Boxes and Mesh Annotations sections of the README and post here if you're still unclear.

Thanks, now I understand the mesh-object annotations and the 3D bounding box annotations. However, I am still a little confused about how to get the NYU id for each bounding box. The N of the mesh-object annotation file and the N of the 3D bounding box annotation file are not the same thing, right? I checked, and they are not equal. And according to the code snippet above, especially:

bounding_box_center_world = matrix(bounding_box_positions[sii]).A

It looks like sii should be a semantic label id, so why is it used to fetch the corresponding bounding box?

When working with our mesh annotations, there are two distinct kinds of entities you need to keep track of: low-level objects and semantic instances.

Each raw scene contains a flat list of low-level objects. But the low-level objects are not semantically meaningful (e.g., a door handle, one leg of a chair, a single button on a shirt, etc). So these low-level objects must be manually grouped into meaningful semantic instances (e.g., an entire door, an entire chair, an entire shirt, etc). We perform this grouping step using our scene annotation tool.

The ai_001_001 scene contains 1391 low-level objects. These low-level objects have been grouped into 56 semantic instances. The mesh_objects_sii file contains the mapping from low-level object ID to semantic instance ID. Therefore, we expect this file to have 1391 entries. Likewise, the mesh_objects_si file contains the mapping from low-level object ID to NYU40 semantic ID. Again, we expect this file to have 1391 entries. We provide exactly one bounding box for each semantic instance, so there are 56 bounding boxes. Therefore, we expect each metadata_semantic_instance_bounding_box_* file to have 56 entries. Whenever you see si in our code, it refers to an NYU40 semantic ID. Whenever you see sii in our code, it refers to a semantic instance ID.
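For illustration, here is one way (a rough sketch, not code we ship; see also the approaches discussed in #49) to recover an NYU40 semantic id for each bounding box from these mapping files, assuming the usual scene_dir/_detail/mesh layout:

import h5py
import numpy as np

mesh_dir = "ai_001_001/_detail/mesh"
with h5py.File(mesh_dir + "/mesh_objects_sii.hdf5", "r") as f:
    mesh_objects_sii = f["dataset"][:]   # low-level object id -> semantic instance id
with h5py.File(mesh_dir + "/mesh_objects_si.hdf5", "r") as f:
    mesh_objects_si = f["dataset"][:]    # low-level object id -> NYU40 semantic id

# for each semantic instance, collect the NYU40 ids of its member objects and
# take the most frequent one (they usually agree within an instance)
num_instances = int(mesh_objects_sii.max()) + 1
instance_to_nyu40 = np.full(num_instances, -1, dtype=np.int64)
for sii in np.unique(mesh_objects_sii):
    if sii == -1:
        continue
    member_si = mesh_objects_si[mesh_objects_sii == sii]
    member_si = member_si[member_si != -1]
    if member_si.size > 0:
        values, counts = np.unique(member_si, return_counts=True)
        instance_to_nyu40[sii] = values[np.argmax(counts)]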

These details are described in our documentation, as well as in issues that have been addressed previously. I suggest slowing down and reading our documentation carefully first, and then examining our data afterwards. It appears as though you are proceeding in the opposite order (examining our data first, then posting GitHub issues, then skimming our documentation).

We discuss various alternative approaches for getting the semantic ID for each bounding box in #49.

I got it, THANKS!


Hello, did you succeed? Could you share your code for rendering color-filled bounding boxes, or explain the details? Thanks.

Hi, great questions.

We don't provide code to generate solid bounding box renderings, but it would be straightforward to modify our code to do that. Our code implements a very basic line rasterizer, but you could easily hack it to implement a basic triangle rasterizer instead. We didn't bother to do this because we just wanted a quick-and-dirty script to generate one of the figures in our paper. Note that rasterizing triangles will generate a lot more fragments than lines, and it should not come as a surprise that rasterizing lots of fragments in a pure Python script will be slow.

A faster alternative would be to render the solid bounding boxes in a standalone OpenGL program, and then composite your OpenGL rendering with a Hypersim image, using your rendered depth image and a Hypersim depth image to control the compositing.

The color of each bounding box is determined by the metadata_semantic_instance_colors.hdf5 file for each scene. Look at the code in scene_generate_images_bounding_box.py more carefully to see how this works. If you wanted to render using the usual NYU40 semantic label colors, rather than our instance colors, you could use the metadata_semantic_colors.hdf5 file instead. See #49 for a discussion of how to do this.

Hello, thanks for your reply. I have a simple question:
I know how to recolor the bounding boxes, but if I want to color walls and floors based on their NYU labels, how should I proceed?

I know how to recolor the bounding boxes, but if I want to color walls and floors based on their NYU labels, how should I proceed?

I'm not sure I understand your question.

  • Do you want to generate images where wall and floor pixels are colored according to their NYU40 labels? What's wrong with using the semantic images we provide?

  • Do you want to generate colored bounding boxes around walls and floors? When generating bounding boxes, we consider some semantic categories (e.g., table, chair, etc) but we skip others (e.g., wall, floor, etc) because we don't think the bounding boxes for these categories are meaningful. In other words, we don't have precomputed bounding boxes for walls, floors, etc. You could always compute your preferred bounding boxes yourself. If you want to compute bounding boxes based on the original triangle meshes for a particular scene, then you would need to pay for the source assets, but you could still use our existing triangle mesh annotations. Alternatively, you could reshape(...) the position and semantic images we provide for each scene into a labeled point cloud, and then use the point cloud to compute bounding boxes for walls, floors, etc.

If I'm still misunderstanding your question, then please describe it in a paragraph or two of more detail.
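
To make the point-cloud suggestion above concrete, here is a rough sketch for a single frame. The file names follow the usual per-frame HDF5 layout, and the NYU40 id for wall is assumed to be 1; treat both as assumptions to verify against your data.

import h5py
import numpy as np

geom_dir = "ai_001_001/images/scene_cam_00_geometry_hdf5"
with h5py.File(geom_dir + "/frame.0000.position.hdf5", "r") as f:
    position = f["dataset"][:]            # H x W x 3 world-space positions
with h5py.File(geom_dir + "/frame.0000.semantic.hdf5", "r") as f:
    semantic = f["dataset"][:]            # H x W NYU40 ids

points = position.reshape(-1, 3)
labels = np.asarray(semantic).reshape(-1)

wall_points = points[labels == 1]                                # NYU40 id 1 = wall (assumed)
wall_points = wall_points[np.isfinite(wall_points).all(axis=1)]  # drop pixels with no geometry

bbox_min = wall_points.min(axis=0)    # axis-aligned box; swap in your preferred box fit
bbox_max = wall_points.max(axis=0)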


I understand. Sorry for my stupid question. Thanks!