zhou13/holicity

AccuCities CAD model units and coordinate frame

v-pnk opened this issue · 15 comments

v-pnk commented

Hello,
In README.md, there is: "The unit of the CAD models is the meter."

I downloaded the free AccuCities CAD model TQ3280 – Free 3D London Sample and the units of that model seem to be centimeters. At least when I measure the length of the model tile side in MeshLab, I get roughly 100 000. I've tried it for all the detail levels of .obj format.

[Image: holicity_model_measurement03]
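The same unit check can be done without MeshLab by reading the vertex lines of the OBJ file and computing the bounding-box size. This is only a minimal sketch, assuming a plain Wavefront OBJ with `v x y z` vertex lines; the function name `obj_bbox_size` and the sample vertices are made up for illustration. For a 1 km tile, a side length near 1 000 would suggest meters and one near 100 000 would suggest centimeters.

```python
import numpy as np

def obj_bbox_size(obj_lines):
    """Return the axis-aligned bounding-box size of the vertices in an OBJ file."""
    verts = []
    for line in obj_lines:
        parts = line.split()
        # Geometric vertices start with "v"; "vt", "vn" and "f" lines are skipped.
        if len(parts) >= 4 and parts[0] == "v":
            verts.append([float(p) for p in parts[1:4]])
    verts = np.asarray(verts)
    return verts.max(axis=0) - verts.min(axis=0)

# Hypothetical vertices standing in for a real model file.
sample = [
    "v 0.0 0.0 0.0",
    "v 100000.0 250.0 0.0",
    "vn 0.0 1.0 0.0",
    "v 100000.0 0.0 100000.0",
    "f 1 2 3",
]
size = obj_bbox_size(sample)
print(size)
```

With a real file, the open file object can be passed directly, e.g. `with open(path) as fh: size = obj_bbox_size(fh)`.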

My intention is to render the model from the given camera poses. Currently, to get valid renderings, I have to multiply the translation vector of the camera poses by 100 and rotate around the global X axis by 90 degrees. (For the sake of completeness, I also have to rotate the local coordinate frame around X by 180 degrees to get to the OpenGL camera convention, but that is not part of the issue, so I don't include it in the example below.)

T = camr_npz['R'] @ np.array([[1,0,0,0], [0,0,-1,0], [0,1,0,0], [0,0,0,1]])
R = T[0:3, 0:3]
t = T[0:3, 3]
t = 100 * t

Did I misunderstand the instructions?

Thank you in advance!

Edit: Fixed the code.

Hi @v-pnk, did you get reasonable renderings with your settings? If yes, I could add it to the README.md. It is possible that AccuCities has changed the scale of their models since my work. As for the local coordinate frame, .obj is infamous for inconsistent coordinates across different parsers, as it only defines forward/up rather than xyz.

v-pnk commented

Hi @zhou13, thanks for the explanation. Yes, I've got reasonable renderings, which seem to align to the corresponding images (up to the precision of the model).

[Image: rendering of the CAD model compared with the corresponding dataset image]

Hi, how do you render the CAD model? Can you share the rendering code? Thanks.

v-pnk commented

Hi @jeannotes,

First, parse the image metadata (_camr.npz file) to get a valid camera pose and matching intrinsics.

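import numpy as np

camr_data_dict = np.load(camr_path)

# Rotate the world frame by 90 degrees around X to match the CAD model
T = camr_data_dict["R"] @ np.array([[1,0,0,0], [0,0,-1,0], [0,1,0,0], [0,0,0,1]])
# Rotate the camera frame by 180 degrees around X (flips the Y and Z axes)
T = np.array([[1,0,0,0], [0,-1,0,0], [0,0,-1,0], [0,0,0,1]]) @ T
# Convert the translation from meters to centimeters
T[0:3, 3] = 100 * T[0:3, 3]

FoV = np.radians(camr_data_dict["fov"])
img_size = 512 # all the images in the dataset are 512 x 512
# Pinhole focal length in pixels: half the image size divided by tan(FoV / 2)
f = (img_size / 2.0) / np.tan(FoV / 2.0)
c = img_size / 2.0
K = np.array([[f, 0.0, c], [0.0, f, c], [0.0, 0.0, 1.0]])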
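To make the intrinsics step easier to check in isolation, here is a small sketch of the standard pinhole relation f = (w / 2) / tan(FoV / 2). Note the division by the tangent: for a 90-degree field of view multiplying and dividing give the same value, but they differ for any other angle. The function name `pinhole_intrinsics` and the example angles are illustrative assumptions, not part of the dataset tooling.

```python
import numpy as np

def pinhole_intrinsics(fov_deg, img_size):
    """Build a 3x3 intrinsic matrix for a square image with a symmetric field of view."""
    half_fov = np.radians(fov_deg) / 2.0
    f = (img_size / 2.0) / np.tan(half_fov)  # focal length in pixels
    c = img_size / 2.0  # principal point at the image center
    return np.array([[f, 0.0, c], [0.0, f, c], [0.0, 0.0, 1.0]])

# Example inputs only; the real field of view comes from the "fov" entry of the .npz file.
K_90 = pinhole_intrinsics(90.0, 512)
K_60 = pinhole_intrinsics(60.0, 512)
print(K_90[0, 0], K_60[0, 0])
```

A narrower field of view gives a longer focal length, which is a quick sanity check on the sign of the relation.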

After getting the camera pose and intrinsics, you can use Open3D to load the mesh and render the image.

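import open3d as o3d

# Load the CAD model (True enables progress output)
mesh = o3d.io.read_triangle_model(args.input_cad_model, True)

# Render offscreen with the intrinsics K and extrinsics T computed above
renderer = o3d.visualization.rendering.OffscreenRenderer(512, 512)
renderer.scene.add_model("CAD mesh", mesh)
renderer.setup_camera(K, T, 512, 512)
image = renderer.render_to_image()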

Note 1: The images in my previous answers were generated by applying ambient occlusion coloring on the mesh, so the structure is more visible. That can be done in MeshLab (Filters --> Color Creation and Processing --> Ambient Occlusion).

Note 2: There are two sets of images in the dataset. Only images with "_HD_" in the name are from inside of the CAD model. The images with "_LD_" were captured outside of the area of the CAD model and so rendering from their pose does not make sense.

Note 3: This works for the AccuCities models without texture. The textured models have a different coordinate frame.
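Following Note 2, the images can be filtered by name before rendering. A minimal sketch, assuming the marker appears literally in the file name; the file names below are hypothetical and only the "_HD_" / "_LD_" markers come from this thread.

```python
def inside_cad_model(filename):
    """True if the file name carries the "_HD_" marker, i.e. the image lies inside the CAD model area."""
    return "_HD_" in filename

# Hypothetical file names used only to exercise the filter.
names = ["example_a_HD_000_camr.npz", "example_b_LD_000_camr.npz"]
hd_names = [n for n in names if inside_cad_model(n)]
print(hd_names)
```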

Whole script: holicity_render_image.zip

Edit: made the code more clear

@v-pnk thanks a lot!
q1:
It seems your code needs to run on a local machine. Are there any other methods?
q2:
What do you mean by "inside of the CAD model" and "outside of the CAD model"?
I know that "LD" images don't have moving objects.

v-pnk commented

Hi @jeannotes,
q1: The script should run anywhere that sufficiently recent versions of Open3D, NumPy and argparse (and a compatible Python 3) are installed.

q2: The free CAD model sample covers only 1 km x 1 km area of London (see this image from AccuCities website). The images which are in this covered area are marked "_HD_". The rest, which is outside of this area is marked "_LD_".

@v-pnk thanks for your reply.
Even if I use "HD" images, I find that they cannot correspond precisely to the rendered CAD images, because the original "HD" images contain moving objects.
It seems that "LD" images are OK, but we cannot get larger city models for free.
Am I right?

v-pnk commented

Yes, you are right.

The CAD models cannot contain the moving objects present in the original images, and they only roughly approximate the geometry of the buildings and landscape. Smaller static objects (lamps, benches etc.) are also completely omitted.

As you say, the other parts of the AccuCities London model are paid.

@v-pnk thanks for that

@v-pnk can you share the render script for "The textured models"? thanks

v-pnk commented

Sure, you just need to transform the camera poses. The transformation written below was obtained by aligning the textured model to the untextured one with ICP. As the majority of the structure is the same between the two models, the alignment should be sufficiently precise.

T = camr_data_dict["R"] @ np.array([
    [1.0, 0.0, 0.0, -531998.40], 
    [0.0, 0.0, -1.0, -181005.90], 
    [0.0, 1.0, 0.0, 0.0], 
    [0.0, 0.0, 0.0, 1.0]])
T = np.array([[1,0,0,0],[0,-1,0,0],[0,0,-1,0],[0,0,0,1]]) @ T

Full script is here: holicity_render_image_textured.zip
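The two pose conversions in this thread (untextured and textured model) can be put side by side in one helper. This is a sketch that only restates the snippets posted here, assuming they are complete; the linked full scripts may contain further steps. The names `cad_extrinsic`, `FLIP_YZ`, `TO_UNTEXTURED` and `TO_TEXTURED` are introduced for illustration.

```python
import numpy as np

# 180-degree rotation around the camera X axis (negates the camera Y and Z axes).
FLIP_YZ = np.diag([1.0, -1.0, -1.0, 1.0])

# World-frame change for the untextured model: 90-degree rotation around X.
TO_UNTEXTURED = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0]])

# World-frame change for the textured model (ICP-derived values quoted in this thread).
TO_TEXTURED = np.array([
    [1.0, 0.0, 0.0, -531998.40],
    [0.0, 0.0, -1.0, -181005.90],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0]])

def cad_extrinsic(pose, textured=False):
    """Map a 4x4 HoliCity pose (the "R" entry of the .npz file) to a CAD-model extrinsic."""
    pose = np.asarray(pose, dtype=float)
    if textured:
        # The textured snippet applies no extra scaling to the translation.
        return FLIP_YZ @ pose @ TO_TEXTURED
    T = FLIP_YZ @ pose @ TO_UNTEXTURED
    T[0:3, 3] *= 100.0  # untextured model: meters -> centimeters
    return T

# Made-up pose (identity rotation, translation 1, 2, 3) just to exercise the helper.
example_pose = np.eye(4)
example_pose[0:3, 3] = [1.0, 2.0, 3.0]
print(cad_extrinsic(example_pose))
```

Keeping the frame changes as named constants makes it easier to swap in a different alignment if the model files change.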

Note 1: I had to convert the mesh from the set of multiple FBX files to a single OBJ. That can be done e.g. in Blender:

  • File --> Import --> FBX --> select all the FBX files --> Import
  • Select all (press A)
  • File --> Export --> Wavefront --> enter the path --> Export
    • the transformation in the rendering script should be consistent with the default settings (Scale: 1.00, Forward: -Z, Up: Y)
  • if the textures cannot be found, open the generated .mtl file and correct the paths

@v-pnk when I downloaded the models, I actually chose the OBJ format.
q1: I think the OBJ format with texture is enough, right?
q2: Should I use the FBX files? Why?
q3: When I download the OBJ files, there are three folders: level 1, level 2, level 3. Are there any differences?

@v-pnk sorry about that, it seems that I cannot download the textured model. The textured model on the website appears as not found and I cannot download it, though the price is still free. Can you share the textured model with me? Thanks.
My email is j.z.feng@foxmail.com
Thanks!

v-pnk commented

Hi, sorry for the late reply, I had a few urgent things to deal with.

When you buy the model, you get 5 links - one (which should give you the whole model) apparently does not work. But the 4 others should work - each corresponds to one of the 0.5 x 0.5 km tiles (NE, NW, SE, SW) in the FBX format - that's what I used.

If I remember correctly, the HoliCity dataset camera poses are only in the SE part.

Here is the SE part converted to OBJ: google drive link

The original model is licensed under CC-BY-SA 4.0, so I believe I am not violating anything by sharing the converted model with the original license and a link to the authors' webpage. If there are any issues with the licensing, please let me know.

Sorry to bother you, and thanks for your great work. I have rendered the correct image, but I have some questions:

  1. Is the camera pose "R" defined in the OpenCV coordinate convention?
  2. How can I visualize the camera with Open3D? I want to reconstruct a part of the CAD model, but I fail to choose the region and the matching images.
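For choosing a region and the images that fall inside it, one option is to recover each camera position from its extrinsic matrix. A minimal sketch, assuming the 4x4 matrix passed to `setup_camera` is a world-to-camera transform with an orthonormal rotation part, so the camera center is C = -Rᵀ t. The function names and the bounding-box values are hypothetical.

```python
import numpy as np

def camera_center(T):
    """Camera position in model coordinates for a 4x4 world-to-camera matrix T."""
    R = T[0:3, 0:3]
    t = T[0:3, 3]
    return -R.T @ t

def in_region(center, lower, upper):
    """True if the camera center lies inside the axis-aligned box [lower, upper]."""
    return bool(np.all(center >= lower) and np.all(center <= upper))

# Made-up extrinsic: rotation diag(1, -1, -1) and a camera placed at (10, 20, 30).
T = np.diag([1.0, -1.0, -1.0, 1.0])
T[0:3, 3] = -T[0:3, 0:3] @ np.array([10.0, 20.0, 30.0])

center = camera_center(T)
print(center, in_region(center, np.zeros(3), np.full(3, 50.0)))
```

The resulting centers can then be compared against the bounds of the part of the model to be reconstructed, keeping only the images whose cameras fall inside.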