vsitzmann/scene-representation-networks

Issues with extrinsics

Closed this issue · 8 comments

feem1 commented

Hello, love your work on this repo.

I have an issue where I use a modified version of the Stanford render script for my car .obj, but when I predict on cars with your pre-trained model, I don't see any prediction in the gt_compare output.

Is this occurring because Blender's coordinate system is not the same as OpenCV's? How should we approach this issue?
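For what it's worth, the usual fix for the Blender/OpenCV mismatch is an axis flip: Blender cameras look down -Z with +Y up, while OpenCV cameras look down +Z with Y pointing down. A minimal sketch of that conversion, assuming you have the camera-to-world matrix from Blender (the matrix here is a placeholder identity just for illustration):

```python
import numpy as np

# Placeholder for a 4x4 cam2world matrix exported from Blender
# (in Blender this would be np.array(cam.matrix_world)).
blender_cam2world = np.eye(4)

# Flipping the camera's local Y and Z axes converts between the
# Blender and OpenCV camera conventions.
flip_yz = np.diag([1.0, -1.0, -1.0, 1.0])
opencv_cam2world = blender_cam2world @ flip_yz
```

The flip is applied on the right so it acts on the camera's local axes, leaving the camera position unchanged.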

feem1 commented

I also tried the colmap_wrapper you provided in DeepVoxels. This estimates the poses, but after I feed the poses with the corresponding images to the pre-trained cars model, it renders the car out of focus (i.e., the center of the car is not the center of rotation of the car).

Similarly, I have also generated intrinsics and extrinsics using the above script. My understanding is that the contents of the intrinsics directory (a 9-vector) are the K matrix flattened, and the contents of the pose directory (a 16-vector) are the RT matrix flattened, followed by (0, 0, 0, 1). However, so far the predicted normals look wrongly rotated. The cars should be on a turntable, with 0 elevation.
[screenshot]

Hey @ebartrum, from my study of the code, it actually only uses the intrinsics.txt file and not the intrinsics folder. The extrinsics do seem to be the RT matrix flattened, but intrinsics.txt is written like below:
[alpha_u, u_0, v_0, 0.],
[0., 0., 0.],
[1.],
[resolution_x_in_px, resolution_y_in_px].
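A minimal sketch of writing intrinsics.txt in that four-line layout. All numeric values here are made-up placeholders, not values from the repo:

```python
# Placeholder intrinsics (assumed values, for illustration only).
alpha_u = 525.0          # focal length in pixels
u_0, v_0 = 64.0, 64.0    # principal point (assumed image center)
resolution_x_in_px, resolution_y_in_px = 128, 128

lines = [
    f"{alpha_u} {u_0} {v_0} 0.",   # [alpha_u, u_0, v_0, 0.]
    "0. 0. 0.",
    "1.",
    f"{resolution_x_in_px} {resolution_y_in_px}",
]
with open("intrinsics.txt", "w") as f:
    f.write("\n".join(lines) + "\n")
```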

My issue is that my object is not at the focus of the camera during prediction.
[image]

Thanks @feemthan. I think I now see another problem we are having. From the README: 'Camera poses are assumed to be in a "camera2world" format, i.e., they denote the matrix transform that transforms camera coordinates to world coordinates.' However, the matrices we are using transform world coordinates to camera coordinates: 'An image pixel (u,v) is generated from world (x,y,z) coordinates through a 3x4 matrix using projective coordinates: kx = PX'

@feemthan the results seem to be working for me now! Here's what I did: take the RT matrix from the script you linked and append [0, 0, 0, 1] so it is 4x4. This is world2cam. Now invert this matrix to make it cam2world. Flatten it to a 16-vector and write it in the pose directory, one file per image. For intrinsics.txt, I followed your instructions.
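The steps above can be sketched as follows, assuming RT is the 3x4 world2cam matrix produced by the render script (the R and t values here are dummies for illustration):

```python
import os
import numpy as np

# Dummy world2cam pose: identity rotation, camera 2 units along +Z.
R = np.eye(3)
t = np.array([0.0, 0.0, 2.0])
RT = np.hstack([R, t[:, None]])  # 3x4 world2cam

# Append [0, 0, 0, 1] so the matrix is 4x4.
world2cam = np.vstack([RT, [0.0, 0.0, 0.0, 1.0]])

# Invert to get cam2world, which is what the SRN README expects.
cam2world = np.linalg.inv(world2cam)

# Flatten to a 16-vector and write one pose file per image.
os.makedirs("pose", exist_ok=True)
pose_flat = cam2world.reshape(-1)
np.savetxt("pose/000000.txt", pose_flat[None], fmt="%f")
```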

Hey @ebartrum, your invert method worked out for me, kinda. Thanks a lot!

I got the same issue of a rotation shift. It seems to be off by 45 degrees. I am going to try rotating the poses by some amount and see if that reaches a solution.
[image]

feem1 commented

@ebartrum Hey, so I realized that the SRN model just expects the flattened version of cam.matrix_world with [0, 0, 0, 1] appended to it, so the 3x4 matrix calculation is not required.
Thanks a lot for your help @ebartrum. Going to close this issue.
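In case it helps others, a sketch of that export. Here `cam_matrix_world` stands in for the top 3x4 rows of Blender's `cam.matrix_world` (already a cam2world transform); the values are placeholders, since in Blender you would use `np.array(cam.matrix_world)`:

```python
import numpy as np

# Stand-in for the top 3x4 rows of Blender's cam.matrix_world
# (placeholder values for illustration).
cam_matrix_world = np.array([
    [1.0, 0.0, 0.0, 0.5],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
])

# Append [0, 0, 0, 1] and flatten to the 16-vector SRN expects,
# with no world2cam inversion needed.
pose = np.vstack([cam_matrix_world, [0.0, 0.0, 0.0, 1.0]]).reshape(-1)
print(" ".join(f"{v:f}" for v in pose))
```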