Camera parameters
Ainaz99 opened this issue · 29 comments
Hi,
I'm trying to render surface normals from the meshes in Blender, using the camera parameters provided. I scale the mesh using meters_per_asset_unit. I use camera_keyframe_positions.hdf5 for the camera location and get the camera Euler angles from camera_keyframe_orientations.hdf5, with fov = pi/3 = 1.0471975511965976.
But my renderings do not match the color images from the dataset for only some specific scenes. Is there any other camera parameter I'm missing?
Here's an example for ai_037_002, cam_00, frame.0000:
Thanks for your help.
UPDATE (Jan 9 2022): This issue has been resolved. See contrib/mikeroberts3000 for details.
Hi! Great question.
It sounds like you're doing everything right. I have noticed this problem occasionally also. It appears to affect a small handful of our scenes, but I don't have a complete list. The problem occurs because there is occasionally an additional offset in the V-Ray scene file that gets applied to the camera parameters during photorealistic rendering. I believe the relevant parameters are the "Horizontal Shift" and "Vertical Shift" parameters described in the V-Ray documentation. If I had known about these unpleasant parameters prior to rendering our images, I certainly would have explicitly hard-coded them to 0 for each scene.
Perhaps you can help with this issue. (You are especially well-positioned to help because your independent rendering infrastructure with Blender is already set up.) I'm assuming that you have access to the source assets, and you have run the pre-processing phase of our pipeline. Otherwise, how could you do your own rendering in Blender? Anyway, there are several ways to proceed.
First, can you manually confirm that the scene you're experimenting with has non-zero values for these shift parameters in its exported vrscene file?
Second, given some non-zero shift parameters from a vrscene file, as well as our camera positions and camera orientations, can you compute the correct camera parameters such that your Blender rendering matches our ground truth rendering? I don't know exactly how to do this, because I don't know exactly what the shift parameters mean. Are they describing some kind of off-center perspective projection? What changes are required in your Blender setup to match our pre-rendered ground truth images pixel-perfectly?
Third, do you have a complete list of scenes that are affected? It should be straightforward to parse each vrscene file and search for any non-zero shift parameters. See this code for an example of how to programmatically access the cameras in a vrscene file. In addition to these shift parameters, there are also tilt parameters. Are any scenes affected by non-zero tilt parameters?
Haha sorry for hijacking your thread with all of these to-do items. I wish I had more answers and fewer questions. But I figured this is the right place to document my incomplete understanding of the problem, and to highlight possible next steps.
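As a rough illustration of the third to-do item, here is a minimal sketch that scans the text of a vrscene file for non-zero camera shift and offset parameters. It assumes the file is plain text with plugin blocks of the form `CameraPhysical <name> { param=value; ... }`, and it only checks the four parameter names quoted later in this thread. The function name `find_nonzero_camera_params` is hypothetical. A plain text scan like this does not fill in default values, so treat it as a quick check rather than a replacement for the V-Ray AppSDK.

```python
import re

# Parameter names taken from later comments in this thread.
PARAMS = ("horizontal_offset", "vertical_offset", "horizontal_shift", "lens_shift")


def find_nonzero_camera_params(vrscene_text):
    """Return {plugin_name: {param: value}} for non-zero shift/offset parameters.

    Assumes (not verified against every vrscene variant) that cameras appear as
    text blocks of the form `CameraPhysical <name> { key=value; ... }`.
    """
    result = {}
    block_pattern = re.compile(r"CameraPhysical\s+(\S+)\s*\{(.*?)\}", re.DOTALL)
    for match in block_pattern.finditer(vrscene_text):
        name, body = match.group(1), match.group(2)
        nonzero = {}
        for key, raw_value in re.findall(r"(\w+)\s*=\s*([^;]+);", body):
            if key not in PARAMS:
                continue
            try:
                value = float(raw_value)
            except ValueError:
                continue  # skip values that are not plain numbers
            if value != 0.0:
                nonzero[key] = value
        if nonzero:
            result[name] = nonzero
    return result
```

Running this over the text of each exported vrscene file would give a first list of candidate scenes to inspect by hand.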
Hi @mikeroberts3000,
I am also interested in multi-view applications based on Hypersim, and I stumbled upon the same issue as Ainaz99, Frank-Mic and rikba. It seems that a considerable number of scenes are affected by the shifted camera parameters and are thus unusable for multi-view applications.
From the reprojection tests that I have run, I think that non-zero tilt offsets are also involved, in addition to the shift offsets. Unfortunately I do not have access to the source assets and cannot run the steps that you suggested to investigate this issue. Have you by chance had time to investigate it, and is there any news?
If not, would it be possible to publicly expose the V-Ray tilt and shift parameters for each scene and camera? I think that based on this information, I would be able to retrieve the correct camera position and orientation.
Overall I would be happy to contribute to the resolution of this problem, but I feel that some information from the source assets is necessary to achieve it, which I unfortunately do not have...
Hi @rpautrat, I agree with your assessment that it is necessary to expose some additional information from the source assets. Once this information is exposed, it should be possible to derive the correct camera intrinsics for the scenes that are affected by this issue. I'll follow up with you offline and maybe we can work on this together.
Hi @mikeroberts3000 @rpautrat ! Are there any updates on the correct camera parameters?
Hi @Ainaz99, I'm happy to say that we're making solid progress. @rpautrat has been doing a bunch of great experiments to figure all of this out.
We identified four V-Ray parameters that can affect the camera intrinsics, and we have an accurate mathematical model of what each of these parameters does in isolation. Roughly speaking, each parameter translates or rotates the image plane in camera space. But we don't have a model of how the parameters interact with each other, and there are many possible conventions (translation then rotation, rotation then translation, different Euler angle conventions, etc). We reached out to Chaos Group, and we're waiting for them to tell us the exact order of operations.
Suppose we knew exactly how to transform the image plane as a function of our V-Ray parameters. Let's call this transformation T. We have sketched out a derivation for the non-standard projection matrix P, computed as a function of T, that correctly projects points from world space to image space. This projection matrix P can then be used as a drop-in replacement for the usual pinhole camera perspective projection matrix in graphics and vision applications.
To summarize, we think we have a solid understanding of this issue, but we're waiting for Chaos Group to tell us exactly how to compute T based on the V-Ray parameters. If you're super motivated to get this issue resolved, and you don't want to wait for Chaos Group to get back to us, I'd be happy to send you the Jupyter notebooks that we've been using in our experiments. You don't need to install V-Ray to run our notebooks, and you can try to compute T from the V-Ray parameters with a brute force guess-and-check strategy.
Thanks @mikeroberts3000, happy to hear you made progress in solving the issue. Do you have any estimate on how long it will take the Chaos Group to answer you?
@Ainaz99 no estimate from Chaos Group. I'll definitely post here when we hear back from them. In the meantime, the invitation is still open for you to experiment with our notebooks and attempt to compute the correct transformation.
An alternative to the guess-and-check strategy would be to set up a simple scene, and infer the correct transformation from rendered observations. It would then be possible to collect a training set of (camera parameter, transformation) pairs, and fit a function to the training set.
How am I able to acquire the notebook for the camera operations? Can you send a copy of that notebook you mentioned before?
(My email is liuzhy71@gmail.com. )
Hi @liuzhy71, you're a total legend for having a look at this. I'll send you an email with all of our debugging notebooks.
In case anyone else is interested, here is a diagram explaining what we think is going on.
We think the image plane is being transformed somehow, and there seem to be four camera parameters that control the transformation. So, our goal is to determine the function f that maps from the scalar camera parameters (p1,p2,p3,p4) to a transformation matrix T.
In our debugging notebook, we try to guess this function f based on a code snippet that we got from Chaos Group. This code snippet is correct for some combinations of parameters, but not others. So we must be getting something wrong. In order to test the correctness of f in our notebook, we compute a depth image using my own reference raycaster. In this test, we can control the outbound ray at each pixel. Using my reference raycaster, we want to obtain depth images that perfectly match the ones we obtain from V-Ray for all combinations of camera parameters. If we can do this, then we will know that we are implementing f correctly.
To make progress, I think a promising approach would be the following. First, compute the correct transformation T, given a depth image rendered by V-Ray with a known set of camera parameters (p1,p2,p3,p4). The transformation can be recovered from the depth image by solving a convex optimization problem. Anyway, once this is working, it is straightforward to render a large collection of depth images with randomly chosen camera parameters, solving for T for each rendered image. After doing so, we will have a large collection of (camera parameters, transformation) pairs. Finally, it is straightforward to fit some kind of simple function (e.g., a neural network) to all of the example pairs. This learned function can then be queried later to output the correct transformation T for a new set of parameters (p1,p2,p3,p4). Once the transformation T is known, it is straightforward to derive a modified projection matrix P (in terms of T) that projects world-space points to the correct image-space coordinates. This projection matrix can be used as a drop-in replacement for the usual projection matrix in graphics and vision applications (e.g., multi-view stereo applications, rendering additional images that exactly match our pre-rendered Hypersim images, etc).
Of course doing all of this is an unpleasant hack. But so far, Chaos Group has not mathematically characterized these camera parameters, so we must resort to reverse-engineering their meaning. If anyone else is interested in having a look at this issue, comment here and I will send you our debugging notebooks.
Great news. I obtained some very useful code from Chaos Group, and this has enabled me to make some exciting progress. I now have a working implementation that computes the transformation T from the parameters (p1,p2,p3,p4). Using my own reference raycaster, I can now generate images that match V-Ray images exactly, even in the presence of non-standard camera parameters.
I have not yet derived the modified projection matrix P that projects world-space points to the correct pixel locations, but I believe this is relatively straightforward. I'll post any relevant updates here. Thanks again to everyone that is helping out with this issue. Post a comment here if you want to have a look at my latest notebook.
Wow, I was just able to compute the correct depth using the previous notebook. But I am still working on adjusting the projection matrix with OpenGL rendering.
camera_shift_and_tilt.csv
These are the camera parameters (p1-p4). Are there any updates to the notebook? @mikeroberts3000
@liuzhy71 that spreadsheet looks great! I'm thinking about how we can adjust it slightly to make it a bit cleaner, and more suitable for inclusion in the actual dataset. Can you update your spreadsheet with the following information?
- Sort rows by scene name.
- Include a column called "includes_camera_physical_plugin" which is True or False depending on whether or not the vrscene has the CameraPhysical plugin.
- Include columns for all parameters in the CameraPhysical plugin. The plgparams executable that ships with V-Ray lists 46 parameters, but only 45 are accessible through the V-Ray AppSDK, so it is fine to only include 45 columns. If includes_camera_physical_plugin is False for a particular row, you can leave all of these columns blank.
- Use the V-Ray AppSDK when extracting parameter values (rather than your own parser) because the V-Ray AppSDK will populate the CameraPhysical plugin with the correct default values.
- I'm not sure exactly which of these parameters will prove to be useful in computer vision applications, and I don't want to try to guess, so I think it is better to export them all. My implementation of the transformation T depends on a few more parameters than the 4 described so far in this thread, so we will need to include more parameters in the spreadsheet no matter what.
The function for computing depth in the previous version of our notebook (i.e., the one I sent you) works for some combinations of parameters, but not others. An easy way to break it is to set horizontal_shift=1.0. I'll send you my latest notebook over email, which works correctly for all parameter combinations.
OK, I'll try to update the spreadsheet with as many parameters as possible. I'm not very familiar with the V-Ray SDK; all the data were parsed from the .vrscene files. Also, I do not have all the files for the ai_055_xxx scenes, so some data may be missing.
@liuzhy71 I think there is a 30-day free trial available for the V-Ray AppSDK. If you prepare the code, I'll run it on all of the scenes.
I have some more good news. I derived a modified projection matrix P (computed in terms of V-Ray's camera parameters) that correctly accounts for this issue. I have verified that my projection matrix P correctly projects world-space points to the correct screen-space locations, even when V-Ray's non-standard camera parameters have a drastic effect on the rendered image. My projection matrix can be used as a drop-in replacement for the usual OpenGL perspective projection matrix. So I think the main technical challenge here has been resolved.
For example, here is a depth image for a scene that has been rendered with non-standard camera parameters.
horizontal_offset=0.2; vertical_offset=0.3; horizontal_shift=0.0; lens_shift=1.0;
The left image is generated by V-Ray, the middle image is generated by my own reference raycaster, and the right image is a difference image. We see here that the images are nearly identical. So we are generating the correct ray at each pixel.
Here is the same depth image, but I have projected the sink mesh (i.e., the world-space vertex positions belonging to the sink) onto the image.
The red dots are mesh vertices. We see that the projected vertices align very accurately with the sink in the image. So my modified projection matrix appears to be correct.
I have also tried other combinations of camera parameters, and my solution works correctly for those parameters too. Here are the resulting images with different camera parameters.
horizontal_offset=0.0; vertical_offset=0.0; horizontal_shift=0.5; lens_shift=0.0;
So we're nearly finished. The only task that remains is to expose the relevant camera parameters for each scene. I will try to do this over the next couple of weeks, and I will post my progress here.
I have manually tested all the camera tilt and shift parameters. Now scenes 009_003, 038_009, 039_004 are incorrect.
@mikeroberts3000 @liuzhy71 @Ainaz99 @rpautrat thank you all for the hard work on getting accurate camera parameters for these scenes.
Any update on when these will be released and we can use them?
I'd also be curious if the fixes have been released somewhere :)
Hi @Ainaz99 @alexsax @jatentaki @liuzhy71 @rpautrat, I have some good news. I just checked in some data and example code that resolves this issue.
In the contrib/mikeroberts3000 directory, I provide a modified perspective projection matrix for each scene that can be used as a drop-in replacement for the usual OpenGL perspective projection matrix, as well as example code for projecting world-space points into Hypersim images.
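For readers who want a picture of what "drop-in replacement" means here, the following is a minimal sketch of projecting world-space points to pixel coordinates with a 4x4 OpenGL-style projection matrix. It is not the example code from the contrib directory. The function names, the pixel convention (origin at the top-left corner, y pointing down), and the near and far values are assumptions made for illustration. The canonical matrix is the textbook OpenGL perspective matrix; a per-scene modified matrix would simply be passed in its place.

```python
import numpy as np


def opengl_projection(fov_x, fov_y, near, far):
    """Textbook OpenGL perspective projection matrix (camera looks down -z)."""
    return np.array([
        [1.0 / np.tan(fov_x / 2.0), 0.0, 0.0, 0.0],
        [0.0, 1.0 / np.tan(fov_y / 2.0), 0.0, 0.0],
        [0.0, 0.0, -(far + near) / (far - near), -2.0 * far * near / (far - near)],
        [0.0, 0.0, -1.0, 0.0],
    ])


def project_points(p_world, M_cam_from_world, M_proj, width, height):
    """Project (N, 3) world-space points to (N, 2) pixel coordinates.

    M_proj can be the canonical matrix above or a modified per-scene matrix.
    Assumed pixel convention: (0, 0) is the top-left corner of the image.
    """
    ones = np.ones((p_world.shape[0], 1))
    p_world_h = np.hstack([p_world, ones])              # homogeneous coordinates
    p_clip = (M_proj @ M_cam_from_world @ p_world_h.T).T
    p_ndc = p_clip[:, :3] / p_clip[:, 3:4]              # perspective divide
    x = 0.5 * (p_ndc[:, 0] + 1.0) * width
    y = 0.5 * (1.0 - p_ndc[:, 1]) * height              # flip y for image rows
    return np.stack([x, y], axis=1)


# Example with assumed image size 1024x768 and a horizontal fov of pi/3.
width, height = 1024, 768
fov_x = np.pi / 3.0
fov_y = 2.0 * np.arctan(np.tan(fov_x / 2.0) * height / width)
M_proj = opengl_projection(fov_x, fov_y, near=1.0, far=1000.0)
pixels = project_points(np.array([[0.0, 0.0, -2.0]]), np.eye(4), M_proj, width, height)
```

With an identity camera matrix, a point straight ahead of the camera lands at the image center, which is a quick sanity check before swapping in a modified matrix.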
I apologize that it took so long to address this issue. It was especially challenging to debug because V-Ray's behavior wasn't well-documented, and I've been busy with other projects and holiday travel.
Hi @mikeroberts3000 ,
Thanks for releasing the modified version of the camera parameters.
I'm wondering if there is a way to directly calculate new camera positions and orientations, which were originally saved as camera_keyframe_positions.hdf5 and camera_keyframe_orientations.hdf5, only using the released perspective projection matrices for the scenes? And if yes, how is the calculation done? Thank you!
Hi @Ainaz99, this is possible, but there are some important technical details to be aware of. In particular, the modified "camera orientation" will have some extra transformations encoded in it, and it will not be a rotation matrix. As a result, any downstream code that intends to invert this matrix must take care to actually invert it, rather than merely transposing it.
To derive our modified camera orientation, consider the following equation that transforms a point in world-space p_world (expressed in homogeneous coordinates) into a point in homogeneous clip-space p_clip,
p_clip = M_proj_modified * M_cam_from_world * p_world
where M_proj_modified is the modified projection matrix from our CSV file; and M_cam_from_world is a 4x4 matrix that encodes the camera position and orientation. We can express this transformation in terms of the standard OpenGL projection matrix M_proj_canonical as follows,
p_clip = M_proj_canonical * block_diag(M_cam_from_uv_canonical * M_cam_from_uv.I, 1) * M_cam_from_world * p_world
where M_cam_from_uv is a 3x3 matrix defined in our CSV file for each scene; and M_cam_from_uv_canonical = diag([tan(fov_x/2.0), tan(fov_y/2.0), -1.0]). By collecting terms in the matrix block_diag(...) * M_cam_from_world, we see that we can define a modified camera orientation R_world_from_cam_modified that completely accounts for our non-standard camera parameters.
R_world_from_cam_modified = camera_orientation * (M_cam_from_uv_canonical * M_cam_from_uv.I).I
where camera_orientation is the camera orientation stored in our HDF5 files.
R_world_from_cam_modified can be used as a drop-in replacement for the camera orientation stored in our HDF5 files, and completely accounts for all non-standard camera parameters when used in conjunction with a standard OpenGL projection matrix. We do not need to make any modifications to the camera positions stored in our HDF5 files. But remember that R_world_from_cam_modified is not a rotation matrix.
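The last equation above can be written in a few lines of NumPy. This is a minimal sketch that follows the formula as stated; the function name `modified_camera_orientation` is hypothetical, and the inputs are assumed to be a 3x3 camera orientation from the HDF5 files and a 3x3 M_cam_from_uv from the per-scene CSV file.

```python
import numpy as np


def modified_camera_orientation(camera_orientation, M_cam_from_uv, fov_x, fov_y):
    """Compute R_world_from_cam_modified from the formula in this comment.

    R_modified = camera_orientation * (M_cam_from_uv_canonical * M_cam_from_uv^-1)^-1
    The result is generally NOT a rotation matrix.
    """
    M_cam_from_uv_canonical = np.diag(
        [np.tan(fov_x / 2.0), np.tan(fov_y / 2.0), -1.0]
    )
    A = M_cam_from_uv_canonical @ np.linalg.inv(M_cam_from_uv)
    return camera_orientation @ np.linalg.inv(A)
```

When M_cam_from_uv equals the canonical matrix, the correction term is the identity and the stored orientation is returned unchanged. For any other M_cam_from_uv the result can contain shear or scale, which is why it must be inverted with a general matrix inverse rather than a transpose.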
Thank you @mikeroberts3000 for your explanation!
So if I want to calculate the world-space transformation matrix for the camera as follows,
R_world_from_cam = camera_orientation
t_world_from_cam = camera_position.T
M_world_from_cam = matrix(block([[R_world_from_cam, t_world_from_cam], [matrix(zeros(3)), 1.0]]))
can I use the new R_world_from_cam_modified?
Thank you.
@Ainaz99 that looks correct to me, assuming that you want M_world_from_cam to transform points to world-space from camera-space, and that you are expressing points in homogeneous coordinates as 4D column vectors. Remember to proceed with caution because your modified M_world_from_cam has extra transformations encoded into it, so it no longer encodes pure rotation and translation.
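To make the caution above concrete, here is a minimal sketch that assembles the 4x4 matrix and inverts it with a general matrix inverse. The helper name `world_from_cam_matrix` and the example values are hypothetical. The example shows that the usual rigid-transform shortcut (transpose the 3x3 block) gives the wrong answer once the 3x3 block is no longer a rotation.

```python
import numpy as np


def world_from_cam_matrix(R_world_from_cam, camera_position):
    """Assemble a 4x4 matrix that maps camera-space points to world-space.

    Points are homogeneous 4D column vectors. R_world_from_cam may be the
    modified orientation, in which case it is not a pure rotation.
    """
    M = np.eye(4)
    M[:3, :3] = R_world_from_cam
    M[:3, 3] = np.asarray(camera_position).ravel()
    return M


# Illustrative values only: a sheared "orientation" and an arbitrary position.
R_modified = np.array([[1.0, 0.0, -0.1],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])
t_world_from_cam = np.array([1.0, 2.0, 3.0])
M_world_from_cam = world_from_cam_matrix(R_modified, t_world_from_cam)

# Correct: general inverse.
M_cam_from_world = np.linalg.inv(M_world_from_cam)

# Incorrect for non-rotations: the rigid-transform shortcut [R^T | -R^T t].
M_shortcut = world_from_cam_matrix(R_modified.T, -R_modified.T @ t_world_from_cam)
```

Here `M_cam_from_world @ M_world_from_cam` is the identity, while `M_shortcut @ M_world_from_cam` is not.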
Thank you @mikeroberts3000. So I want to use the new camera transformation matrix in Blender.
I'm attaching some images which show, from left to right: the RGB image, the old rendering (using the old M_world_from_cam), the new rendering (using R_world_from_cam_modified), and the RGB image and new rendering overlaid. Although the new rendering looks better, there is still some difference, which I assume is due to the extra transformations that you mentioned.
Do you know how I can take care of these extra transformations? Are there any other parameters I have to manually change for the camera?
I haven't spent much time with Blender, so I'm not sure what exactly it is doing with the matrices you're specifying. Is Blender rendering images via a rasterization approach or a raycasting approach? How exactly are you specifying these matrices to Blender? Are you specifying position and orientation in a combined 4x4 M_world_from_cam_modified matrix? Or are you specifying R_world_from_cam_modified and t_world_from_cam separately?
In these notebooks, I show how to reproduce pre-rendered Hypersim images pixel-perfectly using a rasterization approach and a raycasting approach for this specific scene. It should be straightforward to figure out what Blender is doing that is different to these notebooks by digging into the Blender documentation or source code. For what it's worth, in my local repository, I computed R_world_from_cam_modified according to the equation I posted above, and I verified that it behaves as expected, both for rasterization and raycasting.
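For the raycasting side, one way to picture the role of M_cam_from_uv is as the matrix that turns a pixel's (u, v, 1) coordinate into a camera-space ray direction. The sketch below is an illustration under assumptions, not the code from the notebooks: it assumes u and v span [-1, 1] across the image with samples at pixel centers, that v increases upward, and that the function name `pixel_rays_world` is hypothetical. The assumed convention is consistent with M_cam_from_uv_canonical = diag([tan(fov_x/2.0), tan(fov_y/2.0), -1.0]) given earlier in this thread.

```python
import numpy as np


def pixel_rays_world(R_world_from_cam, M_cam_from_uv, width, height):
    """Return an (height, width, 3) array of unnormalized world-space ray directions.

    Assumed convention: u, v in [-1, 1], sampled at pixel centers, with the
    first image row at the top (largest v).
    """
    half_du = 1.0 / width
    half_dv = 1.0 / height
    u = np.linspace(-1.0 + half_du, 1.0 - half_du, width)
    v = np.linspace(-1.0 + half_dv, 1.0 - half_dv, height)[::-1]
    uu, vv = np.meshgrid(u, v)
    uv1 = np.stack([uu, vv, np.ones_like(uu)], axis=-1).reshape(-1, 3)
    rays_cam = uv1 @ M_cam_from_uv.T          # ray = M_cam_from_uv * [u, v, 1]
    rays_world = rays_cam @ R_world_from_cam.T
    return rays_world.reshape(height, width, 3)
```

With the canonical matrix and an identity orientation, the center pixel's ray points straight down the camera's -z axis, which matches the usual pinhole model.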
Hi @Ainaz99, I'm just double-checking if you ever got this rendering functionality figured out in Blender.
I am working with the provided camera pose data from the Hypersim dataset, specifically the files:
```python
import os
import h5py

# camera_dir and frame_id are assumed to be defined by the caller.
camera_positions_hdf5_file = os.path.join(camera_dir, "camera_keyframe_positions.hdf5")
camera_orientations_hdf5_file = os.path.join(camera_dir, "camera_keyframe_orientations.hdf5")

with h5py.File(camera_positions_hdf5_file, "r") as f:
    camera_positions = f["dataset"][:]
with h5py.File(camera_orientations_hdf5_file, "r") as f:
    camera_orientations = f["dataset"][:]

camera_position_world = camera_positions[frame_id]
R_world_from_cam = camera_orientations[frame_id]
```
From the above code, I understand that the dataset provides the camera's position (camera_position_world) and orientation (R_world_from_cam) in the 3ds Max coordinate system.
I need to convert this data into Blender's transform_matrix (4x4), considering the differences in the coordinate systems between 3ds Max and Blender.
Could you clarify the exact steps required to transform the provided position and rotation data into Blender's coordinate system? If possible, an example of the transformation process or the correct transformation matrix would be very helpful.
Thank you for your assistance!
@huntorochi There is no need to duplicate this question here and in your other post.