google/stereo-magnification

Understanding camera parameters used in code

PuneetKohli opened this issue · 7 comments

I'm interested in running this model on a set of stereo images we have and synthesizing multiple views outside the original camera baseline.

Our stereo images are taken from a camera with similar parameters to the iPhone X, i.e. baseline = 1.35cm and focal = 28mm (in 35mm terms).

(For reference, the iPhone X camera is 1.14cm baseline with focal of 28mm in 35mm terms)

Thankfully you included iPhone examples as a part of the project page. Below is the example we used as a reference.

mpi_from_images.py \
  --image1=paper_examples/iphone/bikes/bikes_left.jpg \
  --image2=paper_examples/iphone/bikes/bikes_right.jpg \
  --output_dir=paper_examples/iphone/bikes/results \
  --fx=1.7233965400390623 \
  --fy=2.2978620533854164 \
  --xoffset=0.0082 \
  --render \
  --render_multiples="-2,-1.5,-1,-0.5,0,0.5,1,1.5,2"

In our case, we passed in the same parameters to obtain reasonable results (given that the cameras are very similar). But I noticed for certain images, objects closer than approximately 1 meter do not converge in the output render.

In order to debug this, I tried playing around with the parameters but am still unclear on how the parameters in the example relate to the actual camera parameters.

I would greatly appreciate it if you could provide some sort of formulation to go from "camera focal" to "fx/fy" and from "camera baseline" to "xoffset".

In summary,

  • Could you please provide an explanation (or mathematical formulation) of how to go from 'camera focal length' to the 'fx' and 'fy' parameters?
  • Could you please provide an explanation (or mathematical formulation) of how to go from 'camera baseline' to the 'xoffset' parameter?

If it helps, I am attaching one of the stereo pairs I have tried to get the MPI representation of, and a generated GIF (using the same parameters as highlighted above).
Camera baseline = 1.35cm
fx/fy = 28 in 35mm terms
Image w/h = 640x360

Thank you for your help.

[Attached images: mpi_render_example (1), mpi_left, mpi_right]

@snaves Appreciate if you could weigh in - I had dropped an email with the same question(s) to @tinghuiz a while back but did not hear from him.

reyet commented

Hi Puneet – here are a couple of suggestions:

Near objects
It's not surprising that content nearer than 1m would fail to work well, because the default value for --min_depth is 1, meaning that the nearest MPI plane is 1m away. So I would first suggest keeping your other settings the same and reducing --min_depth. If you don't have any distant objects in your scene, you can also reduce --max_depth.
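To see why nothing nearer than min_depth can be represented, here is a minimal sketch of how the plane depths might be laid out. It assumes the planes are equally spaced in inverse depth, as described in the stereo magnification paper; the function name mpi_plane_depths and the values 32 and 100 are illustrative choices, not taken from this thread.

```python
def mpi_plane_depths(min_depth, max_depth, num_planes):
    """Return plane depths from far to near, equally spaced in inverse depth.

    ASSUMPTION: inverse-depth spacing; this is a sketch, not the repo's code.
    """
    inv_far = 1.0 / max_depth
    inv_near = 1.0 / min_depth
    step = (inv_near - inv_far) / (num_planes - 1)
    return [1.0 / (inv_far + i * step) for i in range(num_planes)]


# Hypothetical settings: 32 planes between 1 and 100 (same units as xoffset).
depths = mpi_plane_depths(1.0, 100.0, 32)
nearest = depths[-1]  # no plane sits closer to the camera than min_depth
```

With min_depth lowered to 0.1, the same call would place the nearest plane at 0.1, so content between 0.1 and 1 gets planes of its own.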

Camera parameters
fx and fy should be the actual focal length as a fraction of the image width and height. For example, a 35mm camera has a (landscape) image size of 36 x 24 mm, so with a 28mm lens you would have fx = 28/36 and fy = 28/24. You mentioned that you have a 28mm-equivalent lens, but you may have a different aspect ratio from a 35mm camera so you'd need to adjust fx or fy to account for that.
xoffset should be just the camera baseline in metres.
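As a rough sketch of the arithmetic above: the helper names below are hypothetical, and the aspect-ratio adjustment assumes the 35mm-equivalent focal length is defined by matching the horizontal field of view of a 36mm-wide frame, with square pixels. Some equivalence definitions match the diagonal instead, which would give slightly different values.

```python
# Full-frame ("35mm") sensor size in millimetres, landscape orientation.
FULL_FRAME_WIDTH_MM = 36.0
FULL_FRAME_HEIGHT_MM = 24.0


def full_frame_fx_fy(focal_mm):
    """fx, fy for an actual 35mm camera: focal length over width and height."""
    return focal_mm / FULL_FRAME_WIDTH_MM, focal_mm / FULL_FRAME_HEIGHT_MM


def equiv_fx_fy(focal_35mm_equiv, image_width, image_height):
    """fx, fy for an image with a different aspect ratio.

    ASSUMPTION: equivalence is by horizontal field of view, and pixels are
    square, so fy = fx * width / height.
    """
    fx = focal_35mm_equiv / FULL_FRAME_WIDTH_MM
    fy = fx * image_width / image_height
    return fx, fy


def baseline_cm_to_xoffset(baseline_cm):
    """xoffset is the camera baseline expressed in metres."""
    return baseline_cm / 100.0


fx, fy = equiv_fx_fy(28.0, 640, 360)    # 28mm-equivalent lens, 640x360 image
xoffset = baseline_cm_to_xoffset(1.35)  # 1.35cm baseline
```

Under these assumptions the 640x360 images from the question would give fx = 28/36 and fy = fx * 16/9, with xoffset = 0.0135.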

In addition to @reyet's good comments, I also downloaded your images and took a look at them. If you flip back and forth, you will notice that the foreground objects (like the thumb) move to the left, but the background objects (like the ground) move very slightly to the right. This suggests that the stereo pair isn't perfectly rectified and that there is a slight rotation (indeed, you sort of get the sense of a small rotation as you flip between them). That means a pure translation will not work that well... unfortunately, you might need to calibrate your cameras more accurately using a two-frame SfM method (after which you can either rectify the images in software, or else output a more complete pose as input to the stereo magnification code).

Thanks @snaves and @reyet for your replies.

@reyet I did try reducing the min_depth parameter to 0.1 and it does seem to provide better results (less ghosting, gif attached). In this particular case, reducing max_depth (down to 10m) didn't provide any visible improvement that I could tell.

Thanks for the additional detail on the camera parameters. Those were the calculations I had been trying to use, but I still have difficulty understanding how this formulation leads to fx=1.7233965400390623, fy=2.2978620533854164, and xoffset=0.0082 for an iPhone (presumably X or 8S), as the baseline for either of the stereo iPhones is > 0.01m and not 0.0082m. I would love more clarity on this, so I can make sure the numbers are right on my end.

I am planning on making a PR soon which explains all the parameters that we can pass in and how to calculate them (or what they mean/refer to) in the README - so more clarity from your end will help me with this. Might be useful for others who have been digging into it for a while.

@snaves Yes, I did notice that in the stereo pair I attached, and in other images taken with the same camera. It looks like a case of bad rectification that went unnoticed. I'll be trying again soon with another camera that should hopefully have better rectification, and will post the results here for your perusal.

[Attached GIF: mpi_example_0 1mindist]

reyet commented

Here's another way of thinking of it. fx and fy are unit-free (because they are ratios) and correspond to field-of-view according to the formula fov_x = 2 * atan(0.5 / fx), and similarly for y. Plugging fx = 1.7234, fy = 2.2979 into that formula, we get fov_x ≈ 32.4° and fov_y ≈ 24.6°, which is about right for an iPhone telephoto lens with a slight crop. Also, we can check that fy / fx = 4/3, the aspect ratio of the image.
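The formula above is easy to check numerically. This is a small sketch; the function names are made up for illustration, and the inverse simply rearranges the same formula.

```python
import math


def fov_degrees(f_normalized):
    """Field of view in degrees for a focal length given as a fraction of
    the image width (or height): fov = 2 * atan(0.5 / f)."""
    return math.degrees(2.0 * math.atan(0.5 / f_normalized))


def normalized_focal(fov_deg):
    """Inverse of fov_degrees: f = 0.5 / tan(fov / 2)."""
    return 0.5 / math.tan(math.radians(fov_deg) / 2.0)


# Values from the iPhone example command in this thread.
fx = 1.7233965400390623
fy = 2.2978620533854164

fov_x = fov_degrees(fx)
fov_y = fov_degrees(fy)
aspect = fy / fx  # should equal width / height for square pixels
```

If you know your camera's horizontal and vertical field of view, normalized_focal gives fx and fy directly without going through 35mm-equivalent focal lengths.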

I usually think of xoffset, min_depth and max_depth as all being in metres, and that's how I described them above, but this is just a convention: what's important is only that they use the same units. So, specifying an xoffset of 0.0082 instead of, say, 0.015 is equivalent to specifying 0.015 but multiplying min_depth and max_depth by 0.015/0.0082. It has the effect of pushing the near plane out a bit but otherwise everything will still work.
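The equivalence can be written out as a small sketch. The function name is hypothetical, and the depth values 1 and 100 are only example inputs; the point is that scaling xoffset, min_depth and max_depth by the same factor describes the same setup.

```python
def rescale_scene_units(xoffset, min_depth, max_depth, new_xoffset):
    """Express the same setup with a different xoffset by scaling the depth
    range by the same factor. Only the ratios between the three matter."""
    scale = new_xoffset / xoffset
    return new_xoffset, min_depth * scale, max_depth * scale


# xoffset = 0.0082 with depths 1..100 is the same as xoffset = 0.015 with
# both depths multiplied by 0.015 / 0.0082.
new_xoffset, new_min, new_max = rescale_scene_units(0.0082, 1.0, 100.0, 0.015)
```

Read in metres with a true 0.015 baseline, the example's near plane would sit a little under 2m rather than at 1m, which is the "pushing the near plane out" effect described above.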

Does this help?

Hi @reyet, thanks this helps.

This was in line with my understanding (that fx, fy are unit-free), but it was unclear how the numbers for your sample came to be. With that being said, as our images are 16:9, passing in fx = 16, fy = 9 should yield the same results as any ratio of the two.

For xoffset, your explanation definitely helps clear things up a bit.

Thanks!

Closing this issue - thanks a lot @snaves and @reyet !