How to normalize the coordinates to [-1, 1]?
guker opened this issue · 4 comments
from math import sqrt


def make_projection_matrix(z_ref, intrinsics, height, width):
    """Build a matrix that projects from camera space into clip space.

    Args:
        z_ref (float): The reference depth (will become z=0).
        intrinsics (CameraIntrinsics): The camera object specifying focal length and optical centre.
        height (float): The image height.
        width (float): The image width.

    Returns:
        torch.Tensor: The projection matrix.
    """
    # Set the z-size (depth) of the viewing frustum to be equal to the
    # size of the portion of the XY plane at z_ref which projects
    # onto the image.
    size = z_ref * max(width / intrinsics.alpha_x, height / intrinsics.alpha_y)

    # Set near and far planes such that:
    # a) z_ref will correspond to z=0 after normalisation
    #    z_ref = 2fn / (f + n)
    # b) The distance from z=-1 to z=1 (normalised) will correspond
    #    to `size` in camera space
    #    f - n = size
    far = 0.5 * (sqrt(z_ref ** 2 + size ** 2) + z_ref - size)
    near = 0.5 * (sqrt(z_ref ** 2 + size ** 2) + z_ref + size)

    # Construct the perspective projection matrix.
    # More details: http://kgeorge.github.io/2014/03/08/calculating-opengl-perspective-matrix-from-opencv-intrinsic-matrix
    m_proj = intrinsics.matrix.new([
        [intrinsics.alpha_x / intrinsics.x_0, 0, 0, 0],
        [0, intrinsics.alpha_y / intrinsics.y_0, 0, 0],
        [0, 0, -(far + near) / (far - near), 2 * far * near / (far - near)],
        [0, 0, 1, 0],
    ])
    return m_proj
What is the meaning of `size` in the code?
We need to specify where the near and far planes of the frustum are (i.e. where z=-1 and z=1 are in clip space). `size` effectively controls the separation between the z=-1 and z=1 planes (`size` is that distance in camera space). We calculate `size` using the width/height in order to obtain a square-ish frustum. z=0 corresponds to `z_ref` in camera space.
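The explanation above can be checked numerically. The following is a minimal sketch in plain Python that reuses the `size`, `near` and `far` formulas from `make_projection_matrix`; the helper names `frustum_planes` and `ndc_z` and all the example values are hypothetical and are not part of the library.

```python
from math import sqrt, isclose


def frustum_planes(z_ref, alpha_x, alpha_y, height, width):
    """Return (size, near, far) using the formulas from make_projection_matrix."""
    size = z_ref * max(width / alpha_x, height / alpha_y)
    root = sqrt(z_ref ** 2 + size ** 2)
    far = 0.5 * (root + z_ref - size)
    near = 0.5 * (root + z_ref + size)
    return size, near, far


def ndc_z(z, near, far):
    """Normalised z given by rows 3 and 4 of the matrix (z_clip / w, where w = z)."""
    a = -(far + near) / (far - near)
    b = 2 * far * near / (far - near)
    return (a * z + b) / z


# Hypothetical example values, not taken from a real camera.
z_ref = 5.0
size, near, far = frustum_planes(z_ref, alpha_x=1200.0, alpha_y=1200.0,
                                 height=480, width=640)

print("size:", size, "near:", near, "far:", far)
print("z_ref maps to:", ndc_z(z_ref, near, far))
print("far maps to:", ndc_z(far, near, far))
print("near maps to:", ndc_z(near, near, far))
print("separation of the two planes:", near - far)
```

With these inputs the reference depth should land at (approximately) 0, the plane stored in `far` at -1, and the plane stored in `near` at 1, with the two planes separated by `size` in camera space.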
got it, thanks.
Hey anibali,
I thought about opening a new issue, but I think it is more appropriate to reopen this one.
May I ask what purpose the term `sqrt(z_ref ** 2 + size ** 2)` serves? I can see there is a triangle somewhere, but I'm not sure why it is there. Wouldn't `0.5 * (z_ref +/- size)` get the job done intuitively? All I know is that `sqrt(z_ref ** 2 + size ** 2)` is needed for `z_ref = 2fn/(f+n)` to hold.
Also, I don't know why the far side uses `z_ref - size` and the near side uses `z_ref + size`. Shouldn't it be the opposite? Or is the z-axis negative from the camera's point of view? When I calculate `far - near`, it comes out as `-size`, not `size`.
There are lots of different ways in which you could define a normalised space if you wanted. However, I decided to adhere to two properties to define mine: the "depth" of the space is based on the width/height (making it a cube assuming those two dimensions are equal), and z=0 in normalised space corresponds to the reference z in camera space. The "flipped" z-axis was a result of wanting a right-handed coordinate system, if I recall correctly.
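For readers wondering where the square root comes from, the two conditions in the code comments can be solved directly. The following is a sketch of that algebra rather than something stated in the thread; $a$ and $b$ are names introduced here for the smaller and larger plane depths, and $s$ stands for `size`.

```latex
\begin{aligned}
&\text{Conditions:} && \frac{2ab}{a+b} = z_{\text{ref}}, \qquad b - a = s \\
&\text{Substitute } b = a + s: && 2a(a+s) = z_{\text{ref}}\,(2a+s) \\
&\text{Rearrange:} && a^2 + (s - z_{\text{ref}})\,a - \tfrac{1}{2}\, z_{\text{ref}}\, s = 0 \\
&\text{Positive root:} && a = \tfrac{1}{2}\left(\sqrt{z_{\text{ref}}^2 + s^2} + z_{\text{ref}} - s\right) \\
&\text{Hence:} && b = a + s = \tfrac{1}{2}\left(\sqrt{z_{\text{ref}}^2 + s^2} + z_{\text{ref}} + s\right)
\end{aligned}
```

Read this way, the square root is the discriminant of a quadratic rather than a geometric hypotenuse. In the code, `far` holds the smaller value $a$ and `near` holds the larger value $b$, which is why `far - near` evaluates to `-size`. A symmetric split such as $z_{\text{ref}} \pm s/2$ would satisfy the second condition but not the first, because the first condition places $z_{\text{ref}}$ at the harmonic mean of the two depths rather than at their arithmetic mean.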