Camera intrinsics for the initial image

Thanks for your great work and kind responses!

In the paper, you mentioned that "we produce an initial mesh by generating an image from text, and backproject it into 3D using a depth estimation model."
However, to my knowledge, backprojecting an image and its corresponding depth map into a 3D point cloud requires camera intrinsics. I would like to know how backprojection was possible without camera intrinsics.

Thank you for your help!

Hi,

We manually define the intrinsics through the fov parameter. For the first image, this is a random choice, but for all subsequent images it is the correct value.

"For the first image, this is a random choice" -> Does this mean that, for the first image generated by the inpainting model, camera intrinsics are optional when generating the initial mesh? Sorry, I am new to this field.

No, they are not optional; we still need to use them. We just pick an arbitrary fov and assume the principal point is at the center of the image. Then we build the intrinsics from this choice and use them for all subsequent steps. You can see this here:

text2room/model/text2room_pipeline.py, line 76 in c38d97e:

    self.K = get_pinhole_intrinsics_from_fov(H=self.H, W=self.W, fov_in_degrees=self.args.fov).to(self.world_to_cam)
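
For readers new to this, here is a minimal sketch of the idea in NumPy. It is not the repository's code: `intrinsics_from_fov` and `backproject` are hypothetical helpers written for illustration, and the sketch assumes the fov is measured horizontally, pixels are square, and the principal point sits at the image center.

```python
import numpy as np


def intrinsics_from_fov(H, W, fov_in_degrees):
    """Hypothetical helper: build a 3x3 pinhole matrix K from a field of view.

    Assumptions (not taken from the repository): the fov is horizontal,
    pixels are square, and the principal point is the image center.
    """
    # Focal length in pixels from the pinhole relation tan(fov / 2) = (W / 2) / f.
    f = (W / 2.0) / np.tan(np.deg2rad(fov_in_degrees) / 2.0)
    cx, cy = W / 2.0, H / 2.0
    return np.array([[f, 0.0, cx],
                     [0.0, f, cy],
                     [0.0, 0.0, 1.0]])


def backproject(depth, K):
    """Hypothetical helper: lift an (H, W) depth map to an (H*W, 3) point cloud.

    Points are expressed in the camera frame; depth is the z coordinate.
    """
    H, W = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # u indexes columns (x direction), v indexes rows (y direction).
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)


# Example: a flat wall 2 units in front of a camera with a 90 degree fov.
K = intrinsics_from_fov(H=512, W=512, fov_in_degrees=90.0)
points = backproject(np.full((512, 512), 2.0), K)
print(K)
print(points.shape)
```

A different fov gives a different focal length, so the same depth map backprojects to a wider or narrower point cloud. That is why some value has to be chosen for the first image, even if the choice itself is arbitrary.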

Thank you so much for your answer!