OPEN-AIR-SUN/mars

how to train a model with nerfacto using depth supervision

Closed this issue · 29 comments

Hi, I'd like to train a model from scratch using depth supervision generated from a monocular depth estimation model. My cicai_config.py looks like this — is it right? Thanks!

KITTI_Recon_NSG_Car_Depth = MethodSpecification(
    config=TrainerConfig(
        method_name="nsg-kitti-car-depth-recon",
        steps_per_eval_image=STEPS_PER_EVAL_IMAGE,
        steps_per_eval_all_images=STEPS_PER_EVAL_ALL_IMAGES,
        steps_per_save=STEPS_PER_SAVE,
        max_num_iterations=MAX_NUM_ITERATIONS,
        save_only_latest_checkpoint=False,
        mixed_precision=False,
        use_grad_scaler=True,
        log_gradients=True,
        pipeline=NSGPipelineConfig(
            datamanager=NSGkittiDataManagerConfig(
                dataparser=NSGkittiDataParserConfig(
                    use_car_latents=False,
                    use_depth=True,
                    split_setting="reconstruction",
                ),
                train_num_rays_per_batch=4096,
                eval_num_rays_per_batch=4096,
                camera_optimizer=CameraOptimizerConfig(mode="off"),
            ),
            model=SceneGraphModelConfig(
                background_model=NerfactoModelConfig(),
                object_model_template=NerfactoModelConfig(),
                object_representation="class-wise",
                object_ray_sample_strategy="remove-bg",
            ),
        ),
        optimizers={
            "background_model": {
                "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15),
                "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000),
            },
            "learnable_global": {
                "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15),
                "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000),
            },
            "object_model": {
                "optimizer": RAdamOptimizerConfig(lr=5e-3, eps=1e-15),
                "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000),
            },
        },
        # viewer=ViewerConfig(num_rays_per_chunk=1 << 15),
        vis="wandb",
    ),
    description="Neural Scene Graph implementation with vanilla-NeRF model for background and object models.",
)

If you want to use monocular depth estimation for KITTI, please add mono_depth_loss_mult to the SceneGraphModelConfig. You can also tune the parameters yourself.


KITTI_Recon_NSG_Car_Depth = MethodSpecification(
    config=TrainerConfig(
        method_name="nsg-kitti-car-depth-recon",
        steps_per_eval_image=STEPS_PER_EVAL_IMAGE,
        steps_per_eval_all_images=STEPS_PER_EVAL_ALL_IMAGES,
        steps_per_save=STEPS_PER_SAVE,
        max_num_iterations=MAX_NUM_ITERATIONS,
        save_only_latest_checkpoint=False,
        mixed_precision=False,
        use_grad_scaler=True,
        log_gradients=True,
        pipeline=NSGPipelineConfig(
            datamanager=NSGkittiDataManagerConfig(
                dataparser=NSGkittiDataParserConfig(
                    scale_factor=0.01,
                    use_car_latents=False,
                    use_depth=True,
                    split_setting="reconstruction",
                ),
                train_num_rays_per_batch=4096,
                eval_num_rays_per_batch=4096,
                camera_optimizer=CameraOptimizerConfig(mode="off"),
            ),
            model=SceneGraphModelConfig(
                mono_depth_loss_mult=0.05,
                depth_loss_mult=0,
                background_model=NerfactoModelConfig(),
                object_model_template=NerfactoModelConfig(),
                object_representation="class-wise",
                object_ray_sample_strategy="remove-bg",
            ),
        ),
        optimizers={
            "background_model": {
                "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15),
                "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000),
            },
            "learnable_global": {
                "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15),
                "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000),
            },
            "object_model": {
                "optimizer": RAdamOptimizerConfig(lr=5e-3, eps=1e-15),
                "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000),
            },
        },
        # viewer=ViewerConfig(num_rays_per_chunk=1 << 15),
        vis="wandb",
    ),
    description="Neural Scene Graph implementation with vanilla-NeRF model for background and object models.",
)

Thanks for your quick reply, and I also have a question: when I use cicai_render.py to render images or videos, what should I modify if I only want to render the background (remove the objects)? Thanks!


Hi! You can refer to #33, maybe it's helpful to you.

Ok. By the way, I generated depth maps using a monocular depth estimation model and put the visualization images, which are 3-channel, into the completion_02 folder. Is there any required processing before putting them in the folder? Thanks.

Below is an example image that we generated with a monocular depth estimation model.
[example depth map image]

So your depth map is one-channel? Did you convert the 3-channel depth map generated by the model to one channel?

But the depth map I generated is a color map, not black and white. Should I convert it to grayscale?

Hi! You can refer to the code below, which reads our depth from the image. And yes, you need to convert your image to grayscale.
[screenshot of the depth-loading code]
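In case the screenshot doesn't render here, below is a rough sketch (not the repo's actual loader) of collapsing a 3-channel depth visualization into a single-channel float map; the 8-bit range and the channel-averaging step are assumptions:

```python
import numpy as np

def to_single_channel(depth_img: np.ndarray) -> np.ndarray:
    """Collapse an (H, W, 3) uint8 depth image to an (H, W) float map in [0, 1]."""
    if depth_img.ndim == 3:
        # A grayscale map saved as RGB has identical channels, so the mean
        # simply recovers that channel; a true color map would instead need
        # the original colormap to be inverted.
        depth_img = depth_img.mean(axis=-1)
    return depth_img.astype(np.float32) / 255.0  # assumed 8-bit range

demo = np.full((2, 2, 3), 128, dtype=np.uint8)
print(to_single_channel(demo).shape)  # (2, 2)
```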

zwlvd commented

I trained KITTI 0006 without depth, and the objects in the scene are just shadows. Does this mean the result will be bad without depth?
[rendered frame 00106]


ok, thanks a lot, i will have a look.


You can try our proposed category-level car model. That will help decouple the objects and the background.

Does it mean that without depth the result will be bad?

Sure.

zwlvd commented


Thank you for your reply. Is the below model the category-level car model ?

model=SceneGraphModelConfig(
    background_model=NerfactoModelConfig(),
    object_model_template=CarNeRFModelConfig(_target=CarNeRF),
    object_representation="class-wise",
    object_ray_sample_strategy="remove-bg",
),

zwlvd commented


I followed omnidata to generate the depth, and I noticed the channel is set to 1, yet the result I got is still 3-channel.
[generated depth map 000151]
What's the problem?
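A quick way to diagnose this (assuming Pillow is available; the in-memory array below stands in for the saved omnidata output, which real code would read with Image.open) is to check the image mode and force a single channel:

```python
import numpy as np
from PIL import Image

# Dummy 3-channel image standing in for the saved omnidata output.
img = Image.fromarray(np.full((4, 4, 3), 100, dtype=np.uint8))
print(img.mode)  # "RGB" means the file holds 3 channels

gray = img.convert("L")  # force a single (luminance) channel
print(gray.mode, np.array(gray).shape)  # "L" (4, 4)
```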


> Is the below model the category-level car model?

Sure.

@zwlvd
Hi, thanks for your reply. You can make the change shown in the image below and try again.
[screenshot of the suggested code change]

For more information about KITTI depth maps, you all can refer to #18.

zwlvd commented


Thank you for your valuable suggestions; they're very useful.

Hi, I trained the model using monocular depth, but the depth loss looks like this and doesn't decrease stably, and the eval depth image looks like the following. Could you please point out what the problem may be? Thanks.
[depth loss curve]
[eval depth image]


Hi! What multipliers do you apply to the mono_depth_loss and the general depth_loss?


I trained the model using the config you posted above (mono_depth_loss_mult=0.05, depth_loss_mult=0).



Hi, do you have any suggestions about this problem? I'm a bit confused. When I used the monocular depth loss, the training result was even worse than without depth supervision. Thanks in advance.

Hi, we think there's a visualization problem with the depth colormap. Could you please check the values of the predicted depths?
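One rough way to run that check (a sketch, not project code; `demo` stands in for the model's predicted depths) is to print the basic statistics that usually explain a strange-looking colormap:

```python
import numpy as np

def summarize_depth(depth: np.ndarray) -> dict:
    """Report range and finiteness; NaNs or wild ranges often explain odd renders."""
    return {
        "min": float(np.nanmin(depth)),
        "max": float(np.nanmax(depth)),
        "finite": bool(np.isfinite(depth).all()),
    }

demo = np.array([[0.5, 2.0], [10.0, 80.0]], dtype=np.float32)
print(summarize_depth(demo))  # {'min': 0.5, 'max': 80.0, 'finite': True}
```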


Hi, I checked the values of the predicted depths. I generated the depth map following this code, and the generated map is a 3-channel black-and-white image with pixel values between 0 and 255. Is that right?
[depth generation code]

Hi, I saw this part in the code. Does it mean the depth image should be a one-channel image? And what should the range of the pixel values be? Thanks.
[code screenshot]


Whatever format the depth maps you load are in, they will be transformed into a single-channel float tensor.
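As a sketch of that behavior (an assumption about what the loader does, not the repo's actual code), a 3-channel input would end up as an (H, W, 1) float tensor:

```python
import numpy as np
import torch

def load_depth(arr: np.ndarray) -> torch.Tensor:
    """Mimic a loader that returns a single-channel float depth tensor."""
    if arr.ndim == 3:
        arr = arr[..., 0]  # channels assumed identical; keep one
    return torch.from_numpy(arr.astype(np.float32)).unsqueeze(-1)  # (H, W, 1)

demo = np.full((8, 8, 3), 42, dtype=np.uint8)
print(load_depth(demo).shape)  # torch.Size([8, 8, 1])
```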


Ok, thanks a lot

2023-09-01-10:54
@sonnefred Hi, I loaded the depth map as a single-channel float tensor, but I still have the same problem: the mono depth loss won't go down. Do you have a solution for this?


Same case here; my depth maps are 3-channel as well. Do I need to change the code anywhere, or can I start training with 3-channel depth maps?