OPEN-AIR-SUN/mars

how to train a model with nerfacto using depth supervision

Closed this issue · 29 comments

Hi, I'd like to train a model from scratch using depth supervision generated from a monocular depth estimation model. My cicai_config.py looks like this — is it right? Thanks!

KITTI_Recon_NSG_Car_Depth = MethodSpecification(
    config=TrainerConfig(
        method_name="nsg-kitti-car-depth-recon",
        steps_per_eval_image=STEPS_PER_EVAL_IMAGE,
        steps_per_eval_all_images=STEPS_PER_EVAL_ALL_IMAGES,
        steps_per_save=STEPS_PER_SAVE,
        max_num_iterations=MAX_NUM_ITERATIONS,
        save_only_latest_checkpoint=False,
        mixed_precision=False,
        use_grad_scaler=True,
        log_gradients=True,
        pipeline=NSGPipelineConfig(
            datamanager=NSGkittiDataManagerConfig(
                dataparser=NSGkittiDataParserConfig(
                    use_car_latents=False,
                    use_depth=True,
                    split_setting="reconstruction",
                ),
                train_num_rays_per_batch=4096,
                eval_num_rays_per_batch=4096,
                camera_optimizer=CameraOptimizerConfig(mode="off"),
            ),
            model=SceneGraphModelConfig(
                background_model=NerfactoModelConfig(),
                object_model_template=NerfactoModelConfig(),
                object_representation="class-wise",
                object_ray_sample_strategy="remove-bg",
            ),
        ),
        optimizers={
            "background_model": {
                "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15),
                "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000),
            },
            "learnable_global": {
                "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15),
                "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000),
            },
            "object_model": {
                "optimizer": RAdamOptimizerConfig(lr=5e-3, eps=1e-15),
                "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000),
            },
        },
        # viewer=ViewerConfig(num_rays_per_chunk=1 << 15),
        vis="wandb",
    ),
    description="Neural Scene Graph implementation with vanilla-NeRF model for background and object models.",
)

If you want to use monocular depth estimation for KITTI, please add mono_depth_loss_mult to the SceneGraphModelConfig. You can also tune the parameters yourself.


KITTI_Recon_NSG_Car_Depth = MethodSpecification(
    config=TrainerConfig(
        method_name="nsg-kitti-car-depth-recon",
        steps_per_eval_image=STEPS_PER_EVAL_IMAGE,
        steps_per_eval_all_images=STEPS_PER_EVAL_ALL_IMAGES,
        steps_per_save=STEPS_PER_SAVE,
        max_num_iterations=MAX_NUM_ITERATIONS,
        save_only_latest_checkpoint=False,
        mixed_precision=False,
        use_grad_scaler=True,
        log_gradients=True,
        pipeline=NSGPipelineConfig(
            datamanager=NSGkittiDataManagerConfig(
                dataparser=NSGkittiDataParserConfig(
                    scale_factor=0.01,
                    use_car_latents=False,
                    use_depth=True,
                    split_setting="reconstruction",
                ),
                train_num_rays_per_batch=4096,
                eval_num_rays_per_batch=4096,
                camera_optimizer=CameraOptimizerConfig(mode="off"),
            ),
            model=SceneGraphModelConfig(
                mono_depth_loss_mult=0.05,
                depth_loss_mult=0,
                background_model=NerfactoModelConfig(),
                object_model_template=NerfactoModelConfig(),
                object_representation="class-wise",
                object_ray_sample_strategy="remove-bg",
            ),
        ),
        optimizers={
            "background_model": {
                "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15),
                "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000),
            },
            "learnable_global": {
                "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15),
                "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000),
            },
            "object_model": {
                "optimizer": RAdamOptimizerConfig(lr=5e-3, eps=1e-15),
                "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000),
            },
        },
        # viewer=ViewerConfig(num_rays_per_chunk=1 << 15),
        vis="wandb",
    ),
    description="Neural Scene Graph implementation with vanilla-NeRF model for background and object models.",
)

Thanks for your quick reply, and I also have a question: when I use cicai_render.py to render images or videos, what should I modify if I only want to render the background (remove the objects)? Thanks!


Hi! You can refer to #33, maybe it's helpful to you.

Ok. By the way, I generated depth maps using a monocular depth estimation model and put the visualization images, which are 3-channel, into the completion_02 folder. Is there any required processing before putting them in the folder? Thanks.

Below is an example image that we generated with a monocular depth estimation model.
[example depth map image]

So your depth map is one-channel? Did you convert the 3-channel depth map generated by the model to one channel?

But the depth map I generated is a color map, not black and white. Should I convert it to grayscale?

Hi! You can refer to the code below, which reads our depth from the image. And yes, you need to convert your image to grayscale.
[screenshot of the depth-loading code]
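In case the screenshot doesn't render here, below is a rough sketch (not the repo's actual loader) of collapsing a 3-channel depth visualization into a single-channel float map; the 8-bit range and the channel-averaging step are assumptions:

```python
import numpy as np

def to_single_channel(depth_img: np.ndarray) -> np.ndarray:
    """Collapse an (H, W, 3) uint8 depth image to an (H, W) float map in [0, 1]."""
    if depth_img.ndim == 3:
        # A grayscale map saved as RGB has identical channels, so the mean
        # simply recovers that channel; a true color map would instead need
        # the original colormap to be inverted.
        depth_img = depth_img.mean(axis=-1)
    return depth_img.astype(np.float32) / 255.0  # assumed 8-bit range

demo = np.full((2, 2, 3), 128, dtype=np.uint8)
print(to_single_channel(demo).shape)  # (2, 2)
```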

zwlvd commented

I trained KITTI 0006 without depth, and the objects in the scene are just shadows. Does this mean the result will be bad without depth?
[rendered frame 00106]


ok, thanks a lot, i will have a look.


You can try our proposed category-level car model. That will help decouple the objects and the background.

Does it mean that without depth the result will be bad?

Sure.

zwlvd commented


Thank you for your reply. Is the below model the category-level car model ?

model=SceneGraphModelConfig(
    background_model=NerfactoModelConfig(),
    object_model_template=CarNeRFModelConfig(_target=CarNeRF),
    object_representation="class-wise",
    object_ray_sample_strategy="remove-bg",
),

zwlvd commented


I followed omnidata to generate the depth, and I noticed the channel is set to 1, yet the result I got is still 3-channel.
[generated depth map 000151]
What's the problem?
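A quick way to diagnose this (assuming Pillow is available; the in-memory array below stands in for the saved omnidata output, which real code would read with Image.open) is to check the image mode and force a single channel:

```python
import numpy as np
from PIL import Image

# Dummy 3-channel image standing in for the saved omnidata output.
img = Image.fromarray(np.full((4, 4, 3), 100, dtype=np.uint8))
print(img.mode)  # "RGB" means the file holds 3 channels

gray = img.convert("L")  # force a single (luminance) channel
print(gray.mode, np.array(gray).shape)  # "L" (4, 4)
```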


> Is the below model the category-level car model?

Sure.

@zwlvd
Hi, thanks for your reply. You can make the change shown in the image below and try again.
[screenshot of the suggested code change]

For more information about KITTI depth maps, you all can refer to #18.

zwlvd commented


Thank you for your valuable suggestions; they're very useful.

Hi, I trained the model using monocular depth, but the depth loss looks like this and doesn't decrease stably, and the eval depth image looks like the following. Could you please point out what the problem may be? Thanks.
[depth loss curve]
[eval depth image]


Hi! What multipliers do you apply to the mono_depth_loss and the general depth_loss?


I trained the model using the config you posted above (mono_depth_loss_mult=0.05, depth_loss_mult=0).



Hi, do you have any suggestions about this problem? I'm a bit confused. When I used the monocular depth loss, the training result was even worse than without depth supervision. Thanks in advance.

Hi, we think there's a visualization problem with the depth colormap. Could you please check the values of the predicted depths?
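One rough way to run that check (a sketch, not project code; `demo` stands in for the model's predicted depths) is to print the basic statistics that usually explain a strange-looking colormap:

```python
import numpy as np

def summarize_depth(depth: np.ndarray) -> dict:
    """Report range and finiteness; NaNs or wild ranges often explain odd renders."""
    return {
        "min": float(np.nanmin(depth)),
        "max": float(np.nanmax(depth)),
        "finite": bool(np.isfinite(depth).all()),
    }

demo = np.array([[0.5, 2.0], [10.0, 80.0]], dtype=np.float32)
print(summarize_depth(demo))  # {'min': 0.5, 'max': 80.0, 'finite': True}
```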


Hi, I checked the values of the predicted depths. I generated the depth map following this code, and the generated map is a 3-channel black-and-white image with pixel values between 0 and 255. Is that right?
[depth generation code]

Hi, I saw this part in the code. Does it mean the depth image should be a one-channel image? And what should the range of the pixel values be? Thanks.
[code screenshot]


Whatever format the depth maps you load are in, they will be transformed into a single-channel float tensor.
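As a sketch of that behavior (an assumption about what the loader does, not the repo's actual code), a 3-channel input would end up as an (H, W, 1) float tensor:

```python
import numpy as np
import torch

def load_depth(arr: np.ndarray) -> torch.Tensor:
    """Mimic a loader that returns a single-channel float depth tensor."""
    if arr.ndim == 3:
        arr = arr[..., 0]  # channels assumed identical; keep one
    return torch.from_numpy(arr.astype(np.float32)).unsqueeze(-1)  # (H, W, 1)

demo = np.full((8, 8, 3), 42, dtype=np.uint8)
print(load_depth(demo).shape)  # torch.Size([8, 8, 1])
```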


Ok, thanks a lot

2023-09-01-10:54
@sonnefred Hi, I loaded the depth map as a single-channel float tensor, but I still have the same problem: the mono depth loss won't go down. Do you have a solution for this?


Same case here; my depth maps are 3-channel as well. Do I need to change the code anywhere, or can I start training with 3-channel depth maps?