OPEN-AIR-SUN/mars

what is the camera_optimizer? & object pose debugging

Closed this issue · 22 comments

Hello, when I trained MARS on the KITTI dataset I ran into a problem: when I remove the camera_optimizer from the config, the result looks normal. Is this module necessary?
Thanks a lot.

To my understanding, the module is disabled by default in their work.
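
For reference, a minimal sketch of how pose refinement is switched off in a nerfstudio-based pipeline like MARS, either by setting camera_optimizer: null in config.yaml (as in the config shared later in this thread) or in Python:

    # a minimal sketch, assuming the standard nerfstudio camera optimizer API
    from nerfstudio.cameras.camera_optimizers import CameraOptimizerConfig

    # mode="off" keeps the dataset poses fixed instead of learning pose offsets
    camera_optimizer = CameraOptimizerConfig(mode="off")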

Thank you. I want to ask another question: did you encounter blurry backgrounds on a private dataset, where the dynamic objects and the background cannot be separated? I am not using depth or semantic masks.

I met a similar problem, caused by a wrong bbox position (wrong coordinates). Maybe you can check whether your bbox positions are correct.
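
One quick sanity check is to project the 3D box corners into each image with your calibration and overlay them; a hedged sketch (not MARS code; the function name and the KITTI label conventions below are assumptions taken from the standard devkit):

    import numpy as np

    def project_bbox_to_image(P, center, dims, yaw):
        # P:      3x4 camera projection matrix (e.g. KITTI P2)
        # center: (x, y, z) bottom-face center of the box in camera coordinates
        #         (KITTI labels store the bottom center; y points down)
        # dims:   (h, w, l) box height / width / length
        # yaw:    rotation around the camera y axis
        h, w, l = dims
        # 8 corners in the object frame, origin at the bottom-face center
        x = np.array([l, l, -l, -l, l, l, -l, -l]) / 2.0
        y = np.array([0.0, 0.0, 0.0, 0.0, -h, -h, -h, -h])
        z = np.array([w, -w, -w, w, w, -w, -w, w]) / 2.0
        R = np.array([[np.cos(yaw), 0.0, np.sin(yaw)],
                      [0.0, 1.0, 0.0],
                      [-np.sin(yaw), 0.0, np.cos(yaw)]])
        corners = R @ np.vstack([x, y, z]) + np.asarray(center).reshape(3, 1)
        uvw = P @ np.vstack([corners, np.ones((1, 8))])  # project to pixels
        return (uvw[:2] / uvw[2]).T  # (8, 2) pixel coordinates

If the projected corners hug the cars in the image, the labels are consistent with the calibration; a constant offset or a mirror image usually means a coordinate-convention bug.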

@Nplace-su Thank you for your reply. Can you tell me where to check the bbox poses, and what the checking process looks like? By visualizing visible_objects_ls?

[screenshots: training results and objects_rgb renderings]

From the training results, the bbox poses seem right, but the objects_rgb output is very strange. Could you give me some help? Very grateful.

@szhang963 It seems not 100% right. There are many potential issues when you use your own data; I suggest you make sure your coordinates and other data conventions align 100% with their original dataparsers.
BTW, could you share your training config?

Thank you for your patient reply. Here is my config.yaml:

!!python/object:nerfstudio.engine.trainer.TrainerConfig
_target: !!python/name:nerfstudio.engine.trainer.Trainer ''
data: &id003 !!python/object/apply:pathlib.PosixPath
- data
- my_kitti
- training
- image_02
- '0001'
experiment_name: KITTI_my_Recon_Mars_focal_cxy_boxscal_camdebug_noscale-001
gradient_accumulation_steps: 1
load_checkpoint: null
load_config: null
load_dir: null
load_scheduler: true
load_step: null
log_gradients: true
logging: !!python/object:nerfstudio.configs.base_config.LoggingConfig
  local_writer: !!python/object:nerfstudio.configs.base_config.LocalWriterConfig
    _target: !!python/name:nerfstudio.utils.writer.LocalWriter ''
    enable: true
    max_log_size: 10
    stats_to_track: !!python/tuple
    - !!python/object/apply:nerfstudio.utils.writer.EventName
      - Train Iter (time)
    - !!python/object/apply:nerfstudio.utils.writer.EventName
      - Train Rays / Sec
    - !!python/object/apply:nerfstudio.utils.writer.EventName
      - Test PSNR
    - !!python/object/apply:nerfstudio.utils.writer.EventName
      - Vis Rays / Sec
    - !!python/object/apply:nerfstudio.utils.writer.EventName
      - Test Rays / Sec
    - !!python/object/apply:nerfstudio.utils.writer.EventName
      - ETA (time)
  max_buffer_size: 20
  profiler: basic
  relative_log_dir: !!python/object/apply:pathlib.PosixPath []
  steps_per_log: 10
machine: !!python/object:nerfstudio.configs.base_config.MachineConfig
  device_type: cuda
  dist_url: auto
  machine_rank: 0
  num_devices: 1
  num_machines: 1
  seed: 42
max_num_iterations: 100000
method_name: KITTI_my_Recon_Mars_focal_cxy_boxscal_camdebug_noscale
mixed_precision: false
optimizers:
  background_model:
    optimizer: !!python/object:nerfstudio.engine.optimizers.RAdamOptimizerConfig
      _target: &id001 !!python/name:torch.optim.radam.RAdam ''
      eps: 1.0e-15
      lr: 0.001
      max_norm: null
      weight_decay: 0
    scheduler: !!python/object:nerfstudio.engine.schedulers.ExponentialDecaySchedulerConfig
      _target: &id002 !!python/name:nerfstudio.engine.schedulers.ExponentialDecayScheduler ''
      lr_final: 1.0e-05
      lr_pre_warmup: 1.0e-08
      max_steps: 200000
      ramp: cosine
      warmup_steps: 0
  learnable_global:
    optimizer: !!python/object:nerfstudio.engine.optimizers.RAdamOptimizerConfig
      _target: *id001
      eps: 1.0e-15
      lr: 0.001
      max_norm: null
      weight_decay: 0
    scheduler: !!python/object:nerfstudio.engine.schedulers.ExponentialDecaySchedulerConfig
      _target: *id002
      lr_final: 1.0e-05
      lr_pre_warmup: 1.0e-08
      max_steps: 200000
      ramp: cosine
      warmup_steps: 0
  object_model:
    optimizer: !!python/object:nerfstudio.engine.optimizers.RAdamOptimizerConfig
      _target: *id001
      eps: 1.0e-15
      lr: 0.005
      max_norm: null
      weight_decay: 0
    scheduler: !!python/object:nerfstudio.engine.schedulers.ExponentialDecaySchedulerConfig
      _target: *id002
      lr_final: 1.0e-05
      lr_pre_warmup: 1.0e-08
      max_steps: 200000
      ramp: cosine
      warmup_steps: 0
output_dir: !!python/object/apply:pathlib.PosixPath
- work_dirs
pipeline: !!python/object:mars.mars_pipeline.MarsPipelineConfig
  _target: !!python/name:mars.mars_pipeline.MarsPipeline ''
  datamanager: !!python/object:mars.data.mars_datamanager.MarsDataManagerConfig
    _target: !!python/name:mars.data.mars_datamanager.MarsDataManager ''
    camera_optimizer: null
    camera_res_scale_factor: 1.0
    collate_fn: !!python/name:nerfstudio.data.utils.nerfstudio_collate.nerfstudio_collate ''
    data: *id003
    dataparser: !!python/object:mars.data.mars_kitti_dataparser_phi.MarsKittiDataParserConfig
      _target: !!python/name:mars.data.mars_kitti_dataparser_phi.MarsKittiParser ''
      add_input_rows: -1
      alpha_color: white
      bckg_only: false
      box_scale: 1.0
      car_nerf_state_dict_path: !!python/object/apply:pathlib.PosixPath
      - pretrain
      - car_nerf
      - car_nerf.ckpt
      car_object_latents_path: !!python/object/apply:pathlib.PosixPath
      - pretrain
      - car_nerf
      - latent_codes.pt
      chunk: 32768
      data: !!python/object/apply:pathlib.PosixPath
      - data
      - kitti
      - training
      - image_02
      - '0006'
      dataset_type: kitti
      far_plane: 150.0
      first_frame: 0
      last_frame: 50
      max_input_objects: -1
      near_plane: 0.5
      netchunk: 65536
      novel_view: left
      obj_only: false
      obj_opaque: true
      object_setting: 0
      render_only: false
      scale_factor: 1
      scene_scale: 1.0
      semantic_mask_classes: []
      semantic_path: !!python/object/apply:pathlib.PosixPath []
      split_setting: reconstruction
      use_car_latents: false
      use_depth: false
      use_obj: true
      use_object_properties: true
      use_semantic: false
    eval_image_indices: !!python/tuple
    - 0
    eval_num_images_to_sample_from: -1
    eval_num_rays_per_batch: 8192
    eval_num_times_to_repeat_images: -1
    images_on_gpu: false
    masks_on_gpu: false
    patch_size: 1
    pixel_sampler: !!python/object:nerfstudio.data.pixel_samplers.PixelSamplerConfig
      _target: !!python/name:nerfstudio.data.pixel_samplers.PixelSampler ''
      is_equirectangular: false
      keep_full_image: false
      num_rays_per_batch: 4096
    train_num_images_to_sample_from: -1
    train_num_rays_per_batch: 8192
    train_num_times_to_repeat_images: -1
  model: !!python/object:mars.models.scene_graph.SceneGraphModelConfig
    _target: !!python/name:mars.models.scene_graph.SceneGraphModel ''
    background_color: black
    background_model: !!python/object:mars.models.nerfacto.NerfactoModelConfig
      _target: &id004 !!python/name:mars.models.nerfacto.NerfactoModel ''
      appearance_embed_dim: 32
      background_color: black
      base_res: 16
      collider_params:
        far_plane: 6.0
        near_plane: 2.0
      disable_scene_contraction: false
      distortion_loss_mult: 0.002
      enable_collider: true
      eval_num_rays_per_chunk: 4096
      far_plane: 150.0
      features_per_level: 2
      hidden_dim: 64
      hidden_dim_color: 64
      hidden_dim_transient: 64
      implementation: tcnn
      interlevel_loss_mult: 1.0
      log2_hashmap_size: 19
      loss_coefficients:
        rgb_loss_coarse: 1.0
        rgb_loss_fine: 1.0
      max_res: 2048
      near_plane: 0.05
      num_levels: 16
      num_nerf_samples_per_ray: 97
      num_proposal_iterations: 2
      num_proposal_samples_per_ray: &id005 !!python/tuple
      - 256
      - 128
      obj_feat_dim: 0
      orientation_loss_mult: 0.0001
      pred_normal_loss_mult: 0.001
      predict_normals: false
      prompt: null
      proposal_initial_sampler: piecewise
      proposal_net_args_list:
      - hidden_dim: 16
        log2_hashmap_size: 17
        max_res: 128
        num_levels: 5
        use_linear: false
      - hidden_dim: 16
        log2_hashmap_size: 17
        max_res: 256
        num_levels: 5
        use_linear: false
      proposal_update_every: 5
      proposal_warmup: 5000
      proposal_weights_anneal_max_num_iters: 1000
      proposal_weights_anneal_slope: 10.0
      use_average_appearance_embedding: true
      use_gradient_scaling: false
      use_proposal_weight_anneal: true
      use_same_proposal_network: false
      use_single_jitter: true
    collider_params:
      far_plane: 6.0
      near_plane: 2.0
    debug_object_pose: false
    depth_loss_mult: 0.01
    depth_loss_type: !!python/object/apply:nerfstudio.model_components.losses.DepthLossType
    - 1
    depth_sigma: 0.05
    enable_collider: true
    eval_num_rays_per_chunk: 4096
    far_plane: 1000.0
    interlevel_loss_mult: 1.0
    is_euclidean_depth: false
    latent_size: 256
    loss_coefficients:
      rgb_loss_coarse: 1.0
      rgb_loss_fine: 1.0
    max_num_obj: -1
    mono_depth_loss_mult: 0.0
    near_plane: 0.05
    object_model_template: !!python/object:mars.models.nerfacto.NerfactoModelConfig
      _target: *id004
      appearance_embed_dim: 32
      background_color: black
      base_res: 16
      collider_params:
        far_plane: 6.0
        near_plane: 2.0
      disable_scene_contraction: false
      distortion_loss_mult: 0.002
      enable_collider: true
      eval_num_rays_per_chunk: 4096
      far_plane: 150.0
      features_per_level: 2
      hidden_dim: 64
      hidden_dim_color: 64
      hidden_dim_transient: 64
      implementation: tcnn
      interlevel_loss_mult: 1.0
      log2_hashmap_size: 19
      loss_coefficients:
        rgb_loss_coarse: 1.0
        rgb_loss_fine: 1.0
      max_res: 2048
      near_plane: 0.05
      num_levels: 16
      num_nerf_samples_per_ray: 97
      num_proposal_iterations: 2
      num_proposal_samples_per_ray: *id005
      obj_feat_dim: 0
      orientation_loss_mult: 0.0001
      pred_normal_loss_mult: 0.001
      predict_normals: false
      prompt: null
      proposal_initial_sampler: piecewise
      proposal_net_args_list:
      - hidden_dim: 16
        log2_hashmap_size: 17
        max_res: 128
        num_levels: 5
        use_linear: false
      - hidden_dim: 16
        log2_hashmap_size: 17
        max_res: 256
        num_levels: 5
        use_linear: false
      proposal_update_every: 5
      proposal_warmup: 5000
      proposal_weights_anneal_max_num_iters: 1000
      proposal_weights_anneal_slope: 10.0
      use_average_appearance_embedding: true
      use_gradient_scaling: false
      use_proposal_weight_anneal: true
      use_same_proposal_network: false
      use_single_jitter: true
    object_ray_sample_strategy: remove-bg
    object_representation: class-wise
    object_warmup_steps: 1000
    orientation_loss_mult: 0.0001
    pred_normal_loss_mult: 0.001
    predict_normals: false
    prompt: null
    ray_add_input_rows: -1
    semantic_loss_mult: 1.0
    should_decay_sigma: false
    sigma_decay_rate: 0.9998
    sky_model: !!python/object:mars.models.sky_model.SkyModelConfig
      _target: !!python/name:mars.models.sky_model.SkyModel ''
      collider_params:
        far_plane: 6.0
        near_plane: 2.0
      enable_collider: true
      eval_num_rays_per_chunk: 4096
      hidden_dim: 128
      loss_coefficients:
        rgb_loss_coarse: 1.0
        rgb_loss_fine: 1.0
      num_layers: 5
      prompt: null
    starting_depth_sigma: 4.0
    use_interlevel_loss: true
    use_sky_model: false
project_name: nerfstudio-project
prompt: null
relative_model_dir: !!python/object/apply:pathlib.PosixPath
- nerfstudio_models
save_only_latest_checkpoint: false
steps_per_eval_all_images: 5000
steps_per_eval_batch: 500
steps_per_eval_image: 500
steps_per_save: 10000
timestamp: 2023-12-24_200822
use_grad_scaler: true
viewer: !!python/object:nerfstudio.configs.base_config.ViewerConfig
  camera_frustum_scale: 0.1
  default_composite_depth: true
  image_format: jpeg
  jpeg_quality: 90
  make_share_url: false
  max_num_display_images: 512
  num_rays_per_chunk: 32768
  quit_on_train_completion: false
  relative_log_filename: viewer_log_filename.txt
  websocket_host: 0.0.0.0
  websocket_port: null
  websocket_port_default: 7007
vis: wandb

I converted my data to KITTI format, and the 3D bboxes project onto the images correctly, without any shift. So I am confused about the misalignment in the training result. Thank you for your help.

Here's a tool for visualizing your camera & object poses that may be helpful: https://github.com/wuzirui/mars_pose_visualizer/
You want to make sure your camera & object coordinate axes are the same as KITTI/VKITTI's.
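
For reference, the convention to match (to my understanding of the KITTI devkit; treat this as an assumption to verify against the original dataparsers):

    # KITTI/VKITTI rectified camera frame, which the dataparser expects:
    #   +x -> right in the image, +y -> down, +z -> forward along the optical axis
    # object (tracklet) poses are expressed in this same rectified camera frame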

p.s. if you have multiple issues (next time), please raise them separately, so that others can refer to them :)

@wuzirui Thank you a lot. I am trying to check the camera & object coordinate axes with the visualizer. Sorry for asking another question in the same issue; I will be careful about that.

Hi all, I visualized the camera & object poses in the same coordinate system. There seems to be a small error on the Y axis?

[screenshots: camera & object pose visualization]

I found that the camrect2cam_i conversion multiplies the inverse intrinsics with the translation part of camrect2img (i.e. P), with the rotation left as the identity matrix. But in my dataset, P is obtained by np.matmul(np.matmul(intrinsic, ego2cam_[:3, :]), np.linalg.inv(mycoord2kitticam)); I then used the same operation as below to get camrect2cam_i. Is this correct for my data format?

        # Get camera poses, camera ids: 02, 03
        for cam_i in range(2):
            transformation = np.eye(4)
            projection = tracking_calibration["P" + str(cam_i + 2)]  # rectified camera coordinate system -> image
            K_inv = np.linalg.inv(projection[:3, :3])
            R_t = projection[:3, 3]

            t_crect2c = np.matmul(K_inv, R_t)  # note: computed but never used below
            # t_crect2c = 1. / projection[[0, 1, 2], [0, 1, 2]] * projection[:, 3]
            transformation[:3, 3] = R_t
            tracking_calibration["Tr_camrect2cam0" + str(cam_i + 2)] = transformation

Thank you for your reply.
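
For reference, a minimal sketch of what the translation presumably should be: since P = K [I | t] for a rectified camera, the camera-frame translation is K^-1 times P[:3, 3], so t_crect2c rather than the raw R_t belongs in the transform (this matches the bug fix mentioned later in the thread):

    import numpy as np

    def camrect2cam(projection):
        # P = K [I | t] for rectified KITTI cameras, so the translation in
        # camera coordinates is K^-1 @ P[:3, 3]; the rotation stays identity
        transformation = np.eye(4)
        K_inv = np.linalg.inv(projection[:3, :3])
        t_crect2c = K_inv @ projection[:3, 3]
        transformation[:3, 3] = t_crect2c  # the K^-1-scaled translation, not the raw P[:3, 3]
        return transformation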

Does the bbox pos in your data stand for the center of the 3D bbox, or the center of the bottom face of the 3D bbox? @szhang963

@Nplace-su It is the center of the 3D bbox, but I have converted it to the KITTI convention by:

    pos = np.matmul(mycoord2kitticam[:3, :3], pos)  # rotate into the KITTI camera axes
    pos[1] += h / 2  # KITTI labels store the bottom-face center; y points down

[screenshot: render with beam-like artifacts in the background]

Hi @Nplace-su, I fixed a bug in Tr_camrect2cam_i and the objects can now be learned, but the background and the objects still cannot be separated. What causes this?

> Hi all, I visualized the camera & object poses in the same coordinate system. There seems to be a small error on the Y axis? […]

Camera 0 in the first image has its x-axis pointing down(?), which should be pointing to its right; maybe you can check your axis systems?

> [screenshot: render with beam-like artifacts in the background]
> I fixed a bug in Tr_camrect2cam_i and the objects can now be learned, but the background and the objects still cannot be separated. […]

These beam-like artifacts in the background image usually indicate that your scale factor is off; maybe scaling it down would help.
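
Concretely, the knob is the dataparser's scale_factor (1 in the config.yaml above), which scales the camera/object translations; a hedged sketch of the change, using the module path from that config:

    # hedged sketch: shrinking the scene tends to bring it into the region the
    # background model samples well; 0.1 is a starting point, not a magic number
    from mars.data.mars_kitti_dataparser_phi import MarsKittiDataParserConfig

    dataparser = MarsKittiDataParserConfig(scale_factor=0.1)  # was 1.0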

@wuzirui Thank you for your reply. The original KITTI data also has the same coordinate system (x-axis down).

[screenshot: KITTI pose visualization]

I mean pointing right relative to the image is preferred. If your camera & world systems align with this, it should be correct~

Thank you, I will check it. But it seems the Y-axis and Z-axis are reversed?

[screenshot: pose visualization]

> Thank you, I will check it. But it seems the Y-axis and Z-axis are reversed? […]

From this figure, your camera coordinates do not align with your world (object) coordinates; maybe you need to flip them.
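
If the mismatch is just a sign flip on y and z (a common OpenCV/OpenGL-style difference), a hypothetical fix could look like the sketch below; the exact permutation depends on your source convention, so treat FLIP_YZ as an assumption to verify in the visualizer:

    import numpy as np

    # hypothetical: negate the local y and z axes of a 4x4 camera-to-world pose
    # (right-multiplying re-expresses the camera's own axes; its origin is unchanged)
    FLIP_YZ = np.diag([1.0, -1.0, -1.0, 1.0])

    def flip_camera_axes(c2w):
        return c2w @ FLIP_YZ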

OK I will check it. Thank you a lot.

Hi @wuzirui, I changed the scale_factor from 1.0 to 0.1, and now the background and objects are separated. What is the reason? Could you explain the principle? Thanks a lot.

[screenshot: render with ghosting artifacts]
The result still has some ghost shadows, as shown in the figure. Could you suggest some solutions?