OPEN-AIR-SUN/mars

Some questions about editing foreground pedestrians

Closed this issue · 2 comments

Hello, I want to use MARS to edit foreground pedestrians. I built a custom dataset in the format of the vKITTI dataset. During training, the foreground pedestrians are not rendered correctly (in the wandb log), as shown in the figures below. When I trained on the pedestrian data in KITTI-MOT, this problem did not occur. Can you provide some suggestions? Thanks!
img_1000_83ac3580dd340f92d967
objects_depth_1000_ae7a024befc2a85b68e2
objects_rgb_1000_a38daef23f9f41cd63f5

VKITTI_Recon_Mars_Car_Depth = MethodSpecification(
    config=TrainerConfig(
        method_name="mars-vkitti-car-depth-recon",
        steps_per_eval_image=STEPS_PER_EVAL_IMAGE,
        steps_per_eval_all_images=STEPS_PER_EVAL_ALL_IMAGES,
        steps_per_save=STEPS_PER_SAVE,
        save_only_latest_checkpoint=True,
        max_num_iterations=MAX_NUM_ITERATIONS,
        mixed_precision=False,
        use_grad_scaler=True,
        log_gradients=True,
        pipeline=MarsPipelineConfig(
            datamanager=MarsDataManagerConfig(
                dataparser=MarsVKittiDataParserConfig(
                    use_car_latents=False,
                    use_depth=True,
                    split_setting="reconstruction",
                    first_frame=0,
                    last_frame=80,
                    scale_factor=1.0,
                ),
                train_num_rays_per_batch=4096,
                eval_num_rays_per_batch=4096,
                camera_optimizer=CameraOptimizerConfig(mode="off"),
            ),
            model=SceneGraphModelConfig(
                background_model=NerfactoModelConfig(),
                object_model_template=NerfactoModelConfig(),
                object_representation="class-wise",
                object_ray_sample_strategy="remove-bg",
                mono_depth_loss_mult=0.01,
                depth_loss_mult=0,
            ),
        ),
        optimizers={
            "background_model": {
                "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15),
                "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000),
            },
            "learnable_global": {
                "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15),
                "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000),
            },
            "object_model": {
                "optimizer": RAdamOptimizerConfig(lr=5e-3, eps=1e-15),
                "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000),
            },
        },
        # viewer=ViewerConfig(num_rays_per_chunk=1 << 15),
        vis="wandb",
    ),
    description="Neural Scene Graph implementation with vanilla-NeRF model for background and object models.",
)

Hi! @xiao1874

If I understand correctly, the first image is the ground truth, the second image is the rendered image, the third is the object depth, and the fourth is the object bounding box.

It does look like the foreground pedestrians are rendered poorly. Based on your config, you are using Nerfacto for foreground modeling (object_model_template=NerfactoModelConfig()), which may not be well suited to dynamic foreground objects such as pedestrians.

I suggest replacing Nerfacto with another model that can handle moving pedestrians. If there is more than one moving pedestrian in the scene, you can try referring to the latent conditioning method in NeRF.
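
To make the latent conditioning idea a bit more concrete, here is a toy sketch in plain Python. It is not MARS or nerfstudio code: the class `LatentConditionedField`, its parameters, and the tiny network are all hypothetical and only illustrate the principle. One set of weights is shared by every object of a class, and a per-object latent code is concatenated to the 3D query point so the same network can return different densities for different pedestrians. In a real model the latent codes and weights would be learned during training rather than drawn at random.

```python
import math
import random


class LatentConditionedField:
    """Toy class-wise field (hypothetical, not part of MARS).

    All objects share the same weights; each object gets its own latent
    code, which is concatenated to the query point before the network.
    """

    def __init__(self, num_objects, latent_dim=4, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        in_dim = 3 + latent_dim
        # One latent code per tracked object (random here, learned in practice).
        self.latents = [
            [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
            for _ in range(num_objects)
        ]
        # Shared weights of a one-hidden-layer network.
        self.w1 = [
            [rng.gauss(0.0, 0.5) for _ in range(in_dim)]
            for _ in range(hidden_dim)
        ]
        self.w2 = [rng.gauss(0.0, 0.5) for _ in range(hidden_dim)]

    def density(self, xyz, object_id):
        """Return a non-negative density for a point in object coordinates."""
        # Condition the shared network on the object's latent code.
        x = list(xyz) + self.latents[object_id]
        hidden = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in self.w1]
        raw = sum(w * h for w, h in zip(self.w2, hidden))
        # Numerically stable softplus keeps the density non-negative.
        return max(raw, 0.0) + math.log1p(math.exp(-abs(raw)))


field = LatentConditionedField(num_objects=2)
point = (0.1, 0.2, 0.3)
density_ped_0 = field.density(point, object_id=0)
density_ped_1 = field.density(point, object_id=1)
print(density_ped_0, density_ped_1)
```

The same point queried with two different object IDs goes through the same weights but gives different values, because only the latent code changes. That is what would let a single class-wise model represent several distinct pedestrians.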