Some questions about editing foreground pedestrians

Question

Closed this issue 3 months ago · 2 comments

Hello, I want to use MARS to edit foreground pedestrians. I have built a custom dataset in the VKITTI dataset format. During training, the foreground pedestrians are not rendered correctly (in the wandb log), as shown in the figure. When I trained on the pedestrian data in KITTI-MOT, this problem did not occur. Can you provide some suggestions? Thanks!

Answer 1 · 2024-06-12T11:52:27.000Z

VKITTI_Recon_Mars_Car_Depth = MethodSpecification(
    config=TrainerConfig(
        method_name="mars-vkitti-car-depth-recon",
        steps_per_eval_image=STEPS_PER_EVAL_IMAGE,
        steps_per_eval_all_images=STEPS_PER_EVAL_ALL_IMAGES,
        steps_per_save=STEPS_PER_SAVE,
        save_only_latest_checkpoint=True,
        max_num_iterations=MAX_NUM_ITERATIONS,
        mixed_precision=False,
        use_grad_scaler=True,
        log_gradients=True,
        pipeline=MarsPipelineConfig(
            datamanager=MarsDataManagerConfig(
                dataparser=MarsVKittiDataParserConfig(
                    use_car_latents=False,
                    use_depth=True,
                    split_setting="reconstruction",
                    first_frame=0,
                    last_frame=80,
                    scale_factor=1.0,
                ),
                train_num_rays_per_batch=4096,
                eval_num_rays_per_batch=4096,
                camera_optimizer=CameraOptimizerConfig(mode="off"),
            ),
            model=SceneGraphModelConfig(
                background_model=NerfactoModelConfig(),
                object_model_template=NerfactoModelConfig(),
                object_representation="class-wise",
                object_ray_sample_strategy="remove-bg",
                mono_depth_loss_mult=0.01,
                depth_loss_mult=0,
            ),
        ),
        optimizers={
            "background_model": {
                "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15),
                "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000),
            },
            "learnable_global": {
                "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15),
                "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000),
            },
            "object_model": {
                "optimizer": RAdamOptimizerConfig(lr=5e-3, eps=1e-15),
                "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000),
            },
        },
        # viewer=ViewerConfig(num_rays_per_chunk=1 << 15),
        vis="wandb",
    ),
    description="Neural Scene Graph implementation with vanilla-NeRF model for background and object models.",
)

Answer 2 · 2024-06-15T08:06:26.000Z

Hi! @xiao1874

If I understand correctly, the first image is the ground truth, the second image is the rendered image, the third is the object depth, and the fourth is the object bounding box.

The foreground pedestrians do not appear to be rendered well. Based on your description, you are using Nerfacto for foreground modeling, which may not be well suited to dynamic foreground objects such as pedestrians.

I suggest replacing Nerfacto with another model that can handle moving pedestrians. If there is more than one moving pedestrian in the scene, you can try the latent conditioning method used with NeRF.
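To make the latent conditioning idea concrete, here is a minimal, dependency-free sketch. It is not MARS or nerfstudio code: the class name `LatentConditionedField`, the dimensions, and the random weights are all hypothetical and only illustrate the general pattern, in which several objects share one network and each object gets its own latent code that is concatenated with the input point.

```python
import math
import random


class LatentConditionedField:
    """Toy shared density field: one set of weights, one latent code per object."""

    def __init__(self, object_ids, latent_dim=4, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        # One latent vector per tracked object; in a real model these would be
        # trainable parameters optimized together with the network weights.
        self.latents = {
            oid: [rng.gauss(0.0, 0.1) for _ in range(latent_dim)]
            for oid in object_ids
        }
        in_dim = 3 + latent_dim  # xyz in the object's local frame + latent code
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(in_dim)]
                   for _ in range(hidden_dim)]
        self.w2 = [rng.gauss(0.0, 0.5) for _ in range(hidden_dim)]

    def density(self, xyz, object_id):
        """Return a non-negative density for a point, conditioned on the object."""
        # Conditioning step: concatenate the point with that object's latent code.
        features = list(xyz) + self.latents[object_id]
        hidden = [math.tanh(sum(w * f for w, f in zip(row, features)))
                  for row in self.w1]
        raw = sum(w * h for w, h in zip(self.w2, hidden))
        # Numerically stable softplus keeps the density non-negative.
        return max(raw, 0.0) + math.log1p(math.exp(-abs(raw)))


# Hypothetical usage: two pedestrians share the same weights but not the latent.
field = LatentConditionedField(object_ids=[0, 1])
point = (0.1, -0.2, 0.3)
density_a = field.density(point, object_id=0)
density_b = field.density(point, object_id=1)
```

In a real pipeline the latent codes and weights would be trained jointly, so the shared network learns what pedestrians have in common while each latent captures one individual's appearance and shape.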