A question about training on weather conditions
hsleiman1 opened this issue · 13 comments
Hi @hsleiman1, in order for us to help you, could you please provide the full command line you used and any relevant information to replicate your problem (Python version, OS and OS version, PyTorch version, GPU type, ...)?
Thank you!
Also try without multimodal first.
Hello,
The training command is as follows:
python train.py --dataroot datasets/clear2snowy --checkpoints_dir checkpoints --name clear2snowy --output_display_env clear2snowy --output_display_freq 50 --output_print_freq 50 --train_G_lr 0.0002 --train_D_lr 0.0001 --data_crop_size 512 --data_load_size 512 --data_dataset_mode unaligned_labeled_mask_online --model_type cut --train_batch_size 3 --train_iter_size 4 --model_input_nc 3 --model_output_nc 3 --f_s_net segformer --f_s_config_segformer models/configs/segformer/segformer_config_b0.py --train_mask_f_s_B --f_s_semantic_nclasses 11 --G_netG segformer_attn_conv --G_config_segformer models/configs/segformer/segformer_config_b0.json --data_online_creation_crop_size_A 512 --data_online_creation_crop_delta_A 64 --data_online_creation_mask_delta_A 64 --data_online_creation_crop_size_B 512 --data_online_creation_crop_delta_B 64 --dataaug_D_noise 0.01 --data_online_creation_mask_delta_B 64 --alg_cut_nce_idt --train_sem_use_label_B --D_netDs projected_d basic vision_aided --D_proj_interp 512 --D_proj_network_type vitsmall --train_G_ema --G_padding_type reflect --train_optim adam --dataaug_no_rotate --train_sem_idt --model_multimodal --train_mm_nz 16 --G_netE resnet_512 --f_s_class_weights 1 10 10 1 5 5 10 10 30 50 50 --output_display_aim_server 127.0.0.1 --output_display_visdom_port 8501 --gpu_id 0,1,2,3
I am using 4 NVIDIA L4 GPUs and torch==2.0.1.
Is this information sufficient?
Also try without multimodal first.
Could you please give more details on this, or a link?
Thanks!
@hsleiman1 you are missing the --train_semantic_mask option, thus the semantic network is not trained. You can see it on the visdom since there's no f_s loss.
Additionally, it is --f_s_config_segformer models/configs/segformer/segformer_config_b0.json and not .py.
Thank you, I will run with the following configuration and check:
python train.py --dataroot datasets/clear2snowy --checkpoints_dir checkpoints2 --name clear2snowy2 --output_display_env clear2snowy2 --output_display_freq 50 --output_print_freq 50 --train_G_lr 0.0002 --train_D_lr 0.0001 --data_crop_size 512 --data_load_size 512 --data_dataset_mode unaligned_labeled_mask_online --model_type cut --train_batch_size 3 --train_iter_size 4 --model_input_nc 3 --model_output_nc 3 --f_s_net segformer --f_s_config_segformer models/configs/segformer/segformer_config_b0.json --train_semantic_mask --train_mask_f_s_B --f_s_semantic_nclasses 11 --G_netG segformer_attn_conv --G_config_segformer models/configs/segformer/segformer_config_b0.json --data_online_creation_crop_size_A 512 --data_online_creation_crop_delta_A 64 --data_online_creation_mask_delta_A 64 --data_online_creation_crop_size_B 512 --data_online_creation_crop_delta_B 64 --dataaug_D_noise 0.01 --data_online_creation_mask_delta_B 64 --alg_cut_nce_idt --train_sem_use_label_B --D_netDs projected_d basic vision_aided --D_proj_interp 512 --D_proj_network_type vitsmall --train_G_ema --G_padding_type reflect --train_optim adam --dataaug_no_rotate --train_sem_idt --train_mm_nz 16 --G_netE resnet_512 --f_s_class_weights 1 10 10 1 5 5 10 10 30 50 50 --output_display_aim_server 127.0.0.1 --output_display_visdom_port 8501 --gpu_id 0,1,2,3
@hsleiman1 FYI I've tested 3 configurations on 3 runs and they all work for me, i.e. clear2snowy goes as expected, from a visual inspection viewpoint that is.
Tested configurations include using the sam discriminator in addition to all others.
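If you want to try that configuration as well, the change would presumably just be an extra value in the discriminator list (a sketch, assuming sam is accepted as an additional --D_netDs value as in the maintainer's runs):
--D_netDs projected_d basic vision_aided sam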
This is not enough information to understand what is happening. You need to look at mask conservation, every D loss, etc. The last image seems almost impossible: G moving to clear weather to satisfy the discriminator, whereas it would be much easier to do so while staying in night mode. This may point to a dataset issue, overfitting, or something else. I've never seen this on bdd100k.
I've put my recent run here: https://www.joligen.com/stuff/bdd100k/test_clear2snowy_0723.tar
You can compare it to yours: options, model inferences, etc. It seems fine after 12 epochs.
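One way to do the comparison (the extracted directory name is an assumption about the archive layout; checkpoints2/clear2snowy2 is the run directory from the command above) is to unpack the archive and diff it against your own run:
tar xvf test_clear2snowy_0723.tar
diff -r test_clear2snowy_0723 checkpoints2/clear2snowy2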
You can use the generator for inference on the full-size images directly. The generator is either fully convolutional (resnet, mobilenet, unet) or directly integrates an upsampling step (segformer).
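As an illustration, full-size inference can be run on a single image with the single-image script shipped in the repo. The script name, flag names, and generator checkpoint file name below are from memory and may differ in your joliGEN version, so treat them as assumptions and check scripts/ and the script's --help output; the checkpoint directory is the one from the run above:
python3 scripts/gen_single_image.py --model-in-file checkpoints2/clear2snowy2/latest_net_G_A.pth --img-in my_clear_image.jpg --img-out my_snowy_image.jpg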