visinf/da-sac

Problem reproducing results


Hi, thanks for your awesome work and sharing code.

For ResNet101-DeepLab on the GTAV-to-Cityscapes UDA task, I reran your code with the provided pre-trained model and the weights for importance sampling. However, the best mIoU on val_cityscapes is only 50.4%, which is worse than the paper's result of 53.8%.

Could the problem be caused by using only 2 GPUs? I reproduced the experiment on 2 NVIDIA Tesla V100s with 32 GB of memory each. However, I noticed that you set batch_target to 1 per GPU in the code, so the total batch size of target data is smaller than in your 4-GPU setup.

If possible, could you provide the complete experiment logs? They would be helpful for me to debug. :-)

Hi Wenqi,

thanks. I assume you evaluated the model with infer_val.sh?
I don't think using 2 V100s is the problem. In fact, I used the same setting in some of the experiments. As long as there are 2 target samples with 4 crops each (i.e. the target batch size is 8), the setting is equivalent and should therefore be reproducible.
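
For reference, this is just the arithmetic behind that equivalence (the variable names below are illustrative, not the repo's actual config keys):

```python
# Illustrative sketch only; names do not necessarily match the config keys in the repo.
num_gpus = 2        # e.g. 2x V100
batch_target = 1    # target samples per GPU
group_size = 4      # augmented crops generated per target sample
effective_target_batch = num_gpus * batch_target * group_size
print(effective_target_batch)  # 8, matching the reference setting
```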

How did the loss behave in the first few training iterations (from the .log file)? If you observe the loss momentarily increasing, this can affect the final accuracy. I observed it occasionally and would simply restart the training if it happened. Perhaps a more sophisticated training schedule could fix it, e.g. gradually phasing in the target loss instead of simply switching it on; see the sketch below.
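
As a rough illustration, such a phase-in could be a linear ramp-up of the target-loss weight over the first iterations. Neither the function name nor the schedule below is part of the released code; it is just a sketch of the idea:

```python
def target_loss_weight(step, ramp_up_steps=2000):
    """Hypothetical linear ramp-up: weight goes from 0 to 1 over ramp_up_steps."""
    return min(1.0, step / ramp_up_steps)

# Inside the training loop (sketch), assuming loss_src and loss_self are the
# source and self-supervised target losses already computed at this step:
# loss = loss_src + target_loss_weight(step) * loss_self
```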

Let me know if it helps.
Nikita

Hi Nikita, thanks for your reply.

I only took the mIoU from the training log, not from infer_val.sh.

In my 2-V100 environment, there are a total of 2 target samples with 4 crops each (i.e. the target batch size is 8), where batch_target per GPU is 1 and GROUP_SIZE is 4.

The training loss curves are shown below: loss_ce is increasing, self_ce is oscillating, and src_loss_ce and teacher_diff are mostly decreasing. Is this normal?

[screenshots of the loss curves]

There are many mIoU values in the log and TensorBoard. Which one is finally reported: train_target in logits_up_all, or val_cityscapes in logits_up_all?

The actual mIoU can only be obtained with infer_val.sh. The mIoU values in TensorBoard are proxy estimates at best, because they are computed on image crops (please see the code).
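
For context, here is a minimal sketch of how a full-resolution mIoU is typically aggregated over a dataset-wide confusion matrix (standard Cityscapes-style evaluation; not necessarily the exact implementation behind infer_val.sh):

```python
import numpy as np

def update_confusion(conf, pred, gt, num_classes=19, ignore_index=255):
    """Accumulate a dataset-wide confusion matrix from full-resolution label maps."""
    mask = gt != ignore_index
    idx = num_classes * gt[mask].astype(int) + pred[mask].astype(int)
    conf += np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    return conf

def mean_iou(conf):
    """mIoU = mean over classes of TP / (TP + FP + FN)."""
    tp = np.diag(conf).astype(float)
    denom = conf.sum(0) + conf.sum(1) - tp
    iou = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)
    return np.nanmean(iou)
```

Evaluating on crops instead changes both the per-sample class statistics and the aggregation, so the TensorBoard numbers mainly track the trend rather than the final score.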

I have evaluated the model with infer_val.sh and the result is 53.09%, which is close to the paper's result.

Thanks for your reply, and I'm sorry for my carelessness.

No problem, happy to help!