Training PEBAL with a custom backbone model and dataset
mei0824 opened this issue · 1 comments
Hi there! Congrats on such excellent results, and thank you so much for your inspiring work!
After reading through the paper and trying to implement PEBAL with the code you provided (again, thank you so much for open-sourcing it), I am now experimenting with how a custom backbone model and dataset affect the performance of PEBAL, since the paper discusses the influence of segmentation models. There are a few points that are not covered in previous issues, and I would like to confirm them before continuing with my experiments.
- The layers to fine-tune.
  As stated in the paper, only "the last block of a segmentation model" needs to be fine-tuned. From my understanding, the reason is that these layers contain an extra channel, and the weights for that channel need to be obtained through fine-tuning. Therefore, only the layers with the constructed extra channel need to keep requires_grad set to True for their parameters. Is my understanding of which layers to freeze and which to train correct?
- How to pick the m_in and m_out values.
  I suppose m_in and m_out need to be re-picked as well. What criteria are involved in selecting them? The reply in #19 (comment) mentions that the energy is constrained by the two values -12 and -6. In my experiments, I am observing a large overlap between the inlier and outlier energies. Could you explain a bit more about how they are constrained and give some advice on choosing the two values?
- Other parameters.
  Could you also provide some advice on how to select other hyperparameters such as β_1, β_2 and λ?
- Loss.
  When trying to replicate the results from the PEBAL paper, I noticed that neither the energy loss nor the gambler loss converges; both keep fluctuating. However, judging by metrics such as AUROC, the performance has actually improved. Is this behavior normal for PEBAL?
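For concreteness, the freezing scheme described in the first point could be sketched roughly as follows. This is only an illustration, not the repository's code: the tiny nn.Sequential model is a hypothetical stand-in for a real segmentation network, and num_classes = 19 is an assumed value. The only thing it shows is how to leave requires_grad enabled for the final block (the one with the extra output channel) and disabled everywhere else.

```python
import torch.nn as nn

# Hypothetical stand-in for a segmentation network: a small "backbone"
# followed by a classifier block with one extra output channel.
num_classes = 19  # assumed value, for illustration only
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),     # backbone layer (frozen)
    nn.ReLU(),
    nn.Conv2d(8, num_classes + 1, kernel_size=1),  # last block with the extra channel
)

# Freeze every parameter, then re-enable gradients for the last block only.
for param in model.parameters():
    param.requires_grad = False
for param in model[-1].parameters():
    param.requires_grad = True

# Names of the parameters that would actually be updated during fine-tuning.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['2.weight', '2.bias']
```

With a real backbone, the last block would be selected by its attribute name instead of model[-1], and only the parameters with requires_grad=True would be passed to the optimizer.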
Thanks!
Hi, thanks for your interest in our paper.
- Yes, your understanding is correct. Fine-tuning only the last block also greatly improves computational efficiency.
- We didn't experiment much with different values of m_in and m_out; we just made sure that m_in is smaller than m_out. Please feel free to try different values, and I believe you may find better hyperparameters.
- For beta_1 and beta_2, we followed the default hyperparameters from "Real-world Anomaly Detection in Surveillance Videos", and I believe there could be better hyperparameters out there. For lambda, we are trying to map the EBM loss onto the same scale as the PAL loss, so lambda is essentially a scaling factor.
- Yes, this is normal for PEBAL, since the backbones are mostly frozen.
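To illustrate how m_in and m_out can constrain the energy, here is a minimal sketch. It assumes the squared-hinge margin loss commonly used in energy-based out-of-distribution detection, with free energy defined as the negative log-sum-exp of the logits; the exact loss in the repository may differ. The function names free_energy and energy_margin_loss are hypothetical, the code is plain Python rather than PyTorch so that it runs on its own, and the defaults -12 and -6 are simply the values quoted in the question.

```python
import math

def free_energy(logits):
    """Free energy E = -log(sum_c exp(logit_c)), computed stably."""
    top = max(logits)
    return -(top + math.log(sum(math.exp(v - top) for v in logits)))

def mean(values):
    """Average of a list, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0

def energy_margin_loss(inlier_energies, outlier_energies, m_in=-12.0, m_out=-6.0):
    """Assumed squared-hinge form: penalise inlier energies above m_in
    and outlier energies below m_out; energies past the margins cost nothing."""
    loss_in = [max(0.0, e - m_in) ** 2 for e in inlier_energies]
    loss_out = [max(0.0, m_out - e) ** 2 for e in outlier_energies]
    return mean(loss_in) + mean(loss_out)

# A confident inlier pixel (one dominant logit) has very low energy,
# while a pixel with flat logits has higher energy.
e_in = free_energy([15.0, 0.0, 0.0])   # just below -15
e_out = free_energy([1.0, 1.0, 1.0])   # -(1 + log 3), about -2.1
print(energy_margin_loss([e_in], [e_out]))   # 0.0, both margins satisfied
print(energy_margin_loss([-9.0], [-9.0]))    # 18.0, energies overlap between the margins
```

Under this assumed form, the loss only vanishes once inlier energies sit below m_in and outlier energies sit above m_out, which is why m_in has to be smaller than m_out. A large overlap between the two energy distributions shows up as a persistently non-zero loss; widening or shifting the margins changes how strongly the two groups are pushed apart.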