js-fan/ICD

The warm-up strategy for bottom-up estimation

Closed this issue · 2 comments

Hey, I noticed that you mention that 'initial bottom-up estimates are not reliable', so you use a warm-up strategy. I ran into the same problem when trying to reproduce your work in PyTorch: after several epochs the loss becomes 'nan'. Could you point me to the exact code in your work for this strategy? I tried to find it but failed. I would also really appreciate it if you could suggest any possible solutions to avoid unstable training for the bottom-up estimation. (I guess this is also the reason why you only use single-class images for this step?) Thank you so much!

Hi, thank you for your interest! The warm-up strategy is implemented in the operator:

class ICD_TopDownProp(mx.operator.CustomOpProp):
    def __init__(self, grad_scale=1, warmup=0):

and called by:

ICD/run_icd.py, line 44 (commit f78286a):

x_icd_td_ = mx.sym.Custom(x_icd_td, x_lbl, x_icd_bu_sp, warmup=2*args.num_sample//args.batch_size, op_type='ICD_TopDown', name='icd_td')
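For intuition, a warm-up of this kind is commonly implemented by ramping the gradient scale of the top-down branch from 0 to its full value over the first `warmup` steps, so that the unreliable early bottom-up estimates contribute little at the start of training. The sketch below is a hypothetical illustration of such a schedule, not the actual body of the `ICD_TopDown` operator; only the parameter names `warmup` and `grad_scale` are taken from the operator's `__init__` signature above.

```python
def warmup_scale(step, warmup, grad_scale=1.0):
    """Linearly ramp the gradient scale from 0 to grad_scale.

    Hypothetical sketch: the real schedule inside the ICD_TopDown
    custom operator may differ (e.g. a hard on/off switch).
    """
    if warmup <= 0:
        # No warm-up requested: use the full gradient scale immediately.
        return grad_scale
    # Fraction of the warm-up period completed, capped at 1.
    return grad_scale * min(1.0, step / float(warmup))
```

With `warmup=2*args.num_sample//args.batch_size` as in the call above, the top-down branch would reach its full gradient scale after roughly two epochs' worth of iterations.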

The core of this work is to exclude the disturbance of inter-class discrimination. Each intra-class discriminator only sees features belonging to the class it is responsible for; in other words, we should avoid asking it to discriminate between features belonging to different foreground classes. This is why we update the bottom-up stage with single-class images only. Another possible approach would be to mask out other classes' fg features, with masks derived from the other classes' intra-class discriminators or from the final estimations, but that would make the pipeline too complicated and could create another chicken-or-egg problem. Good luck!
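The single-class selection described above can be sketched as a simple filter over the image-level labels. This is a hypothetical helper (not code from this repo), assuming multi-hot image-level labels of shape `(batch, num_classes)`:

```python
import numpy as np

def single_class_mask(labels):
    """Boolean mask selecting images with exactly one foreground class.

    labels: (batch, num_classes) array of 0/1 image-level labels.
    Only the selected images would feed the bottom-up update, so each
    intra-class discriminator never sees another class's foreground.
    """
    labels = np.asarray(labels)
    return labels.sum(axis=1) == 1
```

For example, a batch with labels `[[1,0,0],[1,1,0],[0,0,1]]` would keep the first and third images and skip the two-class image in the middle.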

Thank you so much for your work and your help! It helps a lot!