hustvl/MIMDet

How to change the config to train Benchmarking-ViT-B with batch size 16?

Yingdong-Hu opened this issue · 2 comments

Hi, thanks for the great project!
How is max_iter=184375 in the Benchmarking-ViT-B config calculated? Is it (num_images * epochs) / batch_size?
I want to train a Benchmarking-ViT-B model with batch size 16 in an 8-GPU environment, but I am confused by the config file.
Could you advise how to adjust hyper-parameters such as max_iter and eval_period if I change the batch size to 16?

train = dict(
    output_dir="output/benchmarking_mask_rcnn_base_FPN_100ep_LSJ_mae",
    init_checkpoint="",
    max_iter=184375,
    amp=dict(enabled=True),  # options for Automatic Mixed Precision
    ddp=dict(  # options for DistributedDataParallel
        broadcast_buffers=False,
        find_unused_parameters=False,
        fp16_compression=True,
    ),
    checkpointer=dict(period=1844, max_to_keep=100),  # options for PeriodicCheckpointer
    eval_period=1844,
    log_period=20,
    device="cuda",
    # ...
)

Hi @Alxead, thanks for your interest in our work.
You are right: max_iter = (num_images * epochs) / batch_size, where num_images ≈ 118,000 for COCO train2017. With the default total batch size of 64 and the 100-epoch schedule, that gives 118000 * 100 / 64 = 184375.
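For concreteness, a minimal sketch of that arithmetic (the batch size of 64 is the default total batch size implied by the released config):

# Sketch: how max_iter in the released config is derived.
num_images = 118000   # approximate size of COCO train2017
epochs = 100          # schedule length of the 100ep config
batch_size = 64       # default total batch size assumed here

max_iter = num_images * epochs // batch_size
print(max_iter)  # 184375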

I suggest you set max_iter = 184375 * 4 and eval_period = 1844 * 4 if your batch size is 16, since 16 is 1/4 of the default batch size of 64.
Also, I recommend scaling the learning rate down accordingly, to lr = 4e-5 (linear scaling with batch size); see the sketch below.
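If it helps, here is a minimal sketch of the rescaling, not the repo's own code: the helper rescale is hypothetical, and the base lr of 1.6e-4 is an assumption inferred from the suggested 4e-5 and the factor of 4.

# Minimal sketch (hypothetical helper): recompute schedule values for a
# smaller total batch size, assuming iterations and lr scale linearly.
def rescale(base_batch_size, new_batch_size, base_max_iter, base_eval_period, base_lr):
    scale = base_batch_size / new_batch_size
    return dict(
        max_iter=int(base_max_iter * scale),
        eval_period=int(base_eval_period * scale),
        lr=base_lr / scale,  # base lr of 1.6e-4 below is an assumption
    )

print(rescale(64, 16, 184375, 1844, 1.6e-4))
# {'max_iter': 737500, 'eval_period': 7376, 'lr': 4e-05}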

But note that this config is not guaranteed to reproduce the original accuracy.

I believe the issue at hand has been addressed, so I'm closing it. Feel free to ask if you have further questions.