facebookresearch/MaskFormer

How to compute the FLOPs reported in the paper

YellowPig-zp opened this issue · 2 comments

Hi Bowen, thank you for such great work! I have a small question about how to obtain the FLOPs reported in the paper. Is there a script or public repo that provides this functionality? Thanks!

https://github.com/facebookresearch/detectron2/blob/main/tools/analyze_model.py

Hi Bowen. I calculated the FLOPs and params with the script, but the result does not match your paper.
For maskformer_swin_small_bs16_160k.yaml I get 63M params and 111G FLOPs, whereas the paper reports 63M params and 79G FLOPs. Is there a problem with my calculation?

python3 analyze_model.py --config-file ./configs/ade20k-150/swin/maskformer_swin_small_bs16_160k.yaml --tasks flop

Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(512, 512), max_size=2048, sample_style='choice')]
[11/15 13:41:29 detectron2]: Flops table computed from only one input sample:

| module | #parameters or shape | #flops |
|:--|:--|:--|
| model | 63.075M | 80.909G |
| backbone | 48.839M | 49.38G |
| backbone.patch_embed | 4.896K | 83.362M |
| backbone.patch_embed.proj | 4.704K | 75.497M |
| backbone.patch_embed.norm | 0.192K | 7.864M |
| backbone.layers | 48.831M | 49.282G |
| backbone.layers.0 | 0.299M | 4.394G |
| backbone.layers.1 | 1.188M | 4.367G |
| backbone.layers.2 | 33.16M | 35.953G |
| backbone.layers.3.blocks | 14.184M | 4.567G |
| backbone.norm0 | 0.192K | 7.864M |
| backbone.norm0.weight | (96,) | |
| backbone.norm0.bias | (96,) | |
| backbone.norm1 | 0.384K | 3.932M |
| backbone.norm1.weight | (192,) | |
| backbone.norm1.bias | (192,) | |
| backbone.norm2 | 0.768K | 1.966M |
| backbone.norm2.weight | (384,) | |
| backbone.norm2.bias | (384,) | |
| backbone.norm3 | 1.536K | 0.983M |
| backbone.norm3.weight | (768,) | |
| backbone.norm3.bias | (768,) | |
| sem_seg_head | 14.236M | 27.453G |
| sem_seg_head.pixel_decoder | 4.305M | 23.56G |
| sem_seg_head.pixel_decoder.adapter_1 | 25.088K | 0.424G |
| sem_seg_head.pixel_decoder.layer_1 | 0.59M | 9.685G |
| sem_seg_head.pixel_decoder.adapter_2 | 49.664K | 0.207G |
| sem_seg_head.pixel_decoder.layer_2 | 0.59M | 2.421G |
| sem_seg_head.pixel_decoder.adapter_3 | 98.816K | 0.102G |
| sem_seg_head.pixel_decoder.layer_3 | 0.59M | 0.605G |
| sem_seg_head.pixel_decoder.layer_4 | 1.77M | 0.453G |
| sem_seg_head.pixel_decoder.mask_features | 0.59M | 9.664G |
| sem_seg_head.predictor | 9.932M | 3.887G |
| sem_seg_head.predictor.transformer.decoder | 9.473M | 1.179G |
| sem_seg_head.predictor.query_embed | 25.6K | |
| sem_seg_head.predictor.input_proj | 0.197M | 50.332M |
| sem_seg_head.predictor.class_embed | 38.807K | 23.194M |
| sem_seg_head.predictor.mask_embed.layers | 0.197M | 0.118G |
[11/15 13:41:29 detectron2]: Average GFlops for each type of operators:
[('conv', 32.83191595008), ('layer_norm', 0.22296760319999998), ('linear', 67.07614236672), ('matmul', 1.92566500224), ('group_norm', 0.0769406976), ('upsample_nearest2d', 0.00764854272), ('bmm', 0.139984896), ('einsum', 8.959275), ('upsample_bilinear2d', 0.29302461)]
[11/15 13:41:29 detectron2]: Total GFlops: 111.5±12.8
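
A quick sanity check on the log above, as a minimal sketch: the per-operator averages should add up to the reported total, and the log contains two different whole-model figures (the single-sample "model" row and the averaged "Total GFlops" line). The variable names below are illustrative only, and the 79G value is the paper figure as quoted in this thread. The arithmetic does not establish which input size or averaging the paper used; it only shows how far each logged number is from the quoted one.

```python
# Per-operator average GFLOPs, copied verbatim from the log above.
op_gflops = {
    "conv": 32.83191595008,
    "layer_norm": 0.22296760319999998,
    "linear": 67.07614236672,
    "matmul": 1.92566500224,
    "group_norm": 0.0769406976,
    "upsample_nearest2d": 0.00764854272,
    "bmm": 0.139984896,
    "einsum": 8.959275,
    "upsample_bilinear2d": 0.29302461,
}

# The per-operator averages should reproduce the "Total GFlops" line.
total = sum(op_gflops.values())
print(f"sum of per-operator averages: {total:.1f} GFLOPs")

# Whole-model figures taken from the same log and from this thread.
single_sample_gflops = 80.909  # "model" row of the table (one input sample)
average_gflops = 111.5         # "Total GFlops" line (mean over sampled inputs)
paper_gflops = 79.0            # value quoted from the paper in this thread

# Distance of each logged figure from the quoted paper figure.
gap_single = single_sample_gflops - paper_gflops
gap_average = average_gflops - paper_gflops
print(f"single-sample table vs. paper: {gap_single:+.3f} GFLOPs")
print(f"averaged total vs. paper:      {gap_average:+.1f} GFLOPs")
```

The table is explicitly computed from only one input sample, while the total carries a ±12.8 spread, so the two figures are not expected to agree when the inputs differ in size after resizing.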