hustvl/MIMDet

RuntimeError: CUDA out of memory during training, works fine during inference.

sfarkya opened this issue · 7 comments

Hello Dear Authors,

I am trying to replicate your results for the ViT benchmark model on COCO detection. I was able to run inference successfully, but I am getting a CUDA out-of-memory error during training on a 48 GB GPU.

Here's the command I am using:

num_gpus=1
CONFIG_FILE=./configs/benchmarking/benchmarking_mask_rcnn_base_FPN_100ep_LSJ_mae.py
MAE_MODEL=../../../../models/MIMDet/VITB-MAE/mae_pretrain_vit_base_full.pth

python lazyconfig_train_net.py --config-file $CONFIG_FILE --num-gpus $num_gpus mae_checkpoint.path=$MAE_MODEL model.backbone.bottom_up.pretrained=$MAE_MODEL 

I am using 1 GPU at the moment.
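
For reference, the config dump below sets dataloader.train.total_batch_size = 64 and with_cp=False on the backbone. I assume the same dotted-key overrides used in the command above could also be passed to reduce memory on a single GPU, e.g. lowering the batch size and (if the backbone supports it) enabling checkpointing; the values 8 and True here are only illustrative guesses, not settings from the repo:

python lazyconfig_train_net.py --config-file $CONFIG_FILE --num-gpus $num_gpus \
    mae_checkpoint.path=$MAE_MODEL \
    model.backbone.bottom_up.pretrained=$MAE_MODEL \
    dataloader.train.total_batch_size=8 \
    model.backbone.bottom_up.with_cp=True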

Here's the log:

Namespace(config_file='./configs/benchmarking/benchmarking_mask_rcnn_base_FPN_100ep_LSJ_mae.py', dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, master_addr='', master_port='', node_rank=0, num_gpus=1, num_machines=1, opts=['mae_checkpoint.path=../../../../models/MIMDet/VITB-MAE/mae_pretrain_vit_base_full.pth', 'model.backbone.bottom_up.pretrained=../../../../models/MIMDet/VITB-MAE/mae_pretrain_vit_base_full.pth'], resume=False)
ANTLR runtime and generated code versions disagree: 4.8!=4.9.3
ANTLR runtime and generated code versions disagree: 4.8!=4.9.3
ANTLR runtime and generated code versions disagree: 4.8!=4.9.3
ANTLR runtime and generated code versions disagree: 4.8!=4.9.3
[06/13 15:36:23 detectron2]: Rank of current process: 0. World size: 1
[06/13 15:36:25 detectron2]: Environment info:
----------------------  ----------------------------------------------------------------
sys.platform            linux
Python                  3.7.5 (default, Feb 23 2021, 13:22:40) [GCC 8.4.0]
numpy                   1.16.4
detectron2              0.6 @/usr/local/lib/python3.7/dist-packages/detectron2
Compiler                GCC 7.3
CUDA compiler           CUDA 11.1
detectron2 arch flags   3.7, 5.0, 5.2, 6.0, 6.1, 7.0, 7.5, 8.0, 8.6
DETECTRON2_ENV_MODULE   <not set>
PyTorch                 1.9.0+cu111 @/usr/local/lib/python3.7/dist-packages/torch
PyTorch debug build     False
GPU available           Yes
GPU 0,1,2,3             NVIDIA RTX A6000 (arch=8.6)
Driver version          510.47.03
CUDA_HOME               /usr/local/cuda
Pillow                  8.2.0
torchvision             0.10.0+cu111 @/usr/local/lib/python3.7/dist-packages/torchvision
torchvision arch flags  3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore                  0.1.5.post20220512
iopath                  0.1.9
cv2                     4.5.2
----------------------  ----------------------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

[06/13 15:36:25 detectron2]: Command line arguments: Namespace(config_file='./configs/benchmarking/benchmarking_mask_rcnn_base_FPN_100ep_LSJ_mae.py', dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, master_addr='', master_port='', node_rank=0, num_gpus=1, num_machines=1, opts=['mae_checkpoint.path=../../../../models/MIMDet/VITB-MAE/mae_pretrain_vit_base_full.pth', 'model.backbone.bottom_up.pretrained=../../../../models/MIMDet/VITB-MAE/mae_pretrain_vit_base_full.pth'], resume=False)
[06/13 15:36:25 detectron2]: Contents of args.config_file=./configs/benchmarking/benchmarking_mask_rcnn_base_FPN_100ep_LSJ_mae.py:
import detectron2.data.transforms as T
import torch
from detectron2.config import LazyCall as L
from detectron2.layers import ShapeSpec
from detectron2.layers.batch_norm import NaiveSyncBatchNorm
from detectron2.modeling.anchor_generator import DefaultAnchorGenerator
from detectron2.modeling.backbone import FPN
from detectron2.modeling.backbone.fpn import LastLevelMaxPool
from detectron2.modeling.box_regression import Box2BoxTransform
from detectron2.modeling.matcher import Matcher
from detectron2.modeling.poolers import ROIPooler
from detectron2.modeling.proposal_generator import RPN, StandardRPNHead
from detectron2.modeling.roi_heads import (
    FastRCNNConvFCHead,
    FastRCNNOutputLayers,
    MaskRCNNConvUpsampleHead,
    StandardROIHeads,
)
from detectron2.solver import WarmupParamScheduler
from detectron2.solver.build import get_default_optimizer_params
from fvcore.common.param_scheduler import CosineParamScheduler

from models import BenchmarkingViTDet

from ..coco import dataloader
from ..common import GeneralizedRCNNImageListForward

model = L(GeneralizedRCNNImageListForward)(
    lsj_postprocess=True,
    backbone=L(FPN)(
        bottom_up=L(BenchmarkingViTDet)(
            window_size=16,
            with_cp=False,
            pretrained="pretrained/mae_pretrain_vit_base.pth",
            stop_grad_conv1=False,
            sincos_pos_embed=True,
            zero_pos_embed=False,
            img_size=1024,
            patch_size=16,
            embed_dim=768,
            depth=12,
            num_heads=12,
            drop_path_rate=0.1,
            init_values=None,
            beit_qkv_bias=False,
        ),
        in_features=["s0", "s1", "s2", "s3"],
        out_channels=256,
        norm="SyncBN",
        top_block=L(LastLevelMaxPool)(),
    ),
    proposal_generator=L(RPN)(
        in_features=["p2", "p3", "p4", "p5", "p6"],
        head=L(StandardRPNHead)(in_channels=256, num_anchors=3),
        anchor_generator=L(DefaultAnchorGenerator)(
            sizes=[[32], [64], [128], [256], [512]],
            aspect_ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64],
            offset=0.0,
        ),
        anchor_matcher=L(Matcher)(
            thresholds=[0.3, 0.7], labels=[0, -1, 1], allow_low_quality_matches=True
        ),
        box2box_transform=L(Box2BoxTransform)(weights=[1.0, 1.0, 1.0, 1.0]),
        batch_size_per_image=256,
        positive_fraction=0.5,
        pre_nms_topk=(2000, 1000),
        post_nms_topk=(1000, 1000),
        nms_thresh=0.7,
    ),
    roi_heads=L(StandardROIHeads)(
        num_classes=80,
        batch_size_per_image=512,
        positive_fraction=0.25,
        proposal_matcher=L(Matcher)(
            thresholds=[0.5], labels=[0, 1], allow_low_quality_matches=False
        ),
        box_in_features=["p2", "p3", "p4", "p5"],
        box_pooler=L(ROIPooler)(
            output_size=7,
            scales=(1.0 / 4, 1.0 / 8, 1.0 / 16, 1.0 / 32),
            sampling_ratio=0,
            pooler_type="ROIAlignV2",
        ),
        box_head=L(FastRCNNConvFCHead)(
            input_shape=ShapeSpec(channels=256, height=7, width=7),
            conv_dims=[],
            fc_dims=[1024, 1024],
        ),
        box_predictor=L(FastRCNNOutputLayers)(
            input_shape=ShapeSpec(channels=1024),
            test_score_thresh=0.05,
            box2box_transform=L(Box2BoxTransform)(weights=(10, 10, 5, 5)),
            num_classes="${..num_classes}",
        ),
        mask_in_features=["p2", "p3", "p4", "p5"],
        mask_pooler=L(ROIPooler)(
            output_size=14,
            scales=(1.0 / 4, 1.0 / 8, 1.0 / 16, 1.0 / 32),
            sampling_ratio=0,
            pooler_type="ROIAlignV2",
        ),
        mask_head=L(MaskRCNNConvUpsampleHead)(
            input_shape=ShapeSpec(channels=256, width=14, height=14),
            num_classes="${..num_classes}",
            conv_dims=[256, 256, 256, 256, 256],
        ),
    ),
    pixel_mean=[123.675, 116.280, 103.530],
    pixel_std=[58.395, 57.12, 57.375],
    input_format="RGB",
)
# Using NaiveSyncBatchNorm because heads may have empty input. That is not supported by
# torch.nn.SyncBatchNorm. We can remove this after
# https://github.com/pytorch/pytorch/issues/36530 is fixed.
model.roi_heads.box_head.conv_norm = (
    model.roi_heads.mask_head.conv_norm
) = lambda c: NaiveSyncBatchNorm(c, stats_mode="N")
# fmt: on

# 2conv in RPN:
# https://github.com/tensorflow/tpu/blob/b24729de804fdb751b06467d3dce0637fa652060/models/official/detection/modeling/architecture/heads.py#L95-L97  # noqa: E501, B950
model.proposal_generator.head.conv_dims = [-1, -1]

# 4conv1fc box head
model.roi_heads.box_head.conv_dims = [256, 256, 256, 256]
model.roi_heads.box_head.fc_dims = [1024]

optimizer = L(torch.optim.AdamW)(
    params=L(get_default_optimizer_params)(
        # params.model is meant to be set to the model object, before instantiating
        # the optimizer.
        weight_decay_norm=0.0,
        overrides={
            "pos_embed": {"weight_decay": 0.0},
            "relative_position_bias_table": {"weight_decay": 0.0},
        },
    ),
    lr=8e-5,
    betas=(0.9, 0.999),
    weight_decay=0.1,
)

lr_multiplier = L(WarmupParamScheduler)(
    scheduler=L(CosineParamScheduler)(start_value=1.0, end_value=0.0),
    warmup_length=0.25 / 100,
    warmup_factor=0.001,
)

train = dict(
    output_dir="output/benchmarking_mask_rcnn_base_FPN_100ep_LSJ_mae",
    init_checkpoint="",
    max_iter=184375,
    amp=dict(enabled=True),  # options for Automatic Mixed Precision
    ddp=dict(  # options for DistributedDataParallel
        broadcast_buffers=False, find_unused_parameters=False, fp16_compression=True,
    ),
    checkpointer=dict(period=1844, max_to_keep=100),  # options for PeriodicCheckpointer
    eval_period=1844,
    log_period=20,
    device="cuda"
    # ...
)

# resize_and_crop_image in:
# https://github.com/tensorflow/tpu/blob/b24729de804fdb751b06467d3dce0637fa652060/models/official/detection/utils/input_utils.py#L127  # noqa: E501, B950
image_size = 1024
dataloader.train.total_batch_size = 64
dataloader.train.mapper.augmentations = [
    L(T.ResizeScale)(
        min_scale=0.1, max_scale=2.0, target_height=image_size, target_width=image_size
    ),
    L(T.FixedSizeCrop)(crop_size=(image_size, image_size)),
    L(T.RandomFlip)(horizontal=True),
]
dataloader.train.mapper.use_instance_mask = True
dataloader.train.mapper.image_format = "RGB"
# recompute boxes due to cropping
dataloader.train.mapper.recompute_boxes = True
dataloader.test.mapper.augmentations = [
    L(T.ResizeShortestEdge)(short_edge_length=image_size, max_size=image_size),
    L(T.FixedSizeCrop)(crop_size=(image_size, image_size)),
]
dataloader.test.mapper.image_format = "RGB"
dataloader.evaluator.output_dir = "${...train.output_dir}"

WARNING [06/13 15:36:26 d2.config.lazy]: The config contains objects that cannot serialize to a valid yaml. output/benchmarking_mask_rcnn_base_FPN_100ep_LSJ_mae/config.yaml is human-readable but cannot be loaded.
WARNING [06/13 15:36:26 d2.config.lazy]: Config is saved using cloudpickle at output/benchmarking_mask_rcnn_base_FPN_100ep_LSJ_mae/config.yaml.pkl.
[06/13 15:36:26 detectron2]: Full config saved to output/benchmarking_mask_rcnn_base_FPN_100ep_LSJ_mae/config.yaml
[06/13 15:36:26 d2.utils.env]: Using a generated random seed 26272889
[06/13 15:36:35 models.benchmarking]: Resized position embedding: torch.Size([1, 197, 768]) to torch.Size([1, 4097, 768])
[06/13 15:36:35 models.benchmarking]: Position embedding grid-size from [14, 14] to (64, 64)
[06/13 15:36:35 models.benchmarking]: Loading ViT pretrained weights from ../../../../models/MIMDet/VITB-MAE/mae_pretrain_vit_base_full.pth.
WARNING [06/13 15:36:35 models.benchmarking]: missing keys: []
WARNING [06/13 15:36:35 models.benchmarking]: unexpected keys: ['cls_token', 'mask_token', 'decoder_pos_embed', 'norm.weight', 'norm.bias', 'decoder_embed.weight', 'decoder_embed.bias', 'decoder_blocks.0.norm1.weight', 'decoder_blocks.0.norm1.bias', 'decoder_blocks.0.attn.qkv.weight', 'decoder_blocks.0.attn.qkv.bias', 'decoder_blocks.0.attn.proj.weight', 'decoder_blocks.0.attn.proj.bias', 'decoder_blocks.0.norm2.weight', 'decoder_blocks.0.norm2.bias', 'decoder_blocks.0.mlp.fc1.weight', 'decoder_blocks.0.mlp.fc1.bias', 'decoder_blocks.0.mlp.fc2.weight', 'decoder_blocks.0.mlp.fc2.bias', 'decoder_blocks.1.norm1.weight', 'decoder_blocks.1.norm1.bias', 'decoder_blocks.1.attn.qkv.weight', 'decoder_blocks.1.attn.qkv.bias', 'decoder_blocks.1.attn.proj.weight', 'decoder_blocks.1.attn.proj.bias', 'decoder_blocks.1.norm2.weight', 'decoder_blocks.1.norm2.bias', 'decoder_blocks.1.mlp.fc1.weight', 'decoder_blocks.1.mlp.fc1.bias', 'decoder_blocks.1.mlp.fc2.weight', 'decoder_blocks.1.mlp.fc2.bias', 'decoder_blocks.2.norm1.weight', 'decoder_blocks.2.norm1.bias', 'decoder_blocks.2.attn.qkv.weight', 'decoder_blocks.2.attn.qkv.bias', 'decoder_blocks.2.attn.proj.weight', 'decoder_blocks.2.attn.proj.bias', 'decoder_blocks.2.norm2.weight', 'decoder_blocks.2.norm2.bias', 'decoder_blocks.2.mlp.fc1.weight', 'decoder_blocks.2.mlp.fc1.bias', 'decoder_blocks.2.mlp.fc2.weight', 'decoder_blocks.2.mlp.fc2.bias', 'decoder_blocks.3.norm1.weight', 'decoder_blocks.3.norm1.bias', 'decoder_blocks.3.attn.qkv.weight', 'decoder_blocks.3.attn.qkv.bias', 'decoder_blocks.3.attn.proj.weight', 'decoder_blocks.3.attn.proj.bias', 'decoder_blocks.3.norm2.weight', 'decoder_blocks.3.norm2.bias', 'decoder_blocks.3.mlp.fc1.weight', 'decoder_blocks.3.mlp.fc1.bias', 'decoder_blocks.3.mlp.fc2.weight', 'decoder_blocks.3.mlp.fc2.bias', 'decoder_blocks.4.norm1.weight', 'decoder_blocks.4.norm1.bias', 'decoder_blocks.4.attn.qkv.weight', 'decoder_blocks.4.attn.qkv.bias', 'decoder_blocks.4.attn.proj.weight', 'decoder_blocks.4.attn.proj.bias', 'decoder_blocks.4.norm2.weight', 'decoder_blocks.4.norm2.bias', 'decoder_blocks.4.mlp.fc1.weight', 'decoder_blocks.4.mlp.fc1.bias', 'decoder_blocks.4.mlp.fc2.weight', 'decoder_blocks.4.mlp.fc2.bias', 'decoder_blocks.5.norm1.weight', 'decoder_blocks.5.norm1.bias', 'decoder_blocks.5.attn.qkv.weight', 'decoder_blocks.5.attn.qkv.bias', 'decoder_blocks.5.attn.proj.weight', 'decoder_blocks.5.attn.proj.bias', 'decoder_blocks.5.norm2.weight', 'decoder_blocks.5.norm2.bias', 'decoder_blocks.5.mlp.fc1.weight', 'decoder_blocks.5.mlp.fc1.bias', 'decoder_blocks.5.mlp.fc2.weight', 'decoder_blocks.5.mlp.fc2.bias', 'decoder_blocks.6.norm1.weight', 'decoder_blocks.6.norm1.bias', 'decoder_blocks.6.attn.qkv.weight', 'decoder_blocks.6.attn.qkv.bias', 'decoder_blocks.6.attn.proj.weight', 'decoder_blocks.6.attn.proj.bias', 'decoder_blocks.6.norm2.weight', 'decoder_blocks.6.norm2.bias', 'decoder_blocks.6.mlp.fc1.weight', 'decoder_blocks.6.mlp.fc1.bias', 'decoder_blocks.6.mlp.fc2.weight', 'decoder_blocks.6.mlp.fc2.bias', 'decoder_blocks.7.norm1.weight', 'decoder_blocks.7.norm1.bias', 'decoder_blocks.7.attn.qkv.weight', 'decoder_blocks.7.attn.qkv.bias', 'decoder_blocks.7.attn.proj.weight', 'decoder_blocks.7.attn.proj.bias', 'decoder_blocks.7.norm2.weight', 'decoder_blocks.7.norm2.bias', 'decoder_blocks.7.mlp.fc1.weight', 'decoder_blocks.7.mlp.fc1.bias', 'decoder_blocks.7.mlp.fc2.weight', 'decoder_blocks.7.mlp.fc2.bias', 'decoder_norm.weight', 'decoder_norm.bias', 'decoder_pred.weight', 'decoder_pred.bias']
[06/13 15:36:35 detectron2]: Model:
GeneralizedRCNNImageListForward(
  (backbone): FPN(
    (fpn_lateral2): Conv2d(
      768, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
      (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (fpn_output2): Conv2d(
      256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
      (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (fpn_lateral3): Conv2d(
      768, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
      (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (fpn_output3): Conv2d(
      256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
      (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (fpn_lateral4): Conv2d(
      768, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
      (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (fpn_output4): Conv2d(
      256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
      (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (fpn_lateral5): Conv2d(
      768, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
      (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (fpn_output5): Conv2d(
      256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
      (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (top_block): LastLevelMaxPool()
    (bottom_up): BenchmarkingViTDet(
      (patch_embed): PatchEmbed(
        (proj): Conv2d(3, 768, kernel_size=(16, 16), stride=(16, 16))
        (norm): Identity()
      )
      (pos_drop): Dropout(p=0.0, inplace=False)
      (blocks): Sequential(
        (0): Block(
          (norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=768, out_features=2304, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=768, out_features=768, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): Identity()
          (norm2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=768, out_features=3072, bias=True)
            (act): GELU()
            (drop1): Dropout(p=0.0, inplace=False)
            (fc2): Linear(in_features=3072, out_features=768, bias=True)
            (drop2): Dropout(p=0.0, inplace=False)
          )
        )
        (1): Block(
          (norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=768, out_features=2304, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=768, out_features=768, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath(p=0.00909090880304575)
          (norm2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=768, out_features=3072, bias=True)
            (act): GELU()
            (drop1): Dropout(p=0.0, inplace=False)
            (fc2): Linear(in_features=3072, out_features=768, bias=True)
            (drop2): Dropout(p=0.0, inplace=False)
          )
        )
        (2): Block(
          (norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=768, out_features=2304, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=768, out_features=768, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath(p=0.0181818176060915)
          (norm2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=768, out_features=3072, bias=True)
            (act): GELU()
            (drop1): Dropout(p=0.0, inplace=False)
            (fc2): Linear(in_features=3072, out_features=768, bias=True)
            (drop2): Dropout(p=0.0, inplace=False)
          )
        )
        (3): Block(
          (norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=768, out_features=2304, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=768, out_features=768, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath(p=0.027272727340459824)
          (norm2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=768, out_features=3072, bias=True)
            (act): GELU()
            (drop1): Dropout(p=0.0, inplace=False)
            (fc2): Linear(in_features=3072, out_features=768, bias=True)
            (drop2): Dropout(p=0.0, inplace=False)
          )
        )
        (4): Block(
          (norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=768, out_features=2304, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=768, out_features=768, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath(p=0.036363635212183)
          (norm2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=768, out_features=3072, bias=True)
            (act): GELU()
            (drop1): Dropout(p=0.0, inplace=False)
            (fc2): Linear(in_features=3072, out_features=768, bias=True)
            (drop2): Dropout(p=0.0, inplace=False)
          )
        )
        (5): Block(
          (norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=768, out_features=2304, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=768, out_features=768, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath(p=0.045454543083906174)
          (norm2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=768, out_features=3072, bias=True)
            (act): GELU()
            (drop1): Dropout(p=0.0, inplace=False)
            (fc2): Linear(in_features=3072, out_features=768, bias=True)
            (drop2): Dropout(p=0.0, inplace=False)
          )
        )
        (6): Block(
          (norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=768, out_features=2304, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=768, out_features=768, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath(p=0.054545458406209946)
          (norm2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=768, out_features=3072, bias=True)
            (act): GELU()
            (drop1): Dropout(p=0.0, inplace=False)
            (fc2): Linear(in_features=3072, out_features=768, bias=True)
            (drop2): Dropout(p=0.0, inplace=False)
          )
        )
        (7): Block(
          (norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=768, out_features=2304, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=768, out_features=768, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath(p=0.06363636255264282)
          (norm2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=768, out_features=3072, bias=True)
            (act): GELU()
            (drop1): Dropout(p=0.0, inplace=False)
            (fc2): Linear(in_features=3072, out_features=768, bias=True)
            (drop2): Dropout(p=0.0, inplace=False)
          )
        )
        (8): Block(
          (norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=768, out_features=2304, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=768, out_features=768, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath(p=0.0727272778749466)
          (norm2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=768, out_features=3072, bias=True)
            (act): GELU()
            (drop1): Dropout(p=0.0, inplace=False)
            (fc2): Linear(in_features=3072, out_features=768, bias=True)
            (drop2): Dropout(p=0.0, inplace=False)
          )
        )
        (9): Block(
          (norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=768, out_features=2304, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=768, out_features=768, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath(p=0.08181818574666977)
          (norm2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=768, out_features=3072, bias=True)
            (act): GELU()
            (drop1): Dropout(p=0.0, inplace=False)
            (fc2): Linear(in_features=3072, out_features=768, bias=True)
            (drop2): Dropout(p=0.0, inplace=False)
          )
        )
        (10): Block(
          (norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=768, out_features=2304, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=768, out_features=768, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath(p=0.09090909361839294)
          (norm2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=768, out_features=3072, bias=True)
            (act): GELU()
            (drop1): Dropout(p=0.0, inplace=False)
            (fc2): Linear(in_features=3072, out_features=768, bias=True)
            (drop2): Dropout(p=0.0, inplace=False)
          )
        )
        (11): Block(
          (norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=768, out_features=2304, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=768, out_features=768, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath(p=0.10000000149011612)
          (norm2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=768, out_features=3072, bias=True)
            (act): GELU()
            (drop1): Dropout(p=0.0, inplace=False)
            (fc2): Linear(in_features=3072, out_features=768, bias=True)
            (drop2): Dropout(p=0.0, inplace=False)
          )
        )
      )
      (norm): None
      (pre_logits): Identity()
      (head): None
      (ms_adaptor): ModuleList(
        (0): Sequential(
          (0): ConvTranspose2d(768, 768, kernel_size=(2, 2), stride=(2, 2))
          (1): GroupNorm(32, 768, eps=1e-05, affine=True)
          (2): GELU()
          (3): ConvTranspose2d(768, 768, kernel_size=(2, 2), stride=(2, 2))
        )
        (1): ConvTranspose2d(768, 768, kernel_size=(2, 2), stride=(2, 2))
        (2): Identity()
        (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (windowed_rel_pos_bias): RelativePositionBias()
      (global_rel_pos_bias): RelativePositionBias()
    )
  )
  (proposal_generator): RPN(
    (rpn_head): StandardRPNHead(
      (conv): Sequential(
        (conv0): Conv2d(
          256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
          (activation): ReLU()
        )
        (conv1): Conv2d(
          256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
          (activation): ReLU()
        )
      )
      (objectness_logits): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1))
      (anchor_deltas): Conv2d(256, 12, kernel_size=(1, 1), stride=(1, 1))
    )
    (anchor_generator): DefaultAnchorGenerator(
      (cell_anchors): BufferList()
    )
  )
  (roi_heads): StandardROIHeads(
    (box_pooler): ROIPooler(
      (level_poolers): ModuleList(
        (0): ROIAlign(output_size=(7, 7), spatial_scale=0.25, sampling_ratio=0, aligned=True)
        (1): ROIAlign(output_size=(7, 7), spatial_scale=0.125, sampling_ratio=0, aligned=True)
        (2): ROIAlign(output_size=(7, 7), spatial_scale=0.0625, sampling_ratio=0, aligned=True)
        (3): ROIAlign(output_size=(7, 7), spatial_scale=0.03125, sampling_ratio=0, aligned=True)
      )
    )
    (box_head): FastRCNNConvFCHead(
      (conv1): Conv2d(
        256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
        (norm): NaiveSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activation): ReLU()
      )
      (conv2): Conv2d(
        256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
        (norm): NaiveSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activation): ReLU()
      )
      (conv3): Conv2d(
        256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
        (norm): NaiveSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activation): ReLU()
      )
      (conv4): Conv2d(
        256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
        (norm): NaiveSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activation): ReLU()
      )
      (flatten): Flatten(start_dim=1, end_dim=-1)
      (fc1): Linear(in_features=12544, out_features=1024, bias=True)
      (fc_relu1): ReLU()
    )
    (box_predictor): FastRCNNOutputLayers(
      (cls_score): Linear(in_features=1024, out_features=81, bias=True)
      (bbox_pred): Linear(in_features=1024, out_features=320, bias=True)
    )
    (mask_pooler): ROIPooler(
      (level_poolers): ModuleList(
        (0): ROIAlign(output_size=(14, 14), spatial_scale=0.25, sampling_ratio=0, aligned=True)
        (1): ROIAlign(output_size=(14, 14), spatial_scale=0.125, sampling_ratio=0, aligned=True)
        (2): ROIAlign(output_size=(14, 14), spatial_scale=0.0625, sampling_ratio=0, aligned=True)
        (3): ROIAlign(output_size=(14, 14), spatial_scale=0.03125, sampling_ratio=0, aligned=True)
      )
    )
    (mask_head): MaskRCNNConvUpsampleHead(
      (mask_fcn1): Conv2d(
        256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
        (norm): NaiveSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activation): ReLU()
      )
      (mask_fcn2): Conv2d(
        256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
        (norm): NaiveSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activation): ReLU()
      )
      (mask_fcn3): Conv2d(
        256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
        (norm): NaiveSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activation): ReLU()
      )
      (mask_fcn4): Conv2d(
        256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
        (norm): NaiveSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activation): ReLU()
      )
      (deconv): ConvTranspose2d(256, 256, kernel_size=(2, 2), stride=(2, 2))
      (deconv_relu): ReLU()
      (predictor): Conv2d(256, 80, kernel_size=(1, 1), stride=(1, 1))
    )
  )
)
[06/13 15:37:00 d2.data.datasets.coco]: Loading /root/dataset/k8s/dataset/coco/annotations/instances_train2017.json takes 20.75 seconds.
[06/13 15:37:00 d2.data.datasets.coco]: Loaded 118287 images in COCO format from /root/dataset/k8s/dataset/coco/annotations/instances_train2017.json
[06/13 15:37:07 d2.data.build]: Removed 1021 images with no usable annotations. 117266 images left.
[06/13 15:37:11 d2.data.build]: Distribution of instances among all 80 categories:
|   category    | #instances   |   category   | #instances   |   category    | #instances   |
|:-------------:|:-------------|:------------:|:-------------|:-------------:|:-------------|
|    person     | 257253       |   bicycle    | 7056         |      car      | 43533        |
|  motorcycle   | 8654         |   airplane   | 5129         |      bus      | 6061         |
|     train     | 4570         |    truck     | 9970         |     boat      | 10576        |
| traffic light | 12842        | fire hydrant | 1865         |   stop sign   | 1983         |
| parking meter | 1283         |    bench     | 9820         |     bird      | 10542        |
|      cat      | 4766         |     dog      | 5500         |     horse     | 6567         |
|     sheep     | 9223         |     cow      | 8014         |   elephant    | 5484         |
|     bear      | 1294         |    zebra     | 5269         |    giraffe    | 5128         |
|   backpack    | 8714         |   umbrella   | 11265        |    handbag    | 12342        |
|      tie      | 6448         |   suitcase   | 6112         |    frisbee    | 2681         |
|     skis      | 6623         |  snowboard   | 2681         |  sports ball  | 6299         |
|     kite      | 8802         | baseball bat | 3273         | baseball gl.. | 3747         |
|  skateboard   | 5536         |  surfboard   | 6095         | tennis racket | 4807         |
|    bottle     | 24070        |  wine glass  | 7839         |      cup      | 20574        |
|     fork      | 5474         |    knife     | 7760         |     spoon     | 6159         |
|     bowl      | 14323        |    banana    | 9195         |     apple     | 5776         |
|   sandwich    | 4356         |    orange    | 6302         |   broccoli    | 7261         |
|    carrot     | 7758         |   hot dog    | 2884         |     pizza     | 5807         |
|     donut     | 7005         |     cake     | 6296         |     chair     | 38073        |
|     couch     | 5779         | potted plant | 8631         |      bed      | 4192         |
| dining table  | 15695        |    toilet    | 4149         |      tv       | 5803         |
|    laptop     | 4960         |    mouse     | 2261         |    remote     | 5700         |
|   keyboard    | 2854         |  cell phone  | 6422         |   microwave   | 1672         |
|     oven      | 3334         |   toaster    | 225          |     sink      | 5609         |
| refrigerator  | 2634         |     book     | 24077        |     clock     | 6320         |
|     vase      | 6577         |   scissors   | 1464         |  teddy bear   | 4729         |
|  hair drier   | 198          |  toothbrush  | 1945         |               |              |
|     total     | 849949       |              |              |               |              |
[06/13 15:37:11 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in training: [ResizeScale(min_scale=0.1, max_scale=2.0, target_height=1024, target_width=1024), FixedSizeCrop(crop_size=[1024, 1024]), RandomFlip()]
[06/13 15:37:11 d2.data.common]: Serializing 117266 elements to byte tensors and concatenating them all ...
[06/13 15:37:14 d2.data.common]: Serialized dataset takes 453.12 MiB
[06/13 15:37:18 fvcore.common.checkpoint]: No checkpoint found. Initializing model from scratch
[06/13 15:37:18 d2.engine.train_loop]: Starting training from iteration 0
ERROR [06/13 15:37:25 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/train_loop.py", line 395, in run_step
    loss_dict = self.model(data)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "./configs/common.py", line 30, in forward
    features = self.backbone(images)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/fpn.py", line 126, in forward
    bottom_up_features = self.bottom_up(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/dataset/k8s/saurabh/code/workdir/fresh-mae/MIMDet/models/benchmarking.py", line 687, in forward
    x, self.global_rel_pos_bias()
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/dataset/k8s/saurabh/code/workdir/fresh-mae/MIMDet/models/benchmarking.py", line 265, in forward
    x = x + self.drop_path(self.attn(self.norm1(x), rel_pos_bias=rel_pos_bias))
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/dataset/k8s/saurabh/code/workdir/fresh-mae/MIMDet/models/benchmarking.py", line 175, in forward
    attn = (q @ k.transpose(-2, -1)) * self.scale
RuntimeError: CUDA out of memory. Tried to allocate 24.00 GiB (GPU 0; 47.54 GiB total capacity; 37.74 GiB already allocated; 7.35 GiB free; 38.55 GiB reserved in total by PyTorch)
[06/13 15:37:25 d2.engine.hooks]: Total training time: 0:00:06 (0:00:00 on hooks)
[06/13 15:37:25 d2.utils.events]:  iter: 0    lr: N/A  max_mem: 38644M

I did not see any problems in the forward pass of this network during inference. Is there something I am missing?

Any help is really appreciated.
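For anyone hitting the same error: the allocation fails inside the global self-attention (`q @ k.transpose(-2, -1)`), whose score matrix grows quadratically with the number of tokens. Below is a rough, illustrative estimate; the patch size, head count, and precision are assumptions based on ViT-Base and the 1024x1024 LSJ crops shown in the log, not values read from the config.

```python
# Back-of-the-envelope memory for the attention scores that fail to allocate.
# Assumptions (illustrative, not read from the config): 1024x1024 LSJ crops,
# 16x16 patches, ViT-Base with 12 heads, global attention, fp32 activations.
tokens = (1024 // 16) ** 2                 # 4096 tokens per image
heads = 12                                 # ViT-Base
bytes_per_image = heads * tokens ** 2 * 4  # one layer's q @ k^T scores, fp32
print(f"{bytes_per_image / 2**30:.2f} GiB per image per attention layer")  # ~0.75 GiB
```

With `--num-gpus 1`, the entire default total batch size ends up on that single GPU, so this one tensor alone reaches tens of GiB (the traceback shows a 24.00 GiB attempt) on top of everything else already resident. Inference fits because it processes one image at a time and nothing is retained for a backward pass.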

Looks like it's working with batch size = 4 on multiple GPUs (a single GPU gives an init_process_group error).
So, is batch size = 4 reasonable, or is it too small?

Hi, @sfarkya! Thanks for your interest in our work.

We run the ViT-Base benchmarking with a total batch size of 64 on 32 V100 GPUs (i.e., 2 images per GPU).

If your resources are limited, I recommend using 1/4 of the default batch size and 1/2 of the default learning rate for training.
Please also refer to #3 & the 8-GPU config.
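For reference, here is a minimal sketch of those scaled-down settings, assuming the benchmarking config exposes the usual detectron2 LazyConfig fields (`dataloader.train.total_batch_size`, `optimizer.lr`); verify the field names against the actual config, and note that the same values can also be passed as dot-list overrides on the `lazyconfig_train_net.py` command line, like the existing `mae_checkpoint.path=...` override.

```python
# Sketch only: scale the default recipe down to 1/4 batch size and 1/2 learning rate.
# Field names follow detectron2's LazyConfig baselines and are assumed, not verified.
from detectron2.config import LazyConfig

cfg = LazyConfig.load(
    "configs/benchmarking/benchmarking_mask_rcnn_base_FPN_100ep_LSJ_mae.py"
)
cfg.dataloader.train.total_batch_size = 16  # 1/4 of the default total batch size of 64
cfg.optimizer.lr *= 0.5                     # 1/2 of whatever lr the base config sets
```

Keep in mind that with 1/4 of the batch size, an iteration-based schedule covers only 1/4 as many epochs unless `train.max_iter` and the LR decay milestones are scaled as well; how closely to match the original recipe is a judgment call.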

Hi @vealocia, thank you for your reply.
I see, then the training behavior on my side makes sense.
Sure, thank you for the reference, I will check them out.
Btw, the 8-GPU config is not available.

https://github.com/hustvl/MIMDet/blob/v1.0.0/configs/mimdet/mimdet_vit_base_mask_rcnn_fpn_sr_0p25_800_1333_4xdec_coco_3x_bs16.py

check this one

Thank you so much!
I don't have access to larger GPUs, but I do have access to 24 GB GPUs.
I tried a batch size of 2 per GPU, but CUDA still runs out of memory.
Do you think I can train the model with batch size = 1 on 8 A5000 (24 GB) GPUs? That does not give a memory error, but I am concerned whether the distributed updates with batch size = 1 per GPU will still be good.
Also, do you suggest any changes to the 8-GPU config in case I try training with it?

An optional strategy is to use gradient checkpointing, which reduces GPU memory demands at the cost of longer training time. You can refer to this for details.
Training with 1 image per GPU sounds feasible; maybe you can give it a try and leave a comment if you find anything weird.
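For anyone else reading this: below is a minimal, generic sketch of gradient checkpointing applied to a stack of transformer blocks with `torch.utils.checkpoint`. It is not the repository's implementation; the block interface (`forward(x, rel_pos_bias=None)`) mirrors the traceback above but is otherwise assumed.

```python
# Generic gradient-checkpointing sketch (not MIMDet's actual code): re-run each
# block's forward during backward instead of storing its activations, trading
# extra compute for a large reduction in training-time memory.
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CheckpointedBlocks(nn.Module):
    def __init__(self, blocks: nn.ModuleList, use_checkpoint: bool = True):
        super().__init__()
        self.blocks = blocks
        self.use_checkpoint = use_checkpoint

    def forward(self, x, rel_pos_bias=None):
        for blk in self.blocks:
            if self.use_checkpoint and self.training:
                # Activations inside blk are recomputed during the backward pass.
                x = checkpoint(blk, x, rel_pos_bias)
            else:
                x = blk(x, rel_pos_bias)
        return x
```

The main saving is that the large per-block attention tensors are no longer kept alive for every block at once; they are materialized one block at a time during recomputation instead.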

I could train it successfully on a smaller subset of data with batch size 1 on multiple GPUs.