hujiecpp/ISTR

RuntimeError: CUDA out of memory. Tried to allocate 672.00 MiB (GPU 0; 15.78 GiB total capacity; 13.42 GiB already allocated; 50.75 MiB free; 14.41 GiB reserved in total by PyTorch)

aymennturki opened this issue · 3 comments

dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=[], resume=False)
[08/25 11:15:41 detectron2]: Contents of args.config_file=projects/ISTR/configs/ISTR-AE-R50-3x.yaml:
BASE: "Base-ISTR.yaml"
MODEL:
WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
RESNETS:
DEPTH: 50
STRIDE_IN_1X1: False
ISTR:
NUM_PROPOSALS: 300
NUM_CLASSES: 5
MASK_ENCODING_METHOD: "AE"
PATH_COMPONENTS: "/content/drive/MyDrive/imenselmi/ISTR_TRAIN/ISTR/projects/AE/checkpoints/AE_112_256.t7"
DATASETS:
TRAIN: ("train",)
TEST: ("val",)
SOLVER:
STEPS: (210000, 250000)
MAX_ITER: 270000
INPUT:
FORMAT: "RGB"

[08/25 11:15:41 detectron2]: Running with full config:
CUDNN_BENCHMARK: true
DATALOADER:
ASPECT_RATIO_GROUPING: true
FILTER_EMPTY_ANNOTATIONS: true
NUM_WORKERS: 4
REPEAT_THRESHOLD: 0.0
SAMPLER_TRAIN: TrainingSampler
DATASETS:
PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
PROPOSAL_FILES_TEST: []
PROPOSAL_FILES_TRAIN: []
TEST:

  • val
    TRAIN:
  • train
    GLOBAL:
    HACK: 1.0
    INPUT:
    CROP:
    ENABLED: true
    SIZE:
    • 384
    • 600
      TYPE: absolute_range
      FORMAT: RGB
      MASK_FORMAT: polygon
      MAX_SIZE_TEST: 1333
      MAX_SIZE_TRAIN: 1333
      MIN_SIZE_TEST: 800
      MIN_SIZE_TRAIN:
  • 416
  • 448
  • 480
  • 512
  • 544
  • 576
  • 608
  • 640
  • 672
  • 704
  • 736
  • 768
  • 800
  • 832
  • 864
  • 896
  • 928
  • 960
  • 992
  • 1024
  • 1056
  • 1088
    MIN_SIZE_TRAIN_SAMPLING: choice
    RANDOM_FLIP: horizontal
    LSJ_AUG: false
    MODEL:
    ANCHOR_GENERATOR:
    ANGLES:
      • -90
      • 0
      • 90
        ASPECT_RATIOS:
      • 0.5
      • 1.0
      • 2.0
        NAME: DefaultAnchorGenerator
        OFFSET: 0.0
        SIZES:
      • 32
      • 64
      • 128
      • 256
      • 512
        BACKBONE:
        FREEZE_AT: -1
        NAME: build_resnet_fpn_backbone
        DEVICE: cuda
        FPN:
        FUSE_TYPE: sum
        IN_FEATURES:
    • res2
    • res3
    • res4
    • res5
      NORM: ''
      OUT_CHANNELS: 256
      ISTR:
      ALPHA: 0.25
      CLASS_WEIGHT: 2.0
      DEEP_SUPERVISION: true
      DIM_DYNAMIC: 64
      DIM_FEEDFORWARD: 2048
      DROPOUT: 0.0
      FEAT_WEIGHT: 1.0
      GAMMA: 2.0
      GIOU_WEIGHT: 2.0
      HIDDEN_DIM: 256
      IOU_LABELS:
    • 0
    • 1
      IOU_THRESHOLDS:
    • 0.5
      L1_WEIGHT: 5.0
      MASK_ENCODING_METHOD: AE
      MASK_FEAT_DIM: 256
      MASK_SIZE: 112
      MASK_WEIGHT: 5.0
      NHEADS: 8
      NO_OBJECT_WEIGHT: 0.1
      NUM_CLASSES: 5
      NUM_CLS: 3
      NUM_DYNAMIC: 2
      NUM_HEADS: 6
      NUM_MASK: 3
      NUM_PROPOSALS: 300
      NUM_REG: 3
      PATH_COMPONENTS: /content/drive/MyDrive/imenselmi/ISTR_TRAIN/ISTR/projects/AE/checkpoints/AE_112_256.t7
      PRIOR_PROB: 0.01
      KEYPOINT_ON: false
      LOAD_PROPOSALS: false
      MASK_ON: true
      META_ARCHITECTURE: ISTR
      PANOPTIC_FPN:
      COMBINE:
      ENABLED: true
      INSTANCES_CONFIDENCE_THRESH: 0.5
      OVERLAP_THRESH: 0.5
      STUFF_AREA_LIMIT: 4096
      INSTANCE_LOSS_WEIGHT: 1.0
      PIXEL_MEAN:
  • 123.675
  • 116.28
  • 103.53
    PIXEL_STD:
  • 58.395
  • 57.12
  • 57.375
    PROPOSAL_GENERATOR:
    MIN_SIZE: 0
    NAME: RPN
    RESNETS:
    DEFORM_MODULATED: false
    DEFORM_NUM_GROUPS: 1
    DEFORM_ON_PER_STAGE:
    • false
    • false
    • false
    • false
      DEPTH: 50
      NORM: FrozenBN
      NUM_GROUPS: 1
      OUT_FEATURES:
    • res2
    • res3
    • res4
    • res5
      RES2_OUT_CHANNELS: 256
      RES5_DILATION: 1
      STEM_OUT_CHANNELS: 64
      STRIDE_IN_1X1: false
      WIDTH_PER_GROUP: 64
      RETINANET:
      BBOX_REG_LOSS_TYPE: smooth_l1
      BBOX_REG_WEIGHTS: &id001
    • 1.0
    • 1.0
    • 1.0
    • 1.0
      FOCAL_LOSS_ALPHA: 0.25
      FOCAL_LOSS_GAMMA: 2.0
      IN_FEATURES:
    • p3
    • p4
    • p5
    • p6
    • p7
      IOU_LABELS:
    • 0
    • -1
    • 1
      IOU_THRESHOLDS:
    • 0.4
    • 0.5
      NMS_THRESH_TEST: 0.5
      NORM: ''
      NUM_CLASSES: 80
      NUM_CONVS: 4
      PRIOR_PROB: 0.01
      SCORE_THRESH_TEST: 0.05
      SMOOTH_L1_LOSS_BETA: 0.1
      TOPK_CANDIDATES_TEST: 1000
      ROI_BOX_CASCADE_HEAD:
      BBOX_REG_WEIGHTS:
      • 10.0
      • 10.0
      • 5.0
      • 5.0
      • 20.0
      • 20.0
      • 10.0
      • 10.0
      • 30.0
      • 30.0
      • 15.0
      • 15.0
        IOUS:
    • 0.5
    • 0.6
    • 0.7
      ROI_BOX_HEAD:
      BBOX_REG_LOSS_TYPE: smooth_l1
      BBOX_REG_LOSS_WEIGHT: 1.0
      BBOX_REG_WEIGHTS:
    • 10.0
    • 10.0
    • 5.0
    • 5.0
      CLS_AGNOSTIC_BBOX_REG: false
      CONV_DIM: 256
      FC_DIM: 1024
      NAME: ''
      NORM: ''
      NUM_CONV: 0
      NUM_FC: 0
      POOLER_RESOLUTION: 7
      POOLER_SAMPLING_RATIO: 2
      POOLER_TYPE: ROIAlignV2
      SMOOTH_L1_BETA: 0.0
      TRAIN_ON_PRED_BOXES: false
      ROI_HEADS:
      BATCH_SIZE_PER_IMAGE: 512
      IN_FEATURES:
    • p2
    • p3
    • p4
    • p5
      IOU_LABELS:
    • 0
    • 1
      IOU_THRESHOLDS:
    • 0.5
      NAME: Res5ROIHeads
      NMS_THRESH_TEST: 0.5
      NUM_CLASSES: 80
      POSITIVE_FRACTION: 0.25
      PROPOSAL_APPEND_GT: true
      SCORE_THRESH_TEST: 0.05
      ROI_KEYPOINT_HEAD:
      CONV_DIMS:
    • 512
    • 512
    • 512
    • 512
    • 512
    • 512
    • 512
    • 512
      LOSS_WEIGHT: 1.0
      MIN_KEYPOINTS_PER_IMAGE: 1
      NAME: KRCNNConvDeconvUpsampleHead
      NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: true
      NUM_KEYPOINTS: 17
      POOLER_RESOLUTION: 14
      POOLER_SAMPLING_RATIO: 0
      POOLER_TYPE: ROIAlignV2
      ROI_MASK_HEAD:
      CLS_AGNOSTIC_MASK: false
      CONV_DIM: 256
      NAME: MaskRCNNConvUpsampleHead
      NORM: ''
      NUM_CONV: 0
      POOLER_RESOLUTION: 14
      POOLER_SAMPLING_RATIO: 0
      POOLER_TYPE: ROIAlignV2
      RPN:
      BATCH_SIZE_PER_IMAGE: 256
      BBOX_REG_LOSS_TYPE: smooth_l1
      BBOX_REG_LOSS_WEIGHT: 1.0
      BBOX_REG_WEIGHTS: *id001
      BOUNDARY_THRESH: -1
      CONV_DIMS:
    • -1
      HEAD_NAME: StandardRPNHead
      IN_FEATURES:
    • res4
      IOU_LABELS:
    • 0
    • -1
    • 1
      IOU_THRESHOLDS:
    • 0.3
    • 0.7
      LOSS_WEIGHT: 1.0
      NMS_THRESH: 0.7
      POSITIVE_FRACTION: 0.5
      POST_NMS_TOPK_TEST: 1000
      POST_NMS_TOPK_TRAIN: 2000
      PRE_NMS_TOPK_TEST: 6000
      PRE_NMS_TOPK_TRAIN: 12000
      SMOOTH_L1_BETA: 0.0
      SEM_SEG_HEAD:
      COMMON_STRIDE: 4
      CONVS_DIM: 128
      IGNORE_VALUE: 255
      IN_FEATURES:
    • p2
    • p3
    • p4
    • p5
      LOSS_WEIGHT: 1.0
      NAME: SemSegFPNHead
      NORM: GN
      NUM_CLASSES: 54
      SWINT:
      APE: false
      DEPTHS:
    • 2
    • 2
    • 6
    • 2
      DROP_PATH_RATE: 0.2
      EMBED_DIM: 96
      MLP_RATIO: 4
      NUM_HEADS:
    • 3
    • 6
    • 12
    • 24
      OUT_FEATURES:
    • stage2
    • stage3
    • stage4
    • stage5
      WINDOW_SIZE: 7
      WEIGHTS: detectron2://ImageNetPretrained/torchvision/R-50.pkl
      OUTPUT_DIR: ./output
      SEED: 2333333
      SOLVER:
      AMP:
      ENABLED: false
      BACKBONE_MULTIPLIER: 1.0
      BASE_LR: 2.5e-05
      BIAS_LR_FACTOR: 1.0
      CHECKPOINT_PERIOD: 5000
      CLIP_GRADIENTS:
      CLIP_TYPE: full_model
      CLIP_VALUE: 1.0
      ENABLED: true
      NORM_TYPE: 2.0
      GAMMA: 0.1
      IMS_PER_BATCH: 16
      LR_SCHEDULER_NAME: WarmupMultiStepLR
      MAX_ITER: 270000
      MOMENTUM: 0.9
      NESTEROV: false
      OPTIMIZER: ADAMW
      REFERENCE_WORLD_SIZE: 0
      STEPS:
  • 210000
  • 250000
    WARMUP_FACTOR: 0.01
    WARMUP_ITERS: 1000
    WARMUP_METHOD: linear
    WEIGHT_DECAY: 0.0001
    WEIGHT_DECAY_BIAS: null
    WEIGHT_DECAY_NORM: 0.0
    TEST:
    AUG:
    ENABLED: false
    FLIP: true
    MAX_SIZE: 4000
    MIN_SIZES:
    • 400
    • 500
    • 600
    • 700
    • 800
    • 900
    • 1000
    • 1100
    • 1200
      DETECTIONS_PER_IMAGE: 100
      EVAL_PERIOD: 7330
      EXPECTED_RESULTS: []
      KEYPOINT_OKS_SIGMAS: []
      PRECISE_BN:
      ENABLED: false
      NUM_ITER: 200
      VERSION: 2
      VIS_PERIOD: 0

[08/25 11:15:41 detectron2]: Full config saved to ./output/config.yaml
[08/25 11:15:47 d2.engine.defaults]: Model:
ISTR(
(backbone): FPN(
(fpn_lateral2): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
(fpn_output2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(fpn_lateral3): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
(fpn_output3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(fpn_lateral4): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
(fpn_output4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(fpn_lateral5): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
(fpn_output5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(top_block): LastLevelMaxPool()
(bottom_up): ResNet(
(stem): BasicStem(
(conv1): Conv2d(
3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
)
(res2): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv1): Conv2d(
64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv2): Conv2d(
64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv3): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv2): Conv2d(
64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv3): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv2): Conv2d(
64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv3): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
)
)
(res3): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv1): Conv2d(
256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
)
(3): BottleneckBlock(
(conv1): Conv2d(
512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
)
)
(res4): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
(conv1): Conv2d(
512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(3): BottleneckBlock(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(4): BottleneckBlock(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(5): BottleneckBlock(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
)
(res5): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv1): Conv2d(
1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv2): Conv2d(
512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv3): Conv2d(
512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv2): Conv2d(
512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv3): Conv2d(
512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv2): Conv2d(
512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv3): Conv2d(
512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
)
)
)
(pos_embeddings): Embedding(300, 256)
(init_proposal_boxes): Embedding(300, 4)
(IFE): ImgFeatExtractor()
(mask_E): Encoder(
(encoder): Sequential(
(0): Conv2d(1, 16, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ELU(alpha=True)
(3): Conv2d(16, 32, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ELU(alpha=True)
(6): Conv2d(32, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(7): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(8): ELU(alpha=True)
(9): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(10): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(11): ELU(alpha=True)
(12): Conv2d(128, 256, kernel_size=(7, 7), stride=(1, 1))
(13): View()
)
)
(mask_D): Decoder(
(decoder): Sequential(
(0): View()
(1): ConvTranspose2d(256, 128, kernel_size=(7, 7), stride=(1, 1))
(2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ELU(alpha=1.0, inplace=True)
(4): up_conv(
(up): Sequential(
(0): Upsample(scale_factor=2.0, mode=bilinear)
(1): Conv2d(128, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ELU(alpha=1.0, inplace=True)
)
)
(5): up_conv(
(up): Sequential(
(0): Upsample(scale_factor=2.0, mode=bilinear)
(1): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ELU(alpha=1.0, inplace=True)
)
)
(6): up_conv(
(up): Sequential(
(0): Upsample(scale_factor=2.0, mode=bilinear)
(1): Conv2d(32, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ELU(alpha=1.0, inplace=True)
)
)
(7): up_conv(
(up): Sequential(
(0): Upsample(scale_factor=2.0, mode=bilinear)
(1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ELU(alpha=1.0, inplace=True)
)
)
(8): Conv2d(16, 1, kernel_size=(1, 1), stride=(1, 1))
(9): Sigmoid()
(10): View()
)
)
(head): DynamicHead(
(box_pooler): ROIPooler(
(level_poolers): ModuleList(
(0): ROIAlign(output_size=(7, 7), spatial_scale=0.25, sampling_ratio=2, aligned=True)
(1): ROIAlign(output_size=(7, 7), spatial_scale=0.125, sampling_ratio=2, aligned=True)
(2): ROIAlign(output_size=(7, 7), spatial_scale=0.0625, sampling_ratio=2, aligned=True)
(3): ROIAlign(output_size=(7, 7), spatial_scale=0.03125, sampling_ratio=2, aligned=True)
)
)
(mask_pooler): ROIPooler(
(level_poolers): ModuleList(
(0): ROIAlign(output_size=(28, 28), spatial_scale=0.25, sampling_ratio=2, aligned=True)
(1): ROIAlign(output_size=(28, 28), spatial_scale=0.125, sampling_ratio=2, aligned=True)
(2): ROIAlign(output_size=(28, 28), spatial_scale=0.0625, sampling_ratio=2, aligned=True)
(3): ROIAlign(output_size=(28, 28), spatial_scale=0.03125, sampling_ratio=2, aligned=True)
)
)
(head_series): ModuleList(
(0): RCNNHead(
(self_attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(inst_interact): DynamicConv(
(dynamic_layer): Linear(in_features=256, out_features=32768, bias=True)
(norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(activation): ELU(alpha=1.0, inplace=True)
(out_layer): Linear(in_features=12544, out_features=256, bias=True)
(norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(linear1): Linear(in_features=256, out_features=2048, bias=True)
(dropout): Dropout(p=0.0, inplace=False)
(linear2): Linear(in_features=2048, out_features=256, bias=True)
(norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.0, inplace=False)
(dropout2): Dropout(p=0.0, inplace=False)
(dropout3): Dropout(p=0.0, inplace=False)
(activation): ELU(alpha=1.0, inplace=True)
(cls_module): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=False)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(2): ELU(alpha=1.0, inplace=True)
(3): Linear(in_features=256, out_features=256, bias=False)
(4): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(5): ELU(alpha=1.0, inplace=True)
(6): Linear(in_features=256, out_features=256, bias=False)
(7): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(8): ELU(alpha=1.0, inplace=True)
)
(reg_module): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=False)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(2): ELU(alpha=1.0, inplace=True)
(3): Linear(in_features=256, out_features=256, bias=False)
(4): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(5): ELU(alpha=1.0, inplace=True)
(6): Linear(in_features=256, out_features=256, bias=False)
(7): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(8): ELU(alpha=1.0, inplace=True)
)
(mask_module): Sequential(
(0): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ELU(alpha=True)
(3): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ELU(alpha=True)
(6): Conv2d(256, 256, kernel_size=(7, 7), stride=(1, 1))
)
(ret_roi_layer_1): conv_block(
(conv): Sequential(
(0): Conv2d(256, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ELU(alpha=1.0, inplace=True)
(3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ELU(alpha=1.0, inplace=True)
)
)
(ret_roi_layer_2): conv_block(
(conv): Sequential(
(0): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ELU(alpha=1.0, inplace=True)
(3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ELU(alpha=1.0, inplace=True)
)
)
(class_logits): Linear(in_features=256, out_features=5, bias=True)
(bboxes_delta): Linear(in_features=256, out_features=4, bias=True)
)
(1): RCNNHead(
(self_attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(inst_interact): DynamicConv(
(dynamic_layer): Linear(in_features=256, out_features=32768, bias=True)
(norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(activation): ELU(alpha=1.0, inplace=True)
(out_layer): Linear(in_features=12544, out_features=256, bias=True)
(norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(linear1): Linear(in_features=256, out_features=2048, bias=True)
(dropout): Dropout(p=0.0, inplace=False)
(linear2): Linear(in_features=2048, out_features=256, bias=True)
(norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.0, inplace=False)
(dropout2): Dropout(p=0.0, inplace=False)
(dropout3): Dropout(p=0.0, inplace=False)
(activation): ELU(alpha=1.0, inplace=True)
(cls_module): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=False)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(2): ELU(alpha=1.0, inplace=True)
(3): Linear(in_features=256, out_features=256, bias=False)
(4): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(5): ELU(alpha=1.0, inplace=True)
(6): Linear(in_features=256, out_features=256, bias=False)
(7): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(8): ELU(alpha=1.0, inplace=True)
)
(reg_module): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=False)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(2): ELU(alpha=1.0, inplace=True)
(3): Linear(in_features=256, out_features=256, bias=False)
(4): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(5): ELU(alpha=1.0, inplace=True)
(6): Linear(in_features=256, out_features=256, bias=False)
(7): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(8): ELU(alpha=1.0, inplace=True)
)
(mask_module): Sequential(
(0): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ELU(alpha=True)
(3): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ELU(alpha=True)
(6): Conv2d(256, 256, kernel_size=(7, 7), stride=(1, 1))
)
(ret_roi_layer_1): conv_block(
(conv): Sequential(
(0): Conv2d(256, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ELU(alpha=1.0, inplace=True)
(3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ELU(alpha=1.0, inplace=True)
)
)
(ret_roi_layer_2): conv_block(
(conv): Sequential(
(0): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ELU(alpha=1.0, inplace=True)
(3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ELU(alpha=1.0, inplace=True)
)
)
(class_logits): Linear(in_features=256, out_features=5, bias=True)
(bboxes_delta): Linear(in_features=256, out_features=4, bias=True)
)
(2): RCNNHead(
(self_attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(inst_interact): DynamicConv(
(dynamic_layer): Linear(in_features=256, out_features=32768, bias=True)
(norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(activation): ELU(alpha=1.0, inplace=True)
(out_layer): Linear(in_features=12544, out_features=256, bias=True)
(norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(linear1): Linear(in_features=256, out_features=2048, bias=True)
(dropout): Dropout(p=0.0, inplace=False)
(linear2): Linear(in_features=2048, out_features=256, bias=True)
(norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.0, inplace=False)
(dropout2): Dropout(p=0.0, inplace=False)
(dropout3): Dropout(p=0.0, inplace=False)
(activation): ELU(alpha=1.0, inplace=True)
(cls_module): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=False)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(2): ELU(alpha=1.0, inplace=True)
(3): Linear(in_features=256, out_features=256, bias=False)
(4): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(5): ELU(alpha=1.0, inplace=True)
(6): Linear(in_features=256, out_features=256, bias=False)
(7): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(8): ELU(alpha=1.0, inplace=True)
)
(reg_module): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=False)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(2): ELU(alpha=1.0, inplace=True)
(3): Linear(in_features=256, out_features=256, bias=False)
(4): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(5): ELU(alpha=1.0, inplace=True)
(6): Linear(in_features=256, out_features=256, bias=False)
(7): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(8): ELU(alpha=1.0, inplace=True)
)
(mask_module): Sequential(
(0): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ELU(alpha=True)
(3): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ELU(alpha=True)
(6): Conv2d(256, 256, kernel_size=(7, 7), stride=(1, 1))
)
(ret_roi_layer_1): conv_block(
(conv): Sequential(
(0): Conv2d(256, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ELU(alpha=1.0, inplace=True)
(3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ELU(alpha=1.0, inplace=True)
)
)
(ret_roi_layer_2): conv_block(
(conv): Sequential(
(0): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ELU(alpha=1.0, inplace=True)
(3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ELU(alpha=1.0, inplace=True)
)
)
(class_logits): Linear(in_features=256, out_features=5, bias=True)
(bboxes_delta): Linear(in_features=256, out_features=4, bias=True)
)
(3): RCNNHead(
(self_attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(inst_interact): DynamicConv(
(dynamic_layer): Linear(in_features=256, out_features=32768, bias=True)
(norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(activation): ELU(alpha=1.0, inplace=True)
(out_layer): Linear(in_features=12544, out_features=256, bias=True)
(norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(linear1): Linear(in_features=256, out_features=2048, bias=True)
(dropout): Dropout(p=0.0, inplace=False)
(linear2): Linear(in_features=2048, out_features=256, bias=True)
(norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.0, inplace=False)
(dropout2): Dropout(p=0.0, inplace=False)
(dropout3): Dropout(p=0.0, inplace=False)
(activation): ELU(alpha=1.0, inplace=True)
(cls_module): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=False)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(2): ELU(alpha=1.0, inplace=True)
(3): Linear(in_features=256, out_features=256, bias=False)
(4): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(5): ELU(alpha=1.0, inplace=True)
(6): Linear(in_features=256, out_features=256, bias=False)
(7): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(8): ELU(alpha=1.0, inplace=True)
)
(reg_module): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=False)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(2): ELU(alpha=1.0, inplace=True)
(3): Linear(in_features=256, out_features=256, bias=False)
(4): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(5): ELU(alpha=1.0, inplace=True)
(6): Linear(in_features=256, out_features=256, bias=False)
(7): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(8): ELU(alpha=1.0, inplace=True)
)
(mask_module): Sequential(
(0): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ELU(alpha=True)
(3): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ELU(alpha=True)
(6): Conv2d(256, 256, kernel_size=(7, 7), stride=(1, 1))
)
(ret_roi_layer_1): conv_block(
(conv): Sequential(
(0): Conv2d(256, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ELU(alpha=1.0, inplace=True)
(3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ELU(alpha=1.0, inplace=True)
)
)
(ret_roi_layer_2): conv_block(
(conv): Sequential(
(0): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ELU(alpha=1.0, inplace=True)
(3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ELU(alpha=1.0, inplace=True)
)
)
(class_logits): Linear(in_features=256, out_features=5, bias=True)
(bboxes_delta): Linear(in_features=256, out_features=4, bias=True)
)
(4): RCNNHead(
(self_attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(inst_interact): DynamicConv(
(dynamic_layer): Linear(in_features=256, out_features=32768, bias=True)
(norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(activation): ELU(alpha=1.0, inplace=True)
(out_layer): Linear(in_features=12544, out_features=256, bias=True)
(norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(linear1): Linear(in_features=256, out_features=2048, bias=True)
(dropout): Dropout(p=0.0, inplace=False)
(linear2): Linear(in_features=2048, out_features=256, bias=True)
(norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.0, inplace=False)
(dropout2): Dropout(p=0.0, inplace=False)
(dropout3): Dropout(p=0.0, inplace=False)
(activation): ELU(alpha=1.0, inplace=True)
(cls_module): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=False)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(2): ELU(alpha=1.0, inplace=True)
(3): Linear(in_features=256, out_features=256, bias=False)
(4): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(5): ELU(alpha=1.0, inplace=True)
(6): Linear(in_features=256, out_features=256, bias=False)
(7): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(8): ELU(alpha=1.0, inplace=True)
)
(reg_module): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=False)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(2): ELU(alpha=1.0, inplace=True)
(3): Linear(in_features=256, out_features=256, bias=False)
(4): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(5): ELU(alpha=1.0, inplace=True)
(6): Linear(in_features=256, out_features=256, bias=False)
(7): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(8): ELU(alpha=1.0, inplace=True)
)
(mask_module): Sequential(
(0): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ELU(alpha=True)
(3): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ELU(alpha=True)
(6): Conv2d(256, 256, kernel_size=(7, 7), stride=(1, 1))
)
(ret_roi_layer_1): conv_block(
(conv): Sequential(
(0): Conv2d(256, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ELU(alpha=1.0, inplace=True)
(3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ELU(alpha=1.0, inplace=True)
)
)
(ret_roi_layer_2): conv_block(
(conv): Sequential(
(0): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ELU(alpha=1.0, inplace=True)
(3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ELU(alpha=1.0, inplace=True)
)
)
(class_logits): Linear(in_features=256, out_features=5, bias=True)
(bboxes_delta): Linear(in_features=256, out_features=4, bias=True)
)
(5): RCNNHead(
(self_attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(inst_interact): DynamicConv(
(dynamic_layer): Linear(in_features=256, out_features=32768, bias=True)
(norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(activation): ELU(alpha=1.0, inplace=True)
(out_layer): Linear(in_features=12544, out_features=256, bias=True)
(norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(linear1): Linear(in_features=256, out_features=2048, bias=True)
(dropout): Dropout(p=0.0, inplace=False)
(linear2): Linear(in_features=2048, out_features=256, bias=True)
(norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.0, inplace=False)
(dropout2): Dropout(p=0.0, inplace=False)
(dropout3): Dropout(p=0.0, inplace=False)
(activation): ELU(alpha=1.0, inplace=True)
(cls_module): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=False)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(2): ELU(alpha=1.0, inplace=True)
(3): Linear(in_features=256, out_features=256, bias=False)
(4): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(5): ELU(alpha=1.0, inplace=True)
(6): Linear(in_features=256, out_features=256, bias=False)
(7): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(8): ELU(alpha=1.0, inplace=True)
)
(reg_module): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=False)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(2): ELU(alpha=1.0, inplace=True)
(3): Linear(in_features=256, out_features=256, bias=False)
(4): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(5): ELU(alpha=1.0, inplace=True)
(6): Linear(in_features=256, out_features=256, bias=False)
(7): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(8): ELU(alpha=1.0, inplace=True)
)
(mask_module): Sequential(
(0): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ELU(alpha=True)
(3): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ELU(alpha=True)
(6): Conv2d(256, 256, kernel_size=(7, 7), stride=(1, 1))
)
(ret_roi_layer_1): conv_block(
(conv): Sequential(
(0): Conv2d(256, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ELU(alpha=1.0, inplace=True)
(3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ELU(alpha=1.0, inplace=True)
)
)
(ret_roi_layer_2): conv_block(
(conv): Sequential(
(0): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ELU(alpha=1.0, inplace=True)
(3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ELU(alpha=1.0, inplace=True)
)
)
(class_logits): Linear(in_features=256, out_features=5, bias=True)
(bboxes_delta): Linear(in_features=256, out_features=4, bias=True)
)
)
)
(criterion): SetCriterion(
(matcher): HungarianMatcher()
)
)
[08/25 11:15:53 d2.data.datasets.coco]: Loading /content/drive/MyDrive/imenselmi/ISTR_TRAIN/data/result/train.json takes 6.61 seconds.
[08/25 11:15:54 d2.data.datasets.coco]: Loaded 43480 images in COCO format from /content/drive/MyDrive/imenselmi/ISTR_TRAIN/data/result/train.json
[08/25 11:15:57 d2.data.build]: Removed 0 images with no usable annotations. 43480 images left.
[08/25 11:15:59 d2.data.build]: Distribution of instances among all 5 categories:

category #instances category #instances category #instances
short_sleev.. 18359 long_sleeve.. 14566 long_sleeve.. 10492
shorts 12123 trousers 18227
total 73767

pos_embeddings.weight
WARNING [08/25 11:16:04 fvcore.common.checkpoint]: The checkpoint state_dict contains keys that are not used by the model:
stem.fc.{bias, weight}
[08/25 11:16:04 d2.engine.train_loop]: Starting training from iteration 0
/usr/local/lib/python3.7/dist-packages/fvcore/transforms/transform.py:724: ShapelyDeprecationWarning: Iteration over multi-part geometries is deprecated and will be removed in Shapely 2.0. Use the geoms property to access the constituent parts of a multi-part geometry.
for poly in cropped:
/usr/local/lib/python3.7/dist-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at /pytorch/aten/src/ATen/native/BinaryOps.cpp:467.)
return torch.floor_divide(self, other)
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
ERROR [08/25 11:16:05 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/train_loop.py", line 149, in train
self.run_step()
File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/defaults.py", line 494, in run_step
self._trainer.run_step()
File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/train_loop.py", line 273, in run_step
loss_dict = self.model(data)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/content/drive/.shortcut-targets-by-id/190HFmYfsGdKfNWeUiqnpTgh7X3m3GFmF/ISTR_TRAIN/ISTR/projects/ISTR/istr/inseg.py", line 162, in forward
src = self.backbone(images.tensor)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/fpn.py", line 126, in forward
bottom_up_features = self.bottom_up(x)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/resnet.py", line 449, in forward
x = stage(x)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 139, in forward
input = module(input)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/resnet.py", line 201, in forward
out = self.conv3(out)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/detectron2/layers/wrappers.py", line 110, in forward
x = self.norm(x)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/detectron2/layers/batch_norm.py", line 53, in forward
return x * scale.to(out_dtype) + bias.to(out_dtype)
RuntimeError: CUDA out of memory. Tried to allocate 672.00 MiB (GPU 0; 15.78 GiB total capacity; 13.42 GiB already allocated; 50.75 MiB free; 14.41 GiB reserved in total by PyTorch)
[08/25 11:16:05 d2.engine.hooks]: Total training time: 0:00:01 (0:00:00 on hooks)
[08/25 11:16:05 d2.utils.events]: iter: 0 lr: N/A max_mem: 14075M
Traceback (most recent call last):
File "projects/ISTR/train_net.py", line 136, in
args=(args,),
File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/launch.py", line 82, in launch
main_func(*args)
File "projects/ISTR/train_net.py", line 124, in main
return trainer.train()
File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/defaults.py", line 484, in train
super().train(self.start_iter, self.max_iter)
File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/train_loop.py", line 149, in train
self.run_step()
File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/defaults.py", line 494, in run_step
self._trainer.run_step()
File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/train_loop.py", line 273, in run_step
loss_dict = self.model(data)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/content/drive/.shortcut-targets-by-id/190HFmYfsGdKfNWeUiqnpTgh7X3m3GFmF/ISTR_TRAIN/ISTR/projects/ISTR/istr/inseg.py", line 162, in forward
src = self.backbone(images.tensor)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/fpn.py", line 126, in forward
bottom_up_features = self.bottom_up(x)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/resnet.py", line 449, in forward
x = stage(x)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 139, in forward
input = module(input)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/resnet.py", line 201, in forward
out = self.conv3(out)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/detectron2/layers/wrappers.py", line 110, in forward
x = self.norm(x)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/detectron2/layers/batch_norm.py", line 53, in forward
return x * scale.to(out_dtype) + bias.to(out_dtype)
RuntimeError: CUDA out of memory. Tried to allocate 672.00 MiB (GPU 0; 15.78 GiB total capacity; 13.42 GiB already allocated; 50.75 MiB free; 14.41 GiB reserved in total by PyTorch)

Traceback (most recent call last):
File "projects/ISTR/train_net.py", line 136, in
args=(args,),
File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/launch.py", line 82, in launch
main_func(*args)
File "projects/ISTR/train_net.py", line 124, in main
return trainer.train()
File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/defaults.py", line 484, in train
super().train(self.start_iter, self.max_iter)
File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/train_loop.py", line 149, in train
self.run_step()
File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/defaults.py", line 494, in run_step
self._trainer.run_step()
File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/train_loop.py", line 273, in run_step
loss_dict = self.model(data)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/content/drive/.shortcut-targets-by-id/190HFmYfsGdKfNWeUiqnpTgh7X3m3GFmF/ISTR_TRAIN/ISTR/projects/ISTR/istr/inseg.py", line 162, in forward
src = self.backbone(images.tensor)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/fpn.py", line 126, in forward
bottom_up_features = self.bottom_up(x)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/resnet.py", line 449, in forward
x = stage(x)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 139, in forward
input = module(input)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/resnet.py", line 201, in forward
out = self.conv3(out)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/detectron2/layers/wrappers.py", line 110, in forward
x = self.norm(x)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/detectron2/layers/batch_norm.py", line 53, in forward
return x * scale.to(out_dtype) + bias.to(out_dtype)
RuntimeError: CUDA out of memory. Tried to allocate 672.00 MiB (GPU 0; 15.78 GiB total capacity; 13.42 GiB already allocated; 50.75 MiB free; 14.41 GiB reserved in total by PyTorch)

how can i fix this issue i trained the code in colab and locally and still the same problem always "CUDA out of memory."

how do you solve it?,my gpu is rtx 2080ti(memory 11G).