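model.yaml mainly contains the backbone and the head (fpn + detection head).
Each part is stored as a list whose elements are [from, number, module, args]:
- from: the index of the layer(s) feeding the current module; negative values are relative to the current layer (-1 is the previous layer), non-negative values are absolute layer indices (e.g. 8, 16)
- number: how many times the current module is stacked
- module and args: used to build the structure of the current module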
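The `from` convention can be sketched as a small helper. This is a minimal illustration and not the repository's parser: `resolve_from` is a hypothetical name, and the sketch assumes that negative values count back from the current layer index while non-negative values are absolute indices.

```python
def resolve_from(f, i):
    """Return the absolute layer indices that feed layer i.

    f is the `from` field: an int or a list of ints.
    Negative values count back from layer i; others are absolute.
    """
    sources = f if isinstance(f, list) else [f]
    return [i + s if s < 0 else s for s in sources]


# Layer 15 of yolov4-p5: [[-1, -2], 1, Concat, [1]]
print(resolve_from([-1, -2], 15))  # [14, 13]
# Layer 14 of yolov4-p5: [8, 1, Conv, [256, 1, 1]] routes backbone layer 8
print(resolve_from(8, 14))  # [8]
```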
These modules are defined in models/common.py:
--- backbone ----
Conv(c_out, kernel=1, strides=1, pad=None, groups=1, act=True): conv-bn-Mish
Bottleneck(c_out, shortcut=True, groups=1, expand_ratio=0.5): standard bottleneck, conv-conv with shortcut. Note that this is the standard bottleneck as defined in the CSP paper, i.e. only two conv layers, not the three-conv version from ResNet
BottleneckCSP(c_out, n_blocks=1, shortcut=True, groups=1, expand_ratio=0.5): CSP block
----- fpn -----
SPPCSP(c_out, n_blocks=1, shortcut=False, groups=1, expand_ratio=0.5, k=(5, 9, 13)): CSP block with SPP (multi-scale pooling) added inside the residual path
nn.Upsample(size=None, scale_factor=None, mode='nearest'): upsampling
Concat(dimension=1): concatenate the inputs along the given dimension
BottleneckCSP2(c_out, n_blocks=1, shortcut=False, groups=1, expand_ratio=0.5): CSP block
---- head -----
Detect(n_cls=80, anchors=(), ch=())
Taking yolov4-p5 as an example:
backbone:
# [from, number, module, args]
[[-1, 1, Conv, [32, 3, 1]], # 0
[-1, 1, Conv, [64, 3, 2]], # 1-P1/2
[-1, 1, BottleneckCSP, [64]],
[-1, 1, Conv, [128, 3, 2]], # 3-P2/4
[-1, 3, BottleneckCSP, [128]],
[-1, 1, Conv, [256, 3, 2]], # 5-P3/8
[-1, 15, BottleneckCSP, [256]],
[-1, 1, Conv, [512, 3, 2]], # 7-P4/16
[-1, 15, BottleneckCSP, [512]],
[-1, 1, Conv, [1024, 3, 2]], # 9-P5/32
[-1, 7, BottleneckCSP, [1024]], # 10
]
Input: (b,H,W,3)
# stem: module 0-1
Conv1: (b,H,W,32)
Conv2: (b,H/2,W/2,64)
# stage1: module 2-3
BottleneckCSP: (b,H/2,W/2,64) ----> stride2, P1
downSamp Conv: (b,H/4,W/4,128)
# stage2: module 4-5
3xBottleneckCSP: (b,H/4,W/4,128) ----> stride4, P2
downSamp Conv: (b,H/8,W/8,256)
# stage3: module 6-7
15xBottleneckCSP: (b,H/8,W/8,256) ----> stride8, P3
downSamp Conv: (b,H/16,W/16,512)
# stage4: module 8-9
15xBottleneckCSP: (b,H/16,W/16,512) ----> stride16, P4
downSamp Conv: (b,H/32,W/32,1024)
# stage5: module 10
7xBottleneckCSP: (b,H/32,W/32,1024) ----> stride32, P5
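The shape walkthrough above can be reproduced mechanically from the backbone list. The following is a rough sketch, not the repository's model builder: `trace_backbone` is a hypothetical helper, module names are plain strings, and it assumes that only `Conv` changes the stride (via its third argument) and that every layer takes its input from the previous one.

```python
# Backbone entries copied from the yolov4-p5 listing: [from, number, module, args].
backbone = [
    [-1, 1, "Conv", [32, 3, 1]],
    [-1, 1, "Conv", [64, 3, 2]],
    [-1, 1, "BottleneckCSP", [64]],
    [-1, 1, "Conv", [128, 3, 2]],
    [-1, 3, "BottleneckCSP", [128]],
    [-1, 1, "Conv", [256, 3, 2]],
    [-1, 15, "BottleneckCSP", [256]],
    [-1, 1, "Conv", [512, 3, 2]],
    [-1, 15, "BottleneckCSP", [512]],
    [-1, 1, "Conv", [1024, 3, 2]],
    [-1, 7, "BottleneckCSP", [1024]],
]


def trace_backbone(layers):
    """Return (index, module, channels, stride) for each layer of a sequential backbone."""
    stride = 1
    rows = []
    for i, (_, _, module, args) in enumerate(layers):
        c_out = args[0]
        # Conv args are [c_out, kernel, stride]; other modules keep the resolution.
        if module == "Conv" and len(args) >= 3:
            stride *= args[2]
        rows.append((i, module, c_out, stride))
    return rows


for i, module, c_out, stride in trace_backbone(backbone):
    print(f"{i:2d} {module:14s} (b, H/{stride}, W/{stride}, {c_out})")
```

The last row corresponds to P5: 1024 channels at stride 32.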
fpn:
[[-1, 1, SPPCSP, [512]], # 11
[-1, 1, Conv, [256, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[8, 1, Conv, [256, 1, 1]], # route backbone P4
[[-1, -2], 1, Concat, [1]],
[-1, 3, BottleneckCSP2, [256]], # 16
[-1, 1, Conv, [128, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[6, 1, Conv, [128, 1, 1]], # route backbone P3
[[-1, -2], 1, Concat, [1]],
[-1, 3, BottleneckCSP2, [128]], # 21
[-1, 1, Conv, [256, 3, 1]],
[-2, 1, Conv, [256, 3, 2]],
[[-1, 16], 1, Concat, [1]], # cat
[-1, 3, BottleneckCSP2, [256]], # 25
[-1, 1, Conv, [512, 3, 1]],
[-2, 1, Conv, [512, 3, 2]],
[[-1, 11], 1, Concat, [1]], # cat
[-1, 3, BottleneckCSP2, [512]], # 29
[-1, 1, Conv, [1024, 3, 1]],
]
P3(256) --- conv(128) -------------- cat(256) - 3xCSP(128) - conv(256)
                                         /          |
                                   ConvUp(128) s2conv(256)
                                        /           |
P4(512) --- conv(256) -- cat(512) - 3xCSP(256) - cat(512) - 3xCSP(256) - conv(512)
                            /                                   |
                  ConvUp(256)                              s2conv(512)
                  /                                             |
P5(1024) -- SPPCSP(512) ----------------------------------- cat(1024) - 3xCSP(512) - conv(1024)
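The channel counts in the diagram, in particular the `cat(...)` widths, can be checked by tracing the fpn list. This is a sketch under assumptions: `trace_channels` is a hypothetical helper, the `number` field is dropped because it does not change channel counts, `Concat` is assumed to sum its input channels, `Upsample` to keep them, and every other module to output `args[0]` channels.

```python
# Output channels of backbone layers 0-10, read from the backbone listing.
backbone_out = [32, 64, 64, 128, 128, 256, 256, 512, 512, 1024, 1024]

# [from, module, args] for fpn layers 11-30.
fpn = [
    [-1, "SPPCSP", [512]],                   # 11
    [-1, "Conv", [256, 1, 1]],
    [-1, "Upsample", [None, 2, "nearest"]],
    [8, "Conv", [256, 1, 1]],                # route backbone P4
    [[-1, -2], "Concat", [1]],
    [-1, "BottleneckCSP2", [256]],           # 16
    [-1, "Conv", [128, 1, 1]],
    [-1, "Upsample", [None, 2, "nearest"]],
    [6, "Conv", [128, 1, 1]],                # route backbone P3
    [[-1, -2], "Concat", [1]],
    [-1, "BottleneckCSP2", [128]],           # 21
    [-1, "Conv", [256, 3, 1]],
    [-2, "Conv", [256, 3, 2]],
    [[-1, 16], "Concat", [1]],
    [-1, "BottleneckCSP2", [256]],           # 25
    [-1, "Conv", [512, 3, 1]],
    [-2, "Conv", [512, 3, 2]],
    [[-1, 11], "Concat", [1]],
    [-1, "BottleneckCSP2", [512]],           # 29
    [-1, "Conv", [1024, 3, 1]],
]


def trace_channels(known, layers):
    """Extend the per-layer output-channel list `known` with the given layers."""
    ch = list(known)
    for f, module, args in layers:
        i = len(ch)  # index of the layer being added
        sources = f if isinstance(f, list) else [f]
        c_in = [ch[i + s if s < 0 else s] for s in sources]
        if module == "Concat":
            ch.append(sum(c_in))
        elif module == "Upsample":
            ch.append(c_in[0])
        else:
            ch.append(args[0])
    return ch


ch = trace_channels(backbone_out, fpn)
print([ch[i] for i in (15, 20, 24, 28)])  # Concat layers: [512, 256, 512, 1024]
print([ch[i] for i in (22, 26, 30)])      # Detect inputs: [256, 512, 1024]
```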
detection head:
[[22,26,30], 1, Detect, [n_cls, anchors]]
Detect([P3,P4,P5])
It is just a single convolution: 1x1 conv with bias + sigmoid, output (b,h,w,a,c+1+4)
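As a quick size check for that output, the 1x1 conv has to emit `a * (c + 1 + 4)` channels per level. The helpers below are hypothetical and only do the arithmetic; the anchor count of 4 per level and the 80x80 feature map are assumed values for illustration, not read from the listing above.

```python
def detect_out_channels(n_cls, n_anchors):
    """Channels the 1x1 conv emits: per anchor, class scores + 1 objectness + 4 box values."""
    return n_anchors * (n_cls + 1 + 4)


def detect_out_shape(b, h, w, n_cls, n_anchors):
    """Per-level output shape in the (b,h,w,a,c+1+4) notation."""
    return (b, h, w, n_anchors, n_cls + 1 + 4)


# n_anchors=4 and the 80x80 feature map are assumptions for illustration.
print(detect_out_channels(80, 4))          # 340
print(detect_out_shape(1, 80, 80, 80, 4))  # (1, 80, 80, 4, 85)
```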
box_offsets
xy_offset: y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]
wh_offset: y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]
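* wh is easy to understand: the sigmoid output lies in [0,1], and (2 * sigmoid) ** 2 maps it to [0,4], so the anchor box can be shrunk or enlarged by up to 4x
* xy: the sigmoid output lies in [0,1], and sigmoid * 2 - 0.5 maps it to [-0.5,1.5]. This is used as the offset relative to the grid cell (self.grid[i]); the range is slightly wider than the original [0,1], which helps objects with larger offsets converge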
* ref: https://github.com/WongKinYiu/ScaledYOLOv4/issues/90#
* ref_origin: https://github.com/AlexeyAB/darknet/issues/3293
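The two decoding formulas can be tried on a single prediction without any tensor library. This is a minimal sketch: `decode_box` is a hypothetical helper, it takes raw logits and applies the sigmoid itself, and the grid cell, stride and anchor size in the example are made-up values. It assumes the anchor is given in the same units as the output (pixels).

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(raw, grid_xy, stride, anchor_wh):
    """Decode raw logits (tx, ty, tw, th) for one anchor at one grid cell.

    Returns (cx, cy, w, h), following the xy_offset and wh_offset formulas.
    """
    sx, sy, sw, sh = (sigmoid(v) for v in raw)
    cx = (sx * 2.0 - 0.5 + grid_xy[0]) * stride  # offset in [-0.5, 1.5] cells
    cy = (sy * 2.0 - 0.5 + grid_xy[1]) * stride
    w = (sw * 2.0) ** 2 * anchor_wh[0]           # scale in [0, 4]
    h = (sh * 2.0) ** 2 * anchor_wh[1]
    return cx, cy, w, h


# Zero logits give sigmoid = 0.5: xy offset 0.5 cells, wh scale 1.0.
print(decode_box((0.0, 0.0, 0.0, 0.0), grid_xy=(3, 4), stride=8, anchor_wh=(13, 17)))
# (28.0, 36.0, 13.0, 17.0)
```

Very large or very small logits push the xy offset towards 1.5 or -0.5 cells and the wh scale towards 4 or 0, which matches the ranges described above.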