official repo: https://github.com/xingyizhou/CenterNet/src
input: [h,w,3]
output: stride-4 (s=4) heatmap + offsets + size, [h//s, w//s, c+2+2]
backbones: Hourglass/ResNet/DLA
heads: independent 3x3 conv + 1x1 conv
loss:
* det loss: grid focal loss, with penalty reduction on negatives
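* offset loss: L1, offh = h/s - h//s, i.e. the offset of the center point relative to the origin of its grid cell
* size loss: L1, directly regresses wh at heatmap level rather than an offset (e.g. for a 512 input, the wh regression range is [0,128]), factor=0.1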
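A minimal NumPy sketch of the three targets and the detection loss described above. The function names, the Gaussian `sigma`, and the clipping `eps` are illustrative assumptions; `alpha=2`, `beta=4` are the usual focal-loss exponents, and the official repo may organize this differently.

```python
import numpy as np

def gaussian_heatmap(h, w, cy, cx, sigma):
    """Unnormalized Gaussian whose peak is exactly 1 at integer cell (cy, cx)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))

def penalty_reduced_focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-6):
    """Per-cell focal loss; negatives close to a center are down-weighted by (1 - target)^beta."""
    pred = np.clip(pred, eps, 1.0 - eps)
    pos = target == 1.0          # positives are exactly the peak cells
    neg = ~pos
    pos_loss = -np.sum(((1.0 - pred) ** alpha * np.log(pred))[pos])
    neg_loss = -np.sum(((1.0 - target) ** beta * pred ** alpha * np.log(1.0 - pred))[neg])
    n_pos = max(int(pos.sum()), 1)  # normalize by the number of object centers
    return (pos_loss + neg_loss) / n_pos

def center_targets(cx, cy, bw, bh, stride=4):
    """Box center/size in input pixels -> (grid cell, sub-cell offset, heatmap-level size)."""
    fx, fy = cx / stride, cy / stride
    ix, iy = int(fx), int(fy)    # h//s for non-negative coordinates
    return (ix, iy), (fx - ix, fy - iy), (bw / stride, bh / stride)
```

For example, a center at (102, 50) with stride 4 lands in cell (25, 12) with offset (0.5, 0.5); the offset and size L1 losses are only evaluated at such center cells.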
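Personal take: CenterNet and anchor-based detectors share essentially the same formulation:
- center regression corresponds to confidence regression; the difference is the target encoding: Gaussian vs. [0,1] vs. [0,-1,1]
- offset regression corresponds to bounding-box center regression, exactly the same
- size regression becomes raw pixels, no longer relative to an anchor
- the hourglass structure is effectively an FPN; cascaded hourglasses are comparable to BiFPN
- multi-scale becomes a single high-resolution feature map; multi-scale prediction is still possible, but then NMS is needed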
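mAP on COCO (single-scale, no flip, unless noted):
ResNet-101: 34.6% AP at 45 FPS
DLA-34: 37.4% AP at 52 FPS
Hourglass-104: 40.3% AP at 14 FPS (45.1% AP at 1.4 FPS is the flip + multi-scale test result)
(test-time multi-scale adds about 5 points, which suggests single-scale prediction is the bottleneck)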
backbones
* hourglass-104
reference: https://github.com/AmberzzZZ/backbones/tree/master/hourglass
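stacked: same structure as in CornerNet: one downsampling hourglass plus two mirror-symmetric hourglasses; the parameter count is huge
intermediate supervision:
- not mentioned in the CenterNet paper
- mentioned in the CornerNet paper: it is used, but the intermediate predictions are not fed into the next stage
- my experiments show intermediate supervision is very important: convergence is faster and the heatmaps are more accurate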
* resnet-101
decoder design:
- channels: [256,128,64]
- 3x3 deformable conv + transpose conv(initialized as bilinear interpolation)
* DLA-34
decoder design:
- mirror-symmetric to the encoder
- replace the original conv with 3x3 deformable conv at every upsampling layer
- add more skip connections from each level
- the main difference from the ResNet backbone is that each feature level fuses the features of the previous level
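* one attempt
centerNext: the hourglass uses a large number of residual blocks, so consider swapping them for ResNeXt blocks to improve computational efficiency
the parameter count immediately drops from 33 million to 15 million
se-centerNext: one step further, add SE blocks in the residuals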
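On the "transpose conv initialized as bilinear interpolation" note in the ResNet decoder: a small sketch of how such an initial kernel can be built. The helper name and the 4x4 kernel size (for 2x upsampling) are assumptions for illustration, not taken from the repo.

```python
import numpy as np

def bilinear_kernel(k):
    """k x k kernel that makes a stride-(k/2) transposed conv start out as bilinear upsampling."""
    factor = (k + 1) // 2
    center = factor - 1 if k % 2 == 1 else factor - 0.5
    taps = 1.0 - np.abs(np.arange(k) - center) / factor  # 1-D triangle filter
    return np.outer(taps, taps)                           # separable 2-D kernel

print(bilinear_kernel(4))
```

The same 2-D kernel would be copied into every channel of the transposed-conv weight, so training starts from plain interpolation and learns deviations from it.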
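compose augmentation:
label-weighting augmentation (mixup) does not work, because the loss as defined requires positives to have prob exactly 1; cutmix and mosaic are worth considering
for the same reason, label smoothing cannot be used either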
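A toy check of the point above, assuming positives are selected with `target == 1` as in the focal loss. The helper names are hypothetical; `cut_paste` stands in for cutmix/mosaic-style composition.

```python
import numpy as np

def count_positives(target):
    """Number of cells the detection loss would treat as object centers."""
    return int((target == 1.0).sum())

def mixup(hm_a, hm_b, lam):
    """Label-weighted blend: every peak gets scaled below 1."""
    return lam * hm_a + (1.0 - lam) * hm_b

def cut_paste(hm_a, hm_b, y0, x0):
    """Paste the bottom-right region of hm_b into hm_a; pasted peaks keep value 1."""
    out = hm_a.copy()
    out[y0:, x0:] = hm_b[y0:, x0:]
    return out

hm_a = np.zeros((4, 4)); hm_a[1, 1] = 1.0
hm_b = np.zeros((4, 4)); hm_b[2, 3] = 1.0
print(count_positives(mixup(hm_a, hm_b, 0.6)))      # no cell equals 1 after blending
print(count_positives(cut_paste(hm_a, hm_b, 2, 2)))  # both centers survive
```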
head
3x3 conv(256) + 1x1 conv
test time decoder(from points to bnd boxes)
for each heatmap channel:
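* take top 100 8-connected-neighbor peaks: implemented by 3x3 max pooling
* convert the xywh channels at the peaks into box values
* the paper claims no NMS or other post-processing is needed, but top-100 itself brings redundant boxes; this may not hurt precision, but there are still a lot of false positives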
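The decoding steps above can be sketched in NumPy as follows. The channel layout (offset as (dx, dy), size as (w, h) at heatmap level) and the function names are assumptions; the heatmap is expected to be a float array.

```python
import numpy as np

def max_pool_3x3(hm):
    """3x3 max pooling, stride 1, same padding, on a float [H, W] map."""
    h, w = hm.shape
    padded = np.pad(hm, 1, mode="constant", constant_values=-np.inf)
    windows = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.max(windows, axis=0)

def decode(heatmap, offset, size, k=100, stride=4):
    """heatmap [H,W,C], offset [H,W,2], size [H,W,2] -> rows of (x1, y1, x2, y2, score, class)."""
    h, w, c = heatmap.shape
    rows = []
    for cls in range(c):
        hm = heatmap[:, :, cls]
        # a cell is a peak if it equals the max of its 8-connected neighborhood
        peaks = np.where(hm == max_pool_3x3(hm), hm, 0.0)
        flat = peaks.ravel()
        top = np.argsort(-flat)[:k]      # top-k peaks for this class
        top = top[flat[top] > 0]         # drop empty cells
        ys, xs = np.unravel_index(top, (h, w))
        cx = xs + offset[ys, xs, 0]
        cy = ys + offset[ys, xs, 1]
        bw, bh = size[ys, xs, 0], size[ys, xs, 1]
        boxes = np.stack([cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2], axis=1) * stride
        rows.append(np.column_stack([boxes, flat[top], np.full(len(top), cls)]))
    return np.concatenate(rows, axis=0)
```

A score threshold on the fifth column is one simple way to trim the low-confidence peaks that top-100 lets through.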
training details
* input size: 512x512x3
* output: 128x128
* aug: random flip & random scaling(0.6-1.3) & cropping & color jittering
* Adam: batch size=128, lr=5e-4 (drop by x10 at ep90 & ep120), 140 epochs
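The step schedule in the Adam line, written out as a plain function. The name is hypothetical, and whether the drop applies at or after the milestone epoch is an assumption (here: from the milestone epoch onward).

```python
def learning_rate(epoch, base_lr=5e-4, milestones=(90, 120), gamma=0.1):
    """Multiply base_lr by gamma once for every milestone already reached."""
    drops = sum(epoch >= m for m in milestones)
    return base_lr * gamma ** drops

for ep in (0, 89, 90, 120, 139):
    print(ep, learning_rate(ep))
```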
centerNet2
official repo: https://github.com/xingyizhou/CenterNet2