pytorch_model_integration

The project is based on PyTorch and integrates the current mainstream network architectures, including VGGNet, ResNet, DenseNet, MobileNet and DarkNet (YOLOv2 and YOLOv3).

This project follows the relevant details given in each paper as closely as possible. Since the structural details in some papers are incomplete, we have added some choices of our own. The input size of all networks is uniformly set to (224, 224, 3) (H, W, C).

Network Results

| Model | Params/Million | FLOPs/G | Time_cost/ms | Top-1 | Top-5 |
| --- | --- | --- | --- | --- | --- |
| **2015** | | | | | |
| Vgg11 | 9.738984 | 15.02879 | 205.59 | 70.4 | 89.6 |
| Vgg13 | 9.92388 | 22.45644 | 324.13 | 71.3 | 90.1 |
| Vgg16 | 15.236136 | 30.78787 | 397.33 | 74.4 | 91.9 |
| Vgg19 | 20.548392 | 39.11929 | 451.11 | 74.5 | 92.0 |
| **2016** | | | | | |
| ResNet18 | 11.693736 | 3.65921 | 86.56 | | |
| ResNet34 | 21.801896 | 7.36109 | 123.07 | 75.81 | 92.6 |
| ResNet50 | 25.557032 | 8.27887 | 293.62 | 77.15 | 93.29 |
| ResNet101 | 44.54916 | 15.71355 | 413.51 | 78.25 | 93.95 |
| ResNet152 | 60.192808 | 23.15064 | 573.09 | 78.57 | 94.29 |
| PreActResNet18 | 11.690792 | 3.65840 | 86.12 | | |
| PreActResNet34 | 21.798952 | 7.36029 | 142.51 | | |
| PreActResNet50 | 25.545256 | 8.27566 | 296.39 | | |
| PreActResNet101 | 44.537384 | 15.71034 | 418.37 | | |
| PreActResNet152 | 60.181032 | 23.14743 | 578.81 | 78.90 | 94.50 |
| DarkNet19(YOLOv2) | 8.01556 | 10.90831 | 139.21 | | |
| **2017** | | | | | |
| DenseNet121(k=32) | 7.978734 | 5.69836 | 286.45 | | |
| DenseNet169(k=32) | 14.149358 | 6.75643 | 375.47 | | |
| DenseNet201(k=32) | 20.013806 | 8.63084 | 486.14 | | |
| DenseNet264(k=32) | 33.337582 | 11.57003 | 689.63 | | |
| DenseNet161(k=48) | 28.680814 | 15.50790 | 708.36 | | |
| DPN92 | 36.779704 | 12.77985 | 366.11 | 79.30 | 94.60 |
| DPN98 | 60.21588 | 22.92897 | 573.04 | 79.80 | 94.80 |
| ResNeXt50_2x40d | 25.425 | 8.29756 | 364.24 | 77.00 | |
| ResNeXt50_4x24d | 25.292968 | 8.37150 | 416.01 | 77.40 | |
| ResNeXt50_8x14d | 25.603016 | 8.58994 | 444.33 | 77.70 | |
| ResNeXt50_32x4d | 25.028904 | 8.51937 | 460.20 | 77.80 | |
| ResNeXt101_2x40d | 44.456296 | 15.75783 | 640.83 | 78.3 | |
| ResNeXt101_4x24d | 44.363432 | 15.84712 | 627.48 | 78.6 | |
| ResNeXt101_8x14d | 45.104328 | 16.23445 | 870.31 | 78.7 | |
| ResNeXt101_32x4d | 44.177704 | 16.02570 | 952.88 | 78.8 | |
| MobileNet | 4.231976 | 1.14757 | 100.45 | 70.60 | |
| SqueezeNet | 1.2524 | 1.69362 | 90.97 | 57.50 | 80.30 |
| SqueezeNet + Simple Bypass | 1.2524 | 1.69550 | 96.82 | 60.40 | 82.50 |
| SqueezeNet + Complex Bypass | 1.594928 | 2.40896 | 130.98 | 58.80 | 82.00 |
| **2018** | | | | | |
| PeleeNet | 4.51988 | 4.96656 | 237.18 | 72.6 | 90.6 |
| 1.0-SqNxt-23 | 0.690824 | 0.48130 | 69.93 | 59.05 | 82.60 |
| 1.0-SqNxt-23v5 | 0.909704 | 0.47743 | 58.40 | 59.24 | 82.41 |
| 2.0-SqNxt-23 | 2.2474 | 1.12928 | 111.89 | 67.18 | 88.17 |
| 2.0-SqNxt-23v5 | 3.11524 | 1.12155 | 93.54 | 67.44 | 88.20 |
| MobileNetV2 | 3.56468 | 0.66214 | 138.15 | 74.07 | |
| DarkNet53(YOLOv3) | 41.609928 | 14.25625 | 275.50 | | |
| DLA-34 | 15.784869 | 2.27950 | 70.17 | | |
| DLA-46-C | 1.310885 | 0.40895 | 40.29 | 64.9 | 86.7 |
| DLA-60 | 22.335141 | 2.93399 | 110.80 | | |
| DLA-102 | 33.732773 | 4.42848 | 154.27 | | |
| DLA-169 | 53.990053 | 6.65083 | 230.39 | | |
| DLA-X-46-C | 1.077925 | 0.37765 | 44.74 | 66.0 | 87.0 |
| DLA-X-60-C | 1.337765 | 0.40313 | 50.84 | 68.0 | 88.4 |
| DLA-X-60 | 17.650853 | 2.39033 | 131.93 | | |
| DLA-X-102 | 26.773157 | 3.58778 | 164.93 | | |
| IGCV3-D (0.7) | 2.490294 | 0.31910 | 165.14 | 68.45 | |
| IGCV3-D (1.0) | 3.491688 | 0.60653 | 263.80 | 72.20 | |
| IGCV3-D (1.4) | 6.015164 | 1.11491 | 318.40 | 74.70 | |
| **2019** | | | | | |
| EfficientNet-B0 | 5.288548 | 0.01604 | 186.61 | 76.30 | 93.20 |
| EfficientNet-B1 | 7.794184 | 0.02124 | 266.05 | 78.80 | 94.40 |
| EfficientNet-B2 | 9.109994 | 0.02240 | 277.94 | 79.80 | 94.90 |
| EfficientNet-B3 | 12.233232 | 0.02905 | 376.24 | 81.10 | 95.50 |
| EfficientNet-B4 | 19.341616 | 0.03762 | 513.91 | 82.60 | 96.30 |
| EfficientNet-B5 | 30.389784 | 0.05086 | 721.95 | 83.30 | 96.70 |
| EfficientNet-B6 | 43.040704 | 0.06443 | 1062.64 | 84.00 | 96.90 |
| EfficientNet-B7 | 66.34796 | 0.08516 | 1520.88 | 84.40 | 97.10 |

GoogleNet Inception V1-V4

| Model | Params/Million | FLOPs/G | Time_cost/ms | Top-1 | Top-5 |
| --- | --- | --- | --- | --- | --- |
| **2014** | | | | | |
| GoogleNet V1 | 6.998552 | 3.20387 | 85.95 | | |
| GoogleNet V1 (LRN) | 6.998552 | 3.20387 | 192.64 | 71.00 | 90.80 |
| GoogleNet V1 (Bn) | 7.013112 | 3.21032 | 139.42 | 73.20 | |
| **2015** | | | | | |
| GoogleNet V2 | 11.204936 | 4.08437 | 127.71 | 76.60 | |
| GoogleNet V3 | 23.834568 | 7.60887 | 208.01 | 78.80 | 94.40 |
| **2016** | | | | | |
| GoogleNet V4 | 42.679816 | 12.31977 | 324.36 | 80.00 | 95.10 |

Note: GoogleNet V1 does not include Bn layers; instead, LocalResponseNorm is applied after the first two convolution layers, which increases the model's computation time. This is why GoogleNet V1 (LRN) is slower than GoogleNet V1 (Bn).

For Time_cost, the input size is set to (4, 3, 224, 224) (N, C, H, W), and the reported value is averaged over multiple test rounds (timing is susceptible to interference from the CPU's operating state).
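
As a rough illustration, a timing sketch along these lines could look as follows. torchvision's resnet18 is used only as a stand-in for the backbones in this repo, and the warm-up / run counts are arbitrary choices:

```python
import time

import torch
import torchvision.models as models

# Minimal sketch: average forward-pass time over several runs and count parameters.
model = models.resnet18().eval()
x = torch.randn(4, 3, 224, 224)           # input size (4, 3, 224, 224)

with torch.no_grad():
    for _ in range(5):                    # warm-up rounds, not timed
        model(x)
    runs = 20
    start = time.time()
    for _ in range(runs):
        model(x)
    elapsed_ms = (time.time() - start) / runs * 1000

params_m = sum(p.numel() for p in model.parameters()) / 1e6
print(f"Params: {params_m:.6f} M  Time_cost: {elapsed_ms:.2f} ms")
```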

ImageNet Data Preparation

Download

http://www.image-net.org/challenges/LSVRC/2012/downloads

We need the training set and the validation set (which serves as the test set); papers generally only report results on the validation set (Top-1 & Top-5).

Development kit (Task 1 & 2). 2.5MB. (not actually used here)

Training images (Task 1 & 2). 138GB. MD5: 1d675b47d978889d74fa0da5fadfb00e

Validation images (all tasks). 6.3GB. MD5: 29b22e2961454d5413ddabcf34fc5622
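
If you want to check the downloaded archives against the MD5 sums above, a small Python sketch (the file names are assumed to match the official archives):

```python
import hashlib

# Compare each archive's MD5 against the value listed above.
EXPECTED_MD5 = {
    "ILSVRC2012_img_train.tar": "1d675b47d978889d74fa0da5fadfb00e",
    "ILSVRC2012_img_val.tar": "29b22e2961454d5413ddabcf34fc5622",
}

for name, expected in EXPECTED_MD5.items():
    md5 = hashlib.md5()
    with open(name, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MB chunks
            md5.update(chunk)
    print(name, "OK" if md5.hexdigest() == expected else "MISMATCH")
```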

Installation

Method 1: https://github.com/pytorch/examples/blob/main/imagenet/extract_ILSVRC.sh

Method 2:

Extract the downloaded data files; this may take a while.

mkdir -p ./train ./val    # tar -C requires the target directories to exist

tar xvf ILSVRC2012_img_train.tar -C ./train

tar xvf ILSVRC2012_img_val.tar -C ./val

For the train data, extraction produces 1000 tar files that need to be extracted again. The extraction script dataset/unzip.sh is as follows:

dir=/data/srd/data/Image/ImageNet/train

# each class comes as its own tar file; extract each one into a folder named after it
for x in `ls $dir/*.tar`
do
    filename=`basename $x .tar`
    mkdir $dir/$filename
    tar -xvf $x -C $dir/$filename
done

# remove the per-class tar files after extraction
rm $dir/*.tar

Note: change 'dir' in the script to your own directory.

Then run:

sh unzip.sh

For the val data, extraction produces 50,000 images. We need to group the images of each class into its own folder, consistent with train. Put the project's dataset/valprep.sh script into the val folder and run:

sh valprep.sh

In the downloaded training set, each folder contains one class of images. The label corresponding to each folder name is in the label file meta.mat of the downloaded Development kit; this is a MATLAB file, and scipy.io.loadmat can read its contents. The validation set contains 50,000 images, and the label of each image is in ILSVRC2012_validation_ground_truth.txt.
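
For reference, a sketch of reading meta.mat with scipy (the field names below follow the ILSVRC2012 devkit; verify them against your own copy of the file):

```python
import scipy.io

# Read the class-label metadata shipped with the development kit.
meta = scipy.io.loadmat("meta.mat", squeeze_me=True)
synsets = meta["synsets"]

# Each entry maps an ILSVRC2012 class ID to a WordNet ID (the folder name
# in the training set) and a human-readable description.
for entry in synsets[:5]:
    print(entry["ILSVRC2012_ID"], entry["WNID"], entry["words"])
```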

Data augmentation: images are sampled in random order; each image is resized so that its shorter side is 256, a 224x224 patch is randomly cropped, the per-channel mean is subtracted from each channel, and the image is randomly flipped horizontally.
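
A sketch of this augmentation with torchvision transforms. The text above only mentions subtracting the per-channel mean; the mean/std values here are the commonly used ImageNet statistics, not values taken from this repo:

```python
import torchvision.transforms as transforms

# Training-time augmentation: resize, random crop, random flip, normalize.
train_transform = transforms.Compose([
    transforms.Resize(256),               # shorter side -> 256
    transforms.RandomCrop(224),           # random 224x224 crop
    transforms.RandomHorizontalFlip(),    # random left-right flip
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```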

One-crop, Five-crop and Ten-crop test methods for the ImageNet validation set.

Due to the fully connected layers, we need to fix the size of the images fed into the network. We set the network input size to 224x224, but the images in the test set vary in size. A single center crop may not cover all of the target object's information in the image, so we crop the image at multiple locations.

One-crop takes a single 224 × 224 region from the center of the image resized to 256 × 256; Five-crop takes five 224 × 224 regions from the top-left, top-right, bottom-left, bottom-right and center of the original image; Ten-crop takes the five crops plus a horizontal flip of each, giving ten regions in total.

Using PyTorch, One-crop can be implemented with torchvision.transforms.CenterCrop, and Five-crop / Ten-crop with transforms.FiveCrop and transforms.TenCrop.
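
A minimal sketch of Ten-crop evaluation with torchvision (One-crop and Five-crop follow the same pattern with CenterCrop / FiveCrop; the normalization statistics are the commonly used ImageNet values, not taken from this repo):

```python
import torch
import torchvision.transforms as transforms

# Test-time transform: resize, take ten 224x224 crops, normalize each crop.
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
test_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.TenCrop(224),  # 5 corner/center crops plus their horizontal flips
    transforms.Lambda(lambda crops: torch.stack(
        [normalize(transforms.ToTensor()(c)) for c in crops])),
])

# At test time the crop dimension is folded into the batch and the
# predictions are averaged over the ten crops:
#   bs, ncrops, c, h, w = images.size()
#   logits = model(images.view(-1, c, h, w)).view(bs, ncrops, -1).mean(1)
```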