AI Model Zoo
Introduction
This repository includes optimized deep learning models to speed up the deployment of deep learning inference on Xilinx™ platforms. These models cover a range of applications, including ADAS/AD, video surveillance, robotics, and data center. You can get started with these free pre-trained models to enjoy the benefits of deep learning acceleration.
Model Information
The following table lists, for each model, the application, framework, backbone, input size, computation, training and validation datasets, and the floating-point and fixed-point accuracy.
No. | Application | Model | Name | Framework | Backbone | Input Size | OPS per image | Training Set | Val Set | Float (Top1, Top5)/mAP/mIoU | Fixed (Top1, Top5)/mAP/mIoU |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | Image Classification | resnet50 | cf_resnet50_imagenet_224_224_7.7G | caffe | resnet50 | 224*224 | 7.7G | ImageNet Train | ImageNet Validation | 0.74828/0.92135 | 0.7338/0.9130 |
2 | Image Classification | Inception_v1 | cf_inceptionv1_imagenet_224_224_3.16G | caffe | inception_v1 | 224*224 | 3.16G | ImageNet Train | ImageNet Validation | 0.689/0.897 | 0.69882/0.894122 |
3 | Image Classification | Inception_v2 | cf_inceptionv2_imagenet_224_224_4G | caffe | bn-inception | 224*224 | 4G | ImageNet Train | ImageNet Validation | 0.7283/0.9109 | 0.7170/0.9033 |
4 | Image Classification | Inception_v3 | cf_inceptionv3_imagenet_299_299_11.4G | caffe | inception_v3 | 299*299 | 11.4G | ImageNet Train | ImageNet Validation | 0.77058/0.93326 | 0.76264/0.930322 |
5 | Image Classification | mobileNet_v2 | cf_mobilenetv2_imagenet_224_224_0.59G | caffe | MobileNet_v2 | 224*224 | 608M | ImageNet Train | ImageNet Validation | 0.6649/0.872362 | 0.635219/0.850701 |
6 | Image Classification | tf_resnet50 | tf_resnet50_imagenet_224_224_6.97G | tensorflow | resnet50 | 224*224 | 6.97G | ImageNet Train | ImageNet Validation | 0.7520/0.9219 | 0.7420/0.9209 |
7 | Image Classification | tf_inception_v1 | tf_inceptionv1_imagenet_224_224_3G | tensorflow | inception_v1 | 224*224 | 3.0G | ImageNet Train | ImageNet Validation | 0.6976/0.8963 | 0.6786/0.8885 |
8 | Image Classification | tf_mobilenet_v2 | tf_mobilenetv2_imagenet_224_224_1.17G | tensorflow | MobileNet_v2 | 224*224 | 1.17G | ImageNet Train | ImageNet Validation | 0.7487/0.9250 | 0.2720/- |
9 | ADAS Vehicle Detection | ssd_adas_pruned_0.95 | cf_ssdadas_bdd_360_480_0.95_6.3G | caffe | VGG-16 | 360*480 | 6.3G | bdd100k + private data | bdd100k + private data | 0.426 | 0.424 |
10 | ADAS Pedestrian Detection | ssd_pedestrain_pruned_0.97 | cf_ssdpedestrian_coco_360_640_0.97_5.9G | caffe | VGG-bn-16 | 360*640 | 5.9G | coco2014_train_person and crowdhuman | coco2014_val_person | 0.5899 | 0.585 |
11 | Traffic Detection | ssd_traffic_pruned_0.9 | cf_ssdtraffic_360_480_0.9_11.6G | caffe | VGG-16 | 360*480 | 11.6G | private data | private data | 0.602 | 0.588 |
12 | Object Detection | ssd_mobilnet_v2 | cf_ssdmobilenetv2_bdd_360_480_6.57G | caffe | MobileNet_v2 | 360*480 | 6.57G | bdd100k train | bdd100k val | 0.3186 | 0.3019 |
13 | Object Detection | tf_ssd_voc | tf_ssd_voc_300_300_64.81G | tensorflow | VGG-bn-16 | 300*300 | 64.81G | voc07+12_trainval | voc07_test | 0.7942(11 points) | 0.7882(11 points) |
14 | Face Detection | densebox_320_320 | cf_densebox_wider_320_320_0.49G | caffe | VGG-16 | 320*320 | 0.49G | wider_face | FDDB | 0.8818 | 0.8768 |
15 | Face Detection | densebox_360_640 | cf_densebox_wider_360_640_1.11G | caffe | VGG-16 | 360*640 | 1.11G | wider_face | FDDB | 0.8909 | 0.8909 |
16 | ADAS Detection | yolov3_adas_prune_0.9 | dk_yolov3_cityscapes_256_512_0.9_5.46G | darknet | darknet-53 | 256*512 | 5.46G | cityscapes train | cityscapes val | 55.20% | 53.00% |
17 | Object Detection | yolov3_voc | dk_yolov3_voc_416_416_65.42G | darknet | darknet-53 | 416*416 | 65.42G | voc07+12_trainval | voc07_test | 82.4%(MaxIntegral) | 81.5%(MaxIntegral) |
18 | Object Detection | tf_yolov3_voc | tf_yolov3_voc_416_416_65.63G | tensorflow | darknet-53 | 416*416 | 65.63G | voc07+12_trainval | voc07_test | 78.46%(11 points) | 77.38%(11 points) |
19 | Object Detection | refinedet_pruned_0.8 | cf_refinedet_coco_360_480_0.8_25G | caffe | VGG-bn-16 | 360*480 | 25G | coco2014_train_person | coco2014_val_person | 67.68% | 67.47% |
20 | Object Detection | refinedet_pruned_0.92 | cf_refinedet_coco_360_480_0.92_10.10G | caffe | VGG-bn-16 | 360*480 | 10.10G | coco2014_train_person | coco2014_val_person | 64.60% | 64.50% |
21 | Object Detection | refinedet_pruned_0.96 | cf_refinedet_coco_360_480_0.96_5.08G | caffe | VGG-bn-16 | 360*480 | 5.08G | coco2014_train_person | coco2014_val_person | 60.89% | 60.65% |
22 | ADAS Segmentation | FPN | cf_fpn_cityscapes_256_512_8.9G | caffe | Google_v1_BN | 256*512 | 8.9G | Cityscapes gtFineTrain(2975) | Cityscapes Val(500) | 0.5669 | 0.5645 |
23 | ADAS Lane Detection | VPGnet_pruned_0.99 | cf_VPGnet_caltechlane_480_640_0.99_2.5G | caffe | VGG | 480*640 | 2.5G | caltech-lanes-train-dataset | caltech lane | 88.639%(F1-score) | 87%(F1-score) |
24 | Pose Estimation | SP-net | cf_SPnet_aichallenger_224_128_0.54G | caffe | Google_v1_BN | 128*224 | 548.6M | ai_challenger | ai_challenger | 88.2%(PCKh0.5) | 87.86%(PCKh0.5) |
25 | Pose Estimation | Openpose_pruned_0.3 | cf_openpose_aichallenger_368_368_0.3_189.7G | caffe | VGG | 368*368 | 49.88G | ai_challenger | ai_challenger | 0.45067(OKS) | 0.44287(OKS) |
26 | Object Detection | yolov2_voc | dk_yolov2_voc_448_448_34G | darknet | darknet-19 | 448*448 | 34G | voc07+12_trainval | voc07_test | 78.45%(MaxIntegral) | 77.39%(MaxIntegral) |
27 | Object Detection | yolov2_voc_pruned_0.66 | dk_yolov2_voc_448_448_0.66_11.56G | darknet | darknet-19 | 448*448 | 11.56G | voc07+12_trainval | voc07_test | 77%(MaxIntegral) | 76%(MaxIntegral) |
28 | Object Detection | yolov2_voc_pruned_0.71 | dk_yolov2_voc_448_448_0.71_9.86G | darknet | darknet-19 | 448*448 | 9.86G | voc07+12_trainval | voc07_test | 76.7%(MaxIntegral) | 75.3%(MaxIntegral) |
29 | Object Detection | yolov2_voc_pruned_0.77 | dk_yolov2_voc_448_448_0.77_7.82G | darknet | darknet-19 | 448*448 | 7.82G | voc07+12_trainval | voc07_test | 75.76%(MaxIntegral) | 74.6%(MaxIntegral) |
30 | Image Classification | Inception-v4 | cf_inceptionv4_imagenet_299_299_24.5G | caffe | inception | 299*299 | 24.5G | ImageNet Train | ImageNet Validation | 79.59%/94.70% | 78.99%/94.45% |
31 | Image Classification | SqueezeNet | cf_squeeze_imagenet_227_227_0.76G | caffe | squeezenet | 227*227 | 0.76G | ImageNet Train | ImageNet Validation | 54.64%/78.20% | 50.69%/77.01% |
32 | Face Recognition | face_landmark | cf_landmark_celeba_96_72_0.14G | caffe | lenet | 96*72 | 0.14G | celebA | processed helen | 0.03704(MAE) | 0.03692(MAE) |
33 | Re-identification | reid | cf_reid_marketcuhk_160_80_0.95G | caffe | resnet18 | 160*80 | 0.95G | Market1501+CUHK03 | Market1501 | 78.00% | 77.60% |
34 | Object Detection | yolov3_bdd | cf_yolov3_bdd_288_512_53.7G | caffe | darknet-53 | 288*512 | 53.7G | bdd100k | bdd100k | 50.60% | 49.14% |
35 | Image Classification | tf_mobilenet_v1 | tf_mobilenetv1_imagenet_224_224_1.14G | tensorflow | MobileNet_v1 | 224*224 | 1.14G | ImageNet Train | ImageNet Validation | 71.06%/89.72% | 67.87%/87.67% |
36 | Image Classification | resnet18 | cf_resnet18_imagenet_224_224_3.65G | caffe | resnet18 | 224*224 | 3.65G | ImageNet Train | ImageNet Validation | 68.44%/88.64% | 66.94%/88.25% |
37 | Image Classification | resnet18_wide | tf_resnet18_imagenet_224_224_28G | tensorflow | resnet18 | 224*224 | 28G | ImageNet Train | ImageNet Validation | 68.91%/88.63% | 69.86%/88.96% |
Naming Rules
Model name: `F_M_D_H_W_(P)_C`
- `F` specifies the training framework: `cf` is Caffe, `tf` is TensorFlow, `dk` is Darknet, `pt` is PyTorch.
- `M` specifies the model.
- `D` specifies the dataset.
- `H` specifies the height of the input data.
- `W` specifies the width of the input data.
- `P` specifies the pruning ratio, i.e. how much the computation is reduced. It is present only if the model is pruned.
- `C` specifies the computation of the model, in Gops per image.

For example, `cf_refinedet_coco_360_480_0.8_25G` is a RefineDet model trained with Caffe on the COCO dataset, with an input size of 360*480 (height*width), 80% pruned, and a computation of 25 Gops per image.
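Because every field sits at a fixed position, a model name can be split back into its parts mechanically. The following is a minimal sketch, not a tool shipped with the Model Zoo; `parse_model_name` and `ModelName` are hypothetical names. It assumes that `M` contains no underscore (true for every name in the table above) and that `P` can be told apart from `W` because `H` and `W` are integers while `P` has a decimal point.

```python
from dataclasses import dataclass
from typing import Optional

# Framework prefixes (F) as defined in the naming rules.
FRAMEWORKS = {"cf": "Caffe", "tf": "TensorFlow", "dk": "Darknet", "pt": "PyTorch"}


@dataclass
class ModelName:
    framework: str
    model: str
    dataset: str
    height: int
    width: int
    pruning_ratio: Optional[float]  # None when the optional P field is absent
    gops: float  # C: computation per image, in Gops


def parse_model_name(name: str) -> ModelName:
    """Split a name of the form F_M_D_H_W_(P)_C into its fields.

    Assumes M contains no underscore; any extra tokens between M and H
    are treated as part of D.
    """
    tokens = name.split("_")
    if tokens[0] not in FRAMEWORKS or not tokens[-1].endswith("G"):
        raise ValueError(f"not a Model Zoo name: {name!r}")
    gops = float(tokens[-1][:-1])  # strip the trailing 'G'
    rest = tokens[1:-1]

    # H and W are integers, so a token with a decimal point just before C is P.
    pruning_ratio = None
    if rest and "." in rest[-1]:
        pruning_ratio = float(rest.pop())

    if len(rest) < 4:  # need at least M, D, H, W
        raise ValueError(f"not a Model Zoo name: {name!r}")
    height, width = int(rest[-2]), int(rest[-1])
    model, dataset = rest[0], "_".join(rest[1:-2])
    return ModelName(FRAMEWORKS[tokens[0]], model, dataset, height, width,
                     pruning_ratio, gops)


print(parse_model_name("cf_refinedet_coco_360_480_0.8_25G"))
# ModelName(framework='Caffe', model='refinedet', dataset='coco', height=360, width=480, pruning_ratio=0.8, gops=25.0)
```

Unpruned names such as `tf_resnet50_imagenet_224_224_6.97G` come back with `pruning_ratio=None`.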
Model Download
The following table lists the models, with the download link and the MD5 checksum of the zip file for each model.
Note: To download all the models, visit all_models.zip.
If you are a:
- Linux user, use the `get_model.sh` script to download all the models.
- Windows user, use the download link listed in the table to download a model.
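Whichever way an archive is downloaded, it is worth comparing it with the MD5 checksum from the table before extracting it. The following is a minimal sketch using only the Python standard library; `md5sum` and `verify_and_extract` are hypothetical helper names, and the archive name and checksum in the usage line are placeholders to be replaced with the values from the table.

```python
import hashlib
import zipfile
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_extract(zip_path, expected_md5, dest_dir):
    """Extract zip_path into dest_dir only if its MD5 matches expected_md5."""
    actual = md5sum(zip_path)  # hexdigest() is always lowercase
    if actual != expected_md5.strip().lower():
        raise ValueError(
            f"MD5 mismatch for {zip_path}: expected {expected_md5}, got {actual}"
        )
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    return Path(dest_dir)


# Hypothetical usage: substitute the archive name and the checksum from the table.
# verify_and_extract("model.zip", "<md5 from the table>", "models/")
```

Reading the file in chunks keeps memory use flat even for the larger archives, and nothing is written to disk unless the checksum matches.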
Model Directory Structure
Download and extract the model archive to your working area on the local hard disk. For details on the various models, their download link and MD5 checksum for the zip file of each model, see Model Download.
Caffe Model Directory Structure
For a Caffe model, you should see the following directory structure:
├── labelmap.prototxt # Detection class information; present only for some models,
│ such as SSD and RefineDet.
├── readme.md # Contains the environment requirements and data preprocessing information.
│ Refer to this file for details on creating `float.prototxt` by adding
│ a data layer to `test.prototxt` in the `float` directory.
├── deploy
│ ├── deploy.caffemodel # Input to the compiler. Identical to deploy.caffemodel in the `fix` directory.
│ └── deploy.prototxt # Input to the compiler. Derived from deploy.prototxt in the `fix`
│ directory, with layers that are unnecessary or unsupported for
│ compilation removed.
├── fix
│ ├── deploy.caffemodel # Quantized weights; the unmodified output of decent_q.
│ ├── deploy.prototxt # Quantized prototxt; the unmodified output of decent_q.
│ ├── fix_test.prototxt # Used to run evaluation with fix_train_test.caffemodel on GPU
│ │ using Python test code to be released in the near future. This file
│ │ is absent for models converted from Darknet (Yolov2, Yolov3) or
│ │ PyTorch (ReID), and for models without a Caffe test (Densebox).
│ ├── fix_train_test.caffemodel # Quantized weights for fixed-point training and evaluation.
│ └── fix_train_test.prototxt # Used for fixed-point training and testing with fix_train_test.caffemodel
│ on GPU after the data layer is pointed at the user's data path.
└── float
    ├── float.caffemodel # Trained floating-point weights.
    ├── float.prototxt # Modified test.prototxt; the input to decent_q together
    │ with float.caffemodel. decent_q is the Xilinx quantization tool,
    │ which converts a floating-point model to a fixed-point model with
    │ minimal accuracy loss.
    ├── test.prototxt # Used to run evaluation with Python test code to be released in the near future.
    └── trainval.prototxt # Used for training and testing with the caffe train/test commands
        after the data layer is pointed at the user's data path. This file
        is absent for models converted from Darknet (Yolov2, Yolov3) or
        PyTorch (ReID), and for models without a Caffe test (Densebox).
Note: For more information on decent_q, see the DNNDK User Guide.
TensorFlow Model Directory Structure
For a TensorFlow model, you should see the following directory structure:
├── input_fn.py # Python function that reads images from the calibration dataset and preprocesses them.
├── readme.md # Contains the environment requirements, the input and output nodes, and
│ the data preprocessing and postprocessing information.
├── fix
│ ├── deploy.model.pb # Quantized model for the compiler (extended TensorFlow format).
│ └── quantize_eval_model.pb # Quantized model for evaluation.
└── float
    └── frozen.pb # Floating-point frozen model; the input to `decent_q`.
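After extracting an archive, you can check that it matches the layouts above before pointing the compiler or `decent_q` at it. The following is a minimal sketch; `LAYOUTS` and `check_layout` are hypothetical helpers, not part of the Model Zoo. Files that the layouts describe as present only for some models (`labelmap.prototxt`, `fix/fix_test.prototxt`, `float/trainval.prototxt`) are treated as optional, and so is `float/float.prototxt`, since the Caffe readme describes how to create it from `test.prototxt`.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# Expected files per framework, taken from the directory trees above.
LAYOUTS: Dict[str, Dict[str, List[str]]] = {
    "caffe": {
        "required": [
            "readme.md",
            "deploy/deploy.caffemodel",
            "deploy/deploy.prototxt",
            "fix/deploy.caffemodel",
            "fix/deploy.prototxt",
            "fix/fix_train_test.caffemodel",
            "fix/fix_train_test.prototxt",
            "float/float.caffemodel",
            "float/test.prototxt",
        ],
        "optional": [
            "labelmap.prototxt",
            "fix/fix_test.prototxt",
            "float/float.prototxt",
            "float/trainval.prototxt",
        ],
    },
    "tensorflow": {
        "required": [
            "input_fn.py",
            "readme.md",
            "fix/deploy.model.pb",
            "fix/quantize_eval_model.pb",
            "float/frozen.pb",
        ],
        "optional": [],
    },
}


def check_layout(model_dir, framework: str) -> Tuple[List[str], List[str]]:
    """Return (missing required files, absent optional files) for a model directory."""
    root = Path(model_dir)
    layout = LAYOUTS[framework]
    missing = [p for p in layout["required"] if not (root / p).is_file()]
    absent_optional = [p for p in layout["optional"] if not (root / p).is_file()]
    return missing, absent_optional
```

A non-empty first list usually means the archive was extracted into the wrong directory or belongs to the other framework.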
Model Performance
All the models in the Model Zoo have been deployed on Xilinx hardware with DNNDK™ (Deep Neural Network Development Kit) and the Xilinx AI SDK. The following sections list the performance numbers, including end-to-end throughput and latency, for each model on various boards with different DPU configurations.
For more information about DPU, see DPU IP Product Guide.
Note: The model performance numbers listed in the following sections were generated with DNNDK v3.1 and Xilinx AI SDK v2.0.x. A different DPU configuration is used for each board. DNNDK and Xilinx AI SDK can be downloaded for free from https://www.xilinx.com/products/design-tools/ai-inference/ai-developer-hub.html.
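To put the throughput numbers in context, the sketch below estimates the theoretical peak compute of each DPU configuration used in the following sections (3 * B4096 @ 287MHz on ZCU102, 2 * B4096 @ 305MHz on ZCU104, 1 * B1600 @ 287MHz on Ultra96) and the fraction of that peak a model reaches. It assumes that the number in a DPU core name is its peak number of operations per clock cycle, as in the DPU naming convention, and that the OPS-per-image column counts operations the same way. Because the throughput figures are end-to-end and include CPU pre- and postprocessing, the utilization is only a rough indicator. `peak_tops`, `utilization`, and `CONFIGS` are hypothetical names.

```python
# Sketch: theoretical DPU peak compute and the fraction of it a model reaches.
# Assumption: the number in a DPU core name (B4096, B1600) is its peak
# operations per clock cycle.

def peak_tops(num_cores: int, ops_per_clock: int, clock_mhz: float) -> float:
    """Theoretical peak of all DPU cores combined, in TOPS."""
    return num_cores * ops_per_clock * clock_mhz * 1e6 / 1e12


def utilization(gops_per_image: float, fps: float, peak: float) -> float:
    """Achieved ops/s (Gops per image x images/s) divided by peak ops/s."""
    return (gops_per_image * 1e9 * fps) / (peak * 1e12)


# DPU configurations listed in the performance sections.
CONFIGS = {
    "ZCU102": (3, 4096, 287),
    "ZCU104": (2, 4096, 305),
    "Ultra96": (1, 1600, 287),
}

if __name__ == "__main__":
    for board, cfg in CONFIGS.items():
        print(f"{board}: {peak_tops(*cfg):.2f} TOPS peak")
    # resnet50 (7.7 Gops/image) at 179.3 fps multi-thread on ZCU102 (0432055-04)
    zcu102 = peak_tops(*CONFIGS["ZCU102"])
    print(f"resnet50 utilization: {utilization(7.7, 179.3, zcu102):.0%}")
```

Under these assumptions the script prints peaks of 3.53, 2.50, and 0.46 TOPS, and about 39% utilization for resnet50 on the ZCU102.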
Performance on ZCU102 (0432055-04)
The following table lists the end-to-end (E2E) latency and throughput of each model on the ZCU102 (0432055-04) board with a 3 * B4096 @ 287MHz V1.4.0 DPU configuration:
No. | Model | Name | E2E latency (ms), 1 thread | E2E throughput (fps), single thread | E2E throughput (fps), multi-thread |
---|---|---|---|---|---|
1 | resnet50 | cf_resnet50_imagenet_224_224_7.7G | 12.85 | 77.8 | 179.3 |
2 | Inception_v1 | cf_inceptionv1_imagenet_224_224_3.16G | 5.47 | 182.683 | 485.533 |
3 | Inception_v2 | cf_inceptionv2_imagenet_224_224_4G | 6.76 | 147.933 | 373.267 |
4 | Inception_v3 | cf_inceptionv3_imagenet_299_299_11.4G | 17 | 58.8333 | 155.4 |
5 | mobileNet_v2 | cf_mobilenetv2_imagenet_224_224_0.59G | 4.09 | 244.617 | 638.067 |
6 | tf_resnet50 | tf_resnet50_imagenet_224_224_6.97G | 11.94 | 83.7833 | 191.417 |
7 | tf_inception_v1 | tf_inceptionv1_imagenet_224_224_3G | 6.72 | 148.867 | 358.283 |
8 | tf_mobilenet_v2 | tf_mobilenetv2_imagenet_224_224_1.17G | 5.46 | 183.117 | 458.65 |
9 | ssd_adas_pruned_0.95 | cf_ssdadas_bdd_360_480_0.95_6.3G | 11.33 | 88.2667 | 320.5 |
10 | ssd_pedestrain_pruned_0.97 | cf_ssdpedestrian_coco_360_640_0.97_5.9G | 12.96 | 77.1833 | 314.717 |
11 | ssd_traffic_pruned_0.9 | cf_ssdtraffic_360_480_0.9_11.6G | 17.49 | 57.1833 | 218.183 |
12 | ssd_mobilnet_v2 | cf_ssdmobilenetv2_bdd_360_480_6.57G | 24.21 | 41.3 | 141.233 |
13 | tf_ssd_voc | tf_ssd_voc_300_300_64.81G | 69.28 | 14.4333 | 46.7833 |
14 | densebox_320_320 | cf_densebox_wider_320_320_0.49G | 2.43 | 412.183 | 1416.63 |
15 | densebox_360_640 | cf_densebox_wider_360_640_1.11G | 5.01 | 199.717 | 719.75 |
16 | yolov3_adas_prune_0.9 | dk_yolov3_cityscapes_256_512_0.9_5.46G | 11.09 | 90.1667 | 259.65 |
17 | yolov3_voc | dk_yolov3_voc_416_416_65.42G | 70.51 | 14.1833 | 44.4 |
18 | tf_yolov3_voc | tf_yolov3_voc_416_416_65.63G | 70.75 | 14.1333 | 44.0167 |
19 | refinedet_pruned_0.8 | cf_refinedet_coco_360_480_0.8_25G | 29.91 | 33.4333 | 109.067 |
20 | refinedet_pruned_0.92 | cf_refinedet_coco_360_480_0.92_10.10G | 15.39 | 64.9667 | 216.317 |
21 | refinedet_pruned_0.96 | cf_refinedet_coco_360_480_0.96_5.08G | 11.04 | 90.5833 | 312 |
22 | FPN | cf_fpn_cityscapes_256_512_8.9G | 16.58 | 60.3 | 203.867 |
23 | VPGnet_pruned_0.99 | cf_VPGnet_caltechlane_480_640_0.99_2.5G | 9.44 | 105.9 | 424.667 |
24 | SP-net | cf_SPnet_aichallenger_224_128_0.54G | 1.73 | 579.067 | 1620.67 |
25 | Openpose_pruned_0.3 | cf_openpose_aichallenger_368_368_0.3_189.7G | 279.07 | 3.58333 | 16.55 |
26 | yolov2_voc | dk_yolov2_voc_448_448_34G | 39.76 | 25.15 | 86.35 |
27 | yolov2_voc_pruned_0.66 | dk_yolov2_voc_448_448_0.66_11.56G | 18.42 | 54.2833 | 211.217 |
28 | yolov2_voc_pruned_0.71 | dk_yolov2_voc_448_448_0.71_9.86G | 16.42 | 60.9167 | 242.433 |
29 | yolov2_voc_pruned_0.77 | dk_yolov2_voc_448_448_0.77_7.82G | 14.46 | 69.1667 | 286.733 |
30 | Inception-v4 | cf_inceptionv4_imagenet_299_299_24.5G | 34.25 | 29.2 | 84.25 |
31 | SqueezeNet | cf_squeeze_imagenet_227_227_0.76G | 3.6 | 277.65 | 1080.77 |
32 | face_landmark | cf_landmark_celeba_96_72_0.14G | 1.13 | 885.033 | 1623.3 |
33 | reid | cf_reid_marketcuhk_160_80_0.95G | 2.67 | 375 | 773.533 |
34 | yolov3_bdd | cf_yolov3_bdd_288_512_53.7G | 73.89 | 13.5333 | 42.8833 |
35 | tf_mobilenet_v1 | tf_mobilenetv1_imagenet_224_224_1.14G | 3.2 | 312.067 | 875.967 |
36 | resnet18 | cf_resnet18_imagenet_224_224_3.65G | 5.1 | 195.95 | 524.433 |
37 | resnet18_wide | tf_resnet18_imagenet_224_224_28G | 33.28 | 30.05 | 83.4167 |
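In the ZCU102 (0432055-04) table above, the single-thread throughput closely tracks 1000 divided by the single-thread latency in milliseconds (for resnet50, 1000 / 12.85 ms is about 77.8 fps), which is what you would expect if frames are processed one after another. The multi-thread column shows how much running several threads in parallel adds on top of that. The following is a minimal sketch for reading rows of these tables and computing both quantities; `PerfRow`, `parse_row`, `implied_fps`, and `multi_thread_speedup` are hypothetical names, and `NA` entries (used in the Ultra96 table) are mapped to `None`.

```python
from typing import NamedTuple, Optional


class PerfRow(NamedTuple):
    model: str
    latency_ms: float           # E2E latency with one thread
    fps_single: float           # E2E throughput, single thread
    fps_multi: Optional[float]  # E2E throughput, multi-thread; None for "NA"


def parse_row(line: str) -> PerfRow:
    """Parse one data row of a performance table (No. | Model | Name | ...)."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    if len(cells) != 6:
        raise ValueError(f"expected 6 columns, got {len(cells)}: {line!r}")
    _, model, _, latency, single, multi = cells
    return PerfRow(model, float(latency), float(single),
                   None if multi == "NA" else float(multi))


def implied_fps(latency_ms: float) -> float:
    """Throughput if frames were processed strictly one at a time."""
    return 1000.0 / latency_ms


def multi_thread_speedup(row: PerfRow) -> Optional[float]:
    """Multi-thread over single-thread throughput, or None if not measured."""
    return None if row.fps_multi is None else row.fps_multi / row.fps_single


row = parse_row("1 | resnet50 | cf_resnet50_imagenet_224_224_7.7G | 12.85 | 77.8 | 179.3 |")
print(round(implied_fps(row.latency_ms), 1), round(multi_thread_speedup(row), 2))  # 77.8 2.3
```

For resnet50 on this board, multiple threads raise throughput by roughly 2.3x over the single-thread figure.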
Performance on ZCU102 (0432055-05)
The following table lists the end-to-end (E2E) latency and throughput of each model on the ZCU102 (0432055-05) board with a 3 * B4096 @ 287MHz V1.4.0 DPU configuration:
No. | Model | Name | E2E latency (ms), 1 thread | E2E throughput (fps), single thread | E2E throughput (fps), multi-thread |
---|---|---|---|---|---|
1 | resnet50 | cf_resnet50_imagenet_224_224_7.7G | 12.98 | 77.0167 | 163.417 |
2 | Inception_v1 | cf_inceptionv1_imagenet_224_224_3.16G | 5.51 | 181.65 | 452.4 |
3 | Inception_v2 | cf_inceptionv2_imagenet_224_224_4G | 6.8 | 147 | 345.7 |
4 | Inception_v3 | cf_inceptionv3_imagenet_299_299_11.4G | 17.11 | 58.45 | 144.9 |
5 | mobileNet_v2 | cf_mobilenetv2_imagenet_224_224_0.59G | 4.13 | 241.9 | 587.25 |
6 | tf_resnet50 | tf_resnet50_imagenet_224_224_6.97G | 12.07 | 82.85 | 173.267 |
7 | tf_inception_v1 | tf_inceptionv1_imagenet_224_224_3G | 6.77 | 147.65 | 330.583 |
8 | tf_mobilenet_v2 | tf_mobilenetv2_imagenet_224_224_1.17G | 5.52 | 181.067 | 422.15 |
9 | ssd_adas_pruned_0.95 | cf_ssdadas_bdd_360_480_0.95_6.3G | 11.32 | 88.3167 | 306.267 |
10 | ssd_pedestrain_pruned_0.97 | cf_ssdpedestrian_coco_360_640_0.97_5.9G | 12.96 | 77.1667 | 309.4 |
11 | ssd_traffic_pruned_0.9 | cf_ssdtraffic_360_480_0.9_11.6G | 17.48 | 57.2 | 216 |
12 | ssd_mobilnet_v2 | cf_ssdmobilenetv2_bdd_360_480_6.57G | 24.67 | 40.5333 | 124.733 |
13 | tf_ssd_voc | tf_ssd_voc_300_300_64.81G | 69.61 | 14.3667 | 46.9833 |
14 | densebox_320_320 | cf_densebox_wider_320_320_0.49G | 2.46 | 406.2 | 1311.8 |
15 | densebox_360_640 | cf_densebox_wider_360_640_1.11G | 5.04 | 198.533 | 645.567 |
16 | yolov3_adas_prune_0.9 | dk_yolov3_cityscapes_256_512_0.9_5.46G | 11.16 | 89.6333 | 239.667 |
17 | yolov3_voc | dk_yolov3_voc_416_416_65.42G | 70.67 | 14.15 | 43.6167 |
18 | tf_yolov3_voc | tf_yolov3_voc_416_416_65.63G | 71.01 | 14.0833 | 43.0833 |
19 | refinedet_pruned_0.8 | cf_refinedet_coco_360_480_0.8_25G | 29.94 | 33.4 | 107.533 |
20 | refinedet_pruned_0.92 | cf_refinedet_coco_360_480_0.92_10.10G | 15.48 | 64.6167 | 210.817 |
21 | refinedet_pruned_0.96 | cf_refinedet_coco_360_480_0.96_5.08G | 11.06 | 90.45 | 298.217 |
22 | FPN | cf_fpn_cityscapes_256_512_8.9G | 16.68 | 59.95 | 188.533 |
23 | VPGnet_pruned_0.99 | cf_VPGnet_caltechlane_480_640_0.99_2.5G | 9.39 | 106.45 | 396.85 |
24 | SP-net | cf_SPnet_aichallenger_224_128_0.54G | 1.74 | 574.833 | 1516.78 |
25 | Openpose_pruned_0.3 | cf_openpose_aichallenger_368_368_0.3_189.7G | 279.07 | 3.58333 | 16.6333 |
26 | yolov2_voc | dk_yolov2_voc_448_448_34G | 39.84 | 25.1 | 84.5667 |
27 | yolov2_voc_pruned_0.66 | dk_yolov2_voc_448_448_0.66_11.56G | 18.44 | 54.2333 | 206.067 |
28 | yolov2_voc_pruned_0.71 | dk_yolov2_voc_448_448_0.71_9.86G | 16.44 | 60.8167 | 238.017 |
29 | yolov2_voc_pruned_0.77 | dk_yolov2_voc_448_448_0.77_7.82G | 14.48 | 69.0667 | 279.35 |
30 | Inception-v4 | cf_inceptionv4_imagenet_299_299_24.5G | 34.46 | 29.0167 | 78.5 |
31 | SqueezeNet | cf_squeeze_imagenet_227_227_0.76G | 3.64 | 274.767 | 1012.17 |
32 | face_landmark | cf_landmark_celeba_96_72_0.14G | 1.15 | 871.333 | 1444.25 |
33 | reid | cf_reid_marketcuhk_160_80_0.95G | 2.7 | 370.317 | 702.8 |
34 | yolov3_bdd | cf_yolov3_bdd_288_512_53.7G | 74.07 | 13.5 | 42.0833 |
35 | tf_mobilenet_v1 | tf_mobilenetv1_imagenet_224_224_1.14G | 3.23 | 309.65 | 809.5 |
36 | resnet18 | cf_resnet18_imagenet_224_224_3.65G | 5.18 | 193.067 | 477.05 |
37 | resnet18_wide | tf_resnet18_imagenet_224_224_28G | 33.41 | 29.9333 | 80.0667 |
Performance on ZCU104
The following table lists the end-to-end (E2E) latency and throughput of each model on the ZCU104 board with a 2 * B4096 @ 305MHz V1.4.0 DPU configuration:
No. | Model | Name | E2E latency (ms), 1 thread | E2E throughput (fps), single thread | E2E throughput (fps), multi-thread |
---|---|---|---|---|---|
1 | resnet50 | cf_resnet50_imagenet_224_224_7.7G | 12.13 | 82.45 | 151.8 |
2 | Inception_v1 | cf_inceptionv1_imagenet_224_224_3.16G | 5.07 | 197.333 | 404.933 |
3 | Inception_v2 | cf_inceptionv2_imagenet_224_224_4G | 6.33 | 158.033 | 310.15 |
4 | Inception_v3 | cf_inceptionv3_imagenet_299_299_11.4G | 16.03 | 62.3667 | 126.283 |
5 | mobileNet_v2 | cf_mobilenetv2_imagenet_224_224_0.59G | 3.85 | 259.833 | 536.95 |
6 | tf_resnet50 | tf_resnet50_imagenet_224_224_6.97G | 11.31 | 88.45 | 163.65 |
7 | tf_inception_v1 | tf_inceptionv1_imagenet_224_224_3G | 6.35 | 157.367 | 305.467 |
8 | tf_mobilenet_v2 | tf_mobilenetv2_imagenet_224_224_1.17G | 5.21 | 191.867 | 380.933 |
9 | ssd_adas_pruned_0.95 | cf_ssdadas_bdd_360_480_0.95_6.3G | 10.69 | 93.5333 | 242.917 |
10 | ssd_pedestrain_pruned_0.97 | cf_ssdpedestrian_coco_360_640_0.97_5.9G | 12.13 | 82.45 | 236.083 |
11 | ssd_traffic_pruned_0.9 | cf_ssdtraffic_360_480_0.9_11.6G | 16.48 | 60.6667 | 159.617 |
12 | ssd_mobilnet_v2 | cf_ssdmobilenetv2_bdd_360_480_6.57G | 37.78 | 26.4667 | 116.433 |
13 | tf_ssd_voc | tf_ssd_voc_300_300_64.81G | 75.09 | 13.3167 | 33.5667 |
14 | densebox_320_320 | cf_densebox_wider_320_320_0.49G | 2.33 | 428.533 | 1167.35 |
15 | densebox_360_640 | cf_densebox_wider_360_640_1.11G | 4.65 | 215.017 | 626.317 |
16 | yolov3_adas_prune_0.9 | dk_yolov3_cityscapes_256_512_0.9_5.46G | 10.51 | 95.1667 | 228.383 |
17 | yolov3_voc | dk_yolov3_voc_416_416_65.42G | 66.37 | 15.0667 | 33 |
18 | tf_yolov3_voc | tf_yolov3_voc_416_416_65.63G | 66.74 | 14.9833 | 32.8 |
19 | refinedet_pruned_0.8 | cf_refinedet_coco_360_480_0.8_25G | 28 | 35.7167 | 79.1333 |
20 | refinedet_pruned_0.92 | cf_refinedet_coco_360_480_0.92_10.10G | 14.54 | 68.7833 | 160.6 |
21 | refinedet_pruned_0.96 | cf_refinedet_coco_360_480_0.96_5.08G | 10.39 | 96.2333 | 241.783 |
22 | FPN | cf_fpn_cityscapes_256_512_8.9G | 15.72 | 63.6167 | 177.333 |
23 | VPGnet_pruned_0.99 | cf_VPGnet_caltechlane_480_640_0.99_2.5G | 8.91 | 112.233 | 355.717 |
24 | SP-net | cf_SPnet_aichallenger_224_128_0.54G | 1.6 | 626.5 | 1337.33 |
25 | Openpose_pruned_0.3 | cf_openpose_aichallenger_368_368_0.3_189.7G | 267.86 | 3.73333 | 12.1333 |
26 | yolov2_voc | dk_yolov2_voc_448_448_34G | 37.66 | 26.55 | 63.7833 |
27 | yolov2_voc_pruned_0.66 | dk_yolov2_voc_448_448_0.66_11.56G | 17.51 | 57.1167 | 158.917 |
28 | yolov2_voc_pruned_0.71 | dk_yolov2_voc_448_448_0.71_9.86G | 15.63 | 63.9667 | 186.867 |
29 | yolov2_voc_pruned_0.77 | dk_yolov2_voc_448_448_0.77_7.82G | 13.78 | 72.55 | 224.883 |
30 | Inception-v4 | cf_inceptionv4_imagenet_299_299_24.5G | 32.33 | 30.9333 | 64.6 |
31 | SqueezeNet | cf_squeeze_imagenet_227_227_0.76G | 3.52 | 284.033 | 940.917 |
32 | face_landmark | cf_landmark_celeba_96_72_0.14G | 1.02 | 977.683 | 1428.2 |
33 | reid | cf_reid_marketcuhk_160_80_0.95G | 2.45 | 407.583 | 702.717 |
34 | yolov3_bdd | cf_yolov3_bdd_288_512_53.7G | 69.77 | 14.3333 | 31.7 |
35 | tf_mobilenet_v1 | tf_mobilenetv1_imagenet_224_224_1.14G | 3.03 | 330.25 | 728.35 |
36 | resnet18 | cf_resnet18_imagenet_224_224_3.65G | 4.84 | 206.65 | 428.55 |
37 | resnet18_wide | tf_resnet18_imagenet_224_224_28G | 31.23 | 32.0167 | 62.7667 |
Performance on Ultra96
The following table lists the end-to-end (E2E) latency and throughput of each model on the Ultra96 board with a 1 * B1600 @ 287MHz V1.4.0 DPU configuration:
Note: The original power supply of the Ultra96 is not designed for high-performance AI workloads. When multiple threads are used, the board may occasionally hang while running a few of the models. In those cases, NA is listed in the multi-thread column of the following table.
No. | Model | Name | E2E latency (ms), 1 thread | E2E throughput (fps), single thread | E2E throughput (fps), multi-thread |
---|---|---|---|---|---|
1 | resnet50 | cf_resnet50_imagenet_224_224_7.7G | 30.8 | 32.4667 | 33.4667 |
2 | Inception_v1 | cf_inceptionv1_imagenet_224_224_3.16G | 13.98 | 71.55 | 75.0667 |
3 | Inception_v2 | cf_inceptionv2_imagenet_224_224_4G | 17.16 | 58.2667 | 61.2833 |
4 | Inception_v3 | cf_inceptionv3_imagenet_299_299_11.4G | 44.05 | 22.7 | 23.4333 |
5 | mobileNet_v2 | cf_mobilenetv2_imagenet_224_224_0.59G | 7.34 | 136.183 | NA |
6 | tf_resnet50 | tf_resnet50_imagenet_224_224_6.97G | 28.02 | 35.6833 | 36.6 |
7 | tf_inception_v1 | tf_inceptionv1_imagenet_224_224_3G | 16.96 | 58.9667 | 61.2833 |
8 | tf_mobilenet_v2 | tf_mobilenetv2_imagenet_224_224_1.17G | 10.17 | 98.3 | 104.25 |
9 | ssd_adas_pruned_0.95 | cf_ssdadas_bdd_360_480_0.95_6.3G | 24.3 | 41.15 | 46.2 |
10 | ssd_pedestrain_pruned_0.97 | cf_ssdpedestrian_coco_360_640_0.97_5.9G | 23.29 | 42.9333 | 50.8 |
11 | ssd_traffic_pruned_0.9 | cf_ssdtraffic_360_480_0.9_11.6G | 35.5 | 28.1667 | 31.8 |
12 | ssd_mobilnet_v2 | cf_ssdmobilenetv2_bdd_360_480_6.57G | 60.79 | 16.45 | 27.8167 |
13 | tf_ssd_voc | tf_ssd_voc_300_300_64.81G | 186.92 | 5.35 | 5.81667 |
14 | densebox_320_320 | cf_densebox_wider_320_320_0.49G | 4.17 | 239.883 | 334.167 |
15 | densebox_360_640 | cf_densebox_wider_360_640_1.11G | 8.55 | 117 | 167.2 |
16 | yolov3_adas_prune_0.9 | dk_yolov3_cityscapes_256_512_0.9_5.46G | 22.79 | 43.8833 | 49.6833 |
17 | yolov3_voc | dk_yolov3_voc_416_416_65.42G | 185.19 | 5.4 | 5.53 |
18 | tf_yolov3_voc | tf_yolov3_voc_416_416_65.63G | 199.34 | 5.01667 | 5.1 |
19 | refinedet_pruned_0.8 | cf_refinedet_coco_360_480_0.8_25G | 66.37 | 15.0667 | NA |
20 | refinedet_pruned_0.92 | cf_refinedet_coco_360_480_0.92_10.10G | 32.17 | 31.0883 | 33.6667 |
21 | refinedet_pruned_0.96 | cf_refinedet_coco_360_480_0.96_5.08G | 20.29 | 49.2833 | 55.25 |
22 | FPN | cf_fpn_cityscapes_256_512_8.9G | 36.34 | 27.5167 | NA |
23 | VPGnet_pruned_0.99 | cf_VPGnet_caltechlane_480_640_0.99_2.5G | 13.9 | 71.9333 | NA |
24 | SP-net | cf_SPnet_aichallenger_224_128_0.54G | 3.82 | 261.55 | 277.4 |
25 | Openpose_pruned_0.3 | cf_openpose_aichallenger_368_368_0.3_189.7G | 560.75 | 1.78333 | NA |
26 | yolov2_voc | dk_yolov2_voc_448_448_34G | 118.11 | 8.46667 | 8.9 |
27 | yolov2_voc_pruned_0.66 | dk_yolov2_voc_448_448_0.66_11.56G | 37.5 | 26.6667 | 30.65 |
28 | yolov2_voc_pruned_0.71 | dk_yolov2_voc_448_448_0.71_9.86G | 30.99 | 32.2667 | 38.35 |
29 | yolov2_voc_pruned_0.77 | dk_yolov2_voc_448_448_0.77_7.82G | 26.29 | 38.03333 | 46.8333 |
30 | Inception-v4 | cf_inceptionv4_imagenet_299_299_24.5G | 88.76 | 11.2667 | 11.5333 |
31 | SqueezeNet | cf_squeeze_imagenet_227_227_0.76G | 5.96 | 167.867 | 283.583 |
32 | face_landmark | cf_landmark_celeba_96_72_0.14G | 2.95 | 339.183 | 347.633 |
33 | reid | cf_reid_marketcuhk_160_80_0.95G | 6.28 | 159.15 | 166.633 |
34 | yolov3_bdd | cf_yolov3_bdd_288_512_53.7G | 193.55 | 5.16667 | 5.31667 |
35 | tf_mobilenet_v1 | tf_mobilenetv1_imagenet_224_224_1.14G | 5.97 | 167.567 | 186.55 |
36 | resnet18 | cf_resnet18_imagenet_224_224_3.65G | 13.47 | 74.2167 | 77.8167 |
37 | resnet18_wide | tf_resnet18_imagenet_224_224_28G | 97.72 | 10.2333 | 10.3833 |
Contributing
We welcome community contributions. When contributing to this repository, first discuss the change you wish to make via:
You can also submit a pull request with details on how to improve the product. Before submitting your pull request, ensure that you can build the product and run all the demos with your patch. For a larger feature, provide a relevant demo.
License
Xilinx AI Model Zoo is licensed under Apache License Version 2.0. By contributing to the project, you agree to the license and copyright terms therein and release your contribution under these terms.
Copyright© 2019 Xilinx