problem with graph_vgg16.cpp
RavikumarLav opened this issue · 2 comments
I tried to test this graph on armv8 neon with library V24.02.1 with below build option
scons Werror=1 -j8 debug=1 neon=1 opencl=0 os=linux arch=armv8a
Weights and bias .npy is taken from the graph from the link mentioned in the .cpp file.
Provenance: www.robots.ox.ac.uk/~vgg/software/very_deep/caffe/VGG_ILSVRC_16_layers.caffemodel
case1:
Getting segmentation fault after run the graph
case2: If image option not provided getting other output with segmentation fault
Attached image and label file
Label_clsid.txt
Please let me know why segmentation fault is coming.
Also i wanted to know the runtime for these examples graphs on neon core.
Could you please share the full error log in a text format ?
What device are you using to run the example?
The graph examples only support images in the ppm format. You cannot pass a jpg image directly, you have to decompress the image and convert it to ppm.
For more detailed information about the graph examples please see this article
If you build the library with benchmark_examples=1 then you can use the instruments to look into the graph example performance
cl/main# LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./benchmark_graph_mobilenet_v2 --instruments=SCHEDULER_TIMER_MS --example_args='--target=NEON,--fast-math'
Version = arm_compute_version=v0.0-unreleased Build options: {'standalone': '1', 'test_filter': 'ActivationLayer.cpp', 'opencl': '0', 'neon': '1', 'validation_tests': '1', 'examples': '0', 'debug': '1', 'arch': 'armv8a', 'benchmark_examples': '1'} Git hash=e112ef1cc70bcdc52ded44350e61eb16d74559b3
CommandLine = ./benchmark_graph_mobilenet_v2 --instruments=SCHEDULER_TIMER_MS --example_args=--target=NEON,--fast-math
Iterations = 1
Running [0] 'Examples/benchmark_graph_mobilenet_v2'
Threads : 1
Target : Neon
Data type : F32
Data layout : NHWC
Tuner enabled? : false
Cache enabled? : false
Tuner mode : Normal
Tuner file :
MLGO file :
Fast math enabled? : true
SchedulerTimer/Conv+Conv/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_6x16 #2: AVG=2.3420 ms
SchedulerTimer/Conv+Conv/BatchNorm/CpuIm2ColKernel #1: AVG=12.4220 ms
SchedulerTimer/Conv+Conv/BatchNorm/CpuWeightsReshapeKernel #0: AVG=0.1020 ms
SchedulerTimer/Conv_1+Conv_1/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_6x16 #97: AVG=3.6140 ms
SchedulerTimer/Conv_1+Conv_1/BatchNorm/CpuWeightsReshapeKernel #96: AVG=15.0640 ms
SchedulerTimer/Logits/AvgPool/CpuPool2dAssemblyWrapperKernel #98: AVG=0.1920 ms
SchedulerTimer/Logits/Conv2d_1c_1x1/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_4x24 #99: AVG=0.7020 ms
SchedulerTimer/Predictions/Reshape/CpuReshapeKernel #100: AVG=1.1760 ms
SchedulerTimer/Predictions/Softmax/CpuLogits1DMaxKernel/neon_fp32_logits_1d_max #101: AVG=0.0270 ms
SchedulerTimer/Predictions/Softmax/CpuLogits1DSoftmaxKernel/neon_fp32_softmax_logits_1d #102: AVG=0.1710 ms
SchedulerTimer/expanded_conv/depthwise/depthwise+expanded_conv/depthwise/BatchNorm/CpuDepthwiseConv2dAssemblyWrapperKernel/a64_fp32_nhwc_3x3_s1_output4x4_mla_depthfirst #3: AVG=1.8800 ms
SchedulerTimer/expanded_conv/project/Conv2D+expanded_conv/project/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_6x16 #5: AVG=1.1850 ms
SchedulerTimer/expanded_conv/project/Conv2D+expanded_conv/project/BatchNorm/CpuWeightsReshapeKernel #4: AVG=0.0640 ms
SchedulerTimer/expanded_conv_1/depthwise/depthwise+expanded_conv_1/depthwise/BatchNorm/CpuDepthwiseConv2dAssemblyWrapperKernel/a64_fp32_nhwc_3x3_s2_output2x2_mla_depthfirst #8: AVG=2.4930 ms
SchedulerTimer/expanded_conv_1/expand/Conv2D+expanded_conv_1/expand/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_smallK_hybrid_fp32_mla_6x4 #7: AVG=5.1230 ms
SchedulerTimer/expanded_conv_1/expand/Conv2D+expanded_conv_1/expand/BatchNorm/CpuWeightsReshapeKernel #6: AVG=0.2100 ms
SchedulerTimer/expanded_conv_1/project/Conv2D+expanded_conv_1/project/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_4x24 #10: AVG=1.3540 ms
SchedulerTimer/expanded_conv_1/project/Conv2D+expanded_conv_1/project/BatchNorm/CpuWeightsReshapeKernel #9: AVG=0.1300 ms
SchedulerTimer/expanded_conv_10/depthwise/depthwise+expanded_conv_10/depthwise/BatchNorm/CpuDepthwiseConv2dAssemblyWrapperKernel/a64_fp32_nhwc_3x3_s1_output2x2_mla_depthfirst #59: AVG=0.3390 ms
SchedulerTimer/expanded_conv_10/expand/Conv2D+expanded_conv_10/expand/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_4x24 #58: AVG=0.9590 ms
SchedulerTimer/expanded_conv_10/expand/Conv2D+expanded_conv_10/expand/BatchNorm/CpuWeightsReshapeKernel #57: AVG=1.3400 ms
SchedulerTimer/expanded_conv_10/project/Conv2D+expanded_conv_10/project/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_4x24 #61: AVG=1.3400 ms
SchedulerTimer/expanded_conv_10/project/Conv2D+expanded_conv_10/project/BatchNorm/CpuWeightsReshapeKernel #60: AVG=1.3500 ms
SchedulerTimer/expanded_conv_11/add/CpuAddKernel/neon_fp32_add #67: AVG=0.2580 ms
SchedulerTimer/expanded_conv_11/depthwise/depthwise+expanded_conv_11/depthwise/BatchNorm/CpuDepthwiseConv2dAssemblyWrapperKernel/a64_fp32_nhwc_3x3_s1_output2x2_mla_depthfirst #64: AVG=0.6260 ms
SchedulerTimer/expanded_conv_11/expand/Conv2D+expanded_conv_11/expand/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_4x24 #63: AVG=2.0200 ms
SchedulerTimer/expanded_conv_11/expand/Conv2D+expanded_conv_11/expand/BatchNorm/CpuWeightsReshapeKernel #62: AVG=2.6450 ms
SchedulerTimer/expanded_conv_11/project/Conv2D+expanded_conv_11/project/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_4x24 #66: AVG=1.8870 ms
SchedulerTimer/expanded_conv_11/project/Conv2D+expanded_conv_11/project/BatchNorm/CpuWeightsReshapeKernel #65: AVG=1.9040 ms
SchedulerTimer/expanded_conv_12/add/CpuAddKernel/neon_fp32_add #73: AVG=0.2820 ms
SchedulerTimer/expanded_conv_12/depthwise/depthwise+expanded_conv_12/depthwise/BatchNorm/CpuDepthwiseConv2dAssemblyWrapperKernel/a64_fp32_nhwc_3x3_s1_output2x2_mla_depthfirst #70: AVG=0.5730 ms
SchedulerTimer/expanded_conv_12/expand/Conv2D+expanded_conv_12/expand/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_4x24 #69: AVG=2.0760 ms
SchedulerTimer/expanded_conv_12/expand/Conv2D+expanded_conv_12/expand/BatchNorm/CpuWeightsReshapeKernel #68: AVG=2.6250 ms
SchedulerTimer/expanded_conv_12/project/Conv2D+expanded_conv_12/project/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_4x24 #72: AVG=1.8460 ms
SchedulerTimer/expanded_conv_12/project/Conv2D+expanded_conv_12/project/BatchNorm/CpuWeightsReshapeKernel #71: AVG=1.9330 ms
SchedulerTimer/expanded_conv_13/depthwise/depthwise+expanded_conv_13/depthwise/BatchNorm/CpuDepthwiseConv2dAssemblyWrapperKernel/a64_fp32_nhwc_3x3_s2_output2x2_mla_depthfirst #76: AVG=0.2680 ms
SchedulerTimer/expanded_conv_13/expand/Conv2D+expanded_conv_13/expand/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_4x24 #75: AVG=2.0940 ms
SchedulerTimer/expanded_conv_13/expand/Conv2D+expanded_conv_13/expand/BatchNorm/CpuWeightsReshapeKernel #74: AVG=2.6290 ms
SchedulerTimer/expanded_conv_13/project/Conv2D+expanded_conv_13/project/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_6x16 #78: AVG=0.8540 ms
SchedulerTimer/expanded_conv_13/project/Conv2D+expanded_conv_13/project/BatchNorm/CpuWeightsReshapeKernel #77: AVG=3.1880 ms
SchedulerTimer/expanded_conv_14/add/CpuAddKernel/neon_fp32_add #84: AVG=0.1270 ms
SchedulerTimer/expanded_conv_14/depthwise/depthwise+expanded_conv_14/depthwise/BatchNorm/CpuDepthwiseConv2dAssemblyWrapperKernel/a64_fp32_nhwc_3x3_s1_output4x4_mla_depthfirst #81: AVG=0.2770 ms
SchedulerTimer/expanded_conv_14/expand/Conv2D+expanded_conv_14/expand/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_4x24 #80: AVG=1.4400 ms
SchedulerTimer/expanded_conv_14/expand/Conv2D+expanded_conv_14/expand/BatchNorm/CpuWeightsReshapeKernel #79: AVG=6.3610 ms
SchedulerTimer/expanded_conv_14/project/Conv2D+expanded_conv_14/project/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_6x16 #83: AVG=1.4390 ms
SchedulerTimer/expanded_conv_14/project/Conv2D+expanded_conv_14/project/BatchNorm/CpuWeightsReshapeKernel #82: AVG=5.1470 ms
SchedulerTimer/expanded_conv_15/add/CpuAddKernel/neon_fp32_add #90: AVG=0.1260 ms
SchedulerTimer/expanded_conv_15/depthwise/depthwise+expanded_conv_15/depthwise/BatchNorm/CpuDepthwiseConv2dAssemblyWrapperKernel/a64_fp32_nhwc_3x3_s1_output4x4_mla_depthfirst #87: AVG=0.2780 ms
SchedulerTimer/expanded_conv_15/expand/Conv2D+expanded_conv_15/expand/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_4x24 #86: AVG=1.4400 ms
SchedulerTimer/expanded_conv_15/expand/Conv2D+expanded_conv_15/expand/BatchNorm/CpuWeightsReshapeKernel #85: AVG=6.3720 ms
SchedulerTimer/expanded_conv_15/project/Conv2D+expanded_conv_15/project/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_6x16 #89: AVG=1.4230 ms
SchedulerTimer/expanded_conv_15/project/Conv2D+expanded_conv_15/project/BatchNorm/CpuWeightsReshapeKernel #88: AVG=5.1430 ms
SchedulerTimer/expanded_conv_16/depthwise/depthwise+expanded_conv_16/depthwise/BatchNorm/CpuDepthwiseConv2dAssemblyWrapperKernel/a64_fp32_nhwc_3x3_s1_output4x4_mla_depthfirst #93: AVG=0.2750 ms
SchedulerTimer/expanded_conv_16/expand/Conv2D+expanded_conv_16/expand/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_4x24 #92: AVG=1.4690 ms
SchedulerTimer/expanded_conv_16/expand/Conv2D+expanded_conv_16/expand/BatchNorm/CpuWeightsReshapeKernel #91: AVG=6.3670 ms
SchedulerTimer/expanded_conv_16/project/Conv2D+expanded_conv_16/project/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_6x16 #95: AVG=2.7300 ms
SchedulerTimer/expanded_conv_16/project/Conv2D+expanded_conv_16/project/BatchNorm/CpuWeightsReshapeKernel #94: AVG=10.2600 ms
SchedulerTimer/expanded_conv_2/add/CpuAddKernel/neon_fp32_add #16: AVG=0.9310 ms
SchedulerTimer/expanded_conv_2/depthwise/depthwise+expanded_conv_2/depthwise/BatchNorm/CpuDepthwiseConv2dAssemblyWrapperKernel/a64_fp32_nhwc_3x3_s1_output4x4_mla_depthfirst #13: AVG=2.4990 ms
SchedulerTimer/expanded_conv_2/expand/Conv2D+expanded_conv_2/expand/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_4x24 #12: AVG=2.4790 ms
SchedulerTimer/expanded_conv_2/expand/Conv2D+expanded_conv_2/expand/BatchNorm/CpuWeightsReshapeKernel #11: AVG=0.3390 ms
SchedulerTimer/expanded_conv_2/project/Conv2D+expanded_conv_2/project/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_4x24 #15: AVG=2.1000 ms
SchedulerTimer/expanded_conv_2/project/Conv2D+expanded_conv_2/project/BatchNorm/CpuWeightsReshapeKernel #14: AVG=0.1660 ms
SchedulerTimer/expanded_conv_3/depthwise/depthwise+expanded_conv_3/depthwise/BatchNorm/CpuDepthwiseConv2dAssemblyWrapperKernel/a64_fp32_nhwc_3x3_s2_output2x2_mla_depthfirst #19: AVG=1.0300 ms
SchedulerTimer/expanded_conv_3/expand/Conv2D+expanded_conv_3/expand/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_4x24 #18: AVG=2.4800 ms
SchedulerTimer/expanded_conv_3/expand/Conv2D+expanded_conv_3/expand/BatchNorm/CpuWeightsReshapeKernel #17: AVG=0.3360 ms
SchedulerTimer/expanded_conv_3/project/Conv2D+expanded_conv_3/project/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_6x16 #21: AVG=0.6930 ms
SchedulerTimer/expanded_conv_3/project/Conv2D+expanded_conv_3/project/BatchNorm/CpuWeightsReshapeKernel #20: AVG=0.2140 ms
SchedulerTimer/expanded_conv_4/add/CpuAddKernel/neon_fp32_add #27: AVG=0.3260 ms
SchedulerTimer/expanded_conv_4/depthwise/depthwise+expanded_conv_4/depthwise/BatchNorm/CpuDepthwiseConv2dAssemblyWrapperKernel/a64_fp32_nhwc_3x3_s1_output4x4_mla_depthfirst #24: AVG=0.7980 ms
SchedulerTimer/expanded_conv_4/expand/Conv2D+expanded_conv_4/expand/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_4x24 #23: AVG=1.0150 ms
SchedulerTimer/expanded_conv_4/expand/Conv2D+expanded_conv_4/expand/BatchNorm/CpuWeightsReshapeKernel #22: AVG=0.4820 ms
SchedulerTimer/expanded_conv_4/project/Conv2D+expanded_conv_4/project/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_6x16 #26: AVG=0.8960 ms
SchedulerTimer/expanded_conv_4/project/Conv2D+expanded_conv_4/project/BatchNorm/CpuWeightsReshapeKernel #25: AVG=0.2610 ms
SchedulerTimer/expanded_conv_5/add/CpuAddKernel/neon_fp32_add #33: AVG=0.3260 ms
SchedulerTimer/expanded_conv_5/depthwise/depthwise+expanded_conv_5/depthwise/BatchNorm/CpuDepthwiseConv2dAssemblyWrapperKernel/a64_fp32_nhwc_3x3_s1_output4x4_mla_depthfirst #30: AVG=0.7950 ms
SchedulerTimer/expanded_conv_5/expand/Conv2D+expanded_conv_5/expand/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_4x24 #29: AVG=1.0570 ms
SchedulerTimer/expanded_conv_5/expand/Conv2D+expanded_conv_5/expand/BatchNorm/CpuWeightsReshapeKernel #28: AVG=0.5120 ms
SchedulerTimer/expanded_conv_5/project/Conv2D+expanded_conv_5/project/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_6x16 #32: AVG=0.9380 ms
SchedulerTimer/expanded_conv_5/project/Conv2D+expanded_conv_5/project/BatchNorm/CpuWeightsReshapeKernel #31: AVG=0.2580 ms
SchedulerTimer/expanded_conv_6/depthwise/depthwise+expanded_conv_6/depthwise/BatchNorm/CpuDepthwiseConv2dAssemblyWrapperKernel/a64_fp32_nhwc_3x3_s2_output2x2_mla_depthfirst #36: AVG=0.2630 ms
SchedulerTimer/expanded_conv_6/expand/Conv2D+expanded_conv_6/expand/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_4x24 #35: AVG=1.0330 ms
SchedulerTimer/expanded_conv_6/expand/Conv2D+expanded_conv_6/expand/BatchNorm/CpuWeightsReshapeKernel #34: AVG=0.4840 ms
SchedulerTimer/expanded_conv_6/project/Conv2D+expanded_conv_6/project/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_6x16 #38: AVG=0.4170 ms
SchedulerTimer/expanded_conv_6/project/Conv2D+expanded_conv_6/project/BatchNorm/CpuWeightsReshapeKernel #37: AVG=0.4970 ms
SchedulerTimer/expanded_conv_7/add/CpuAddKernel/neon_fp32_add #44: AVG=0.1770 ms
SchedulerTimer/expanded_conv_7/depthwise/depthwise+expanded_conv_7/depthwise/BatchNorm/CpuDepthwiseConv2dAssemblyWrapperKernel/a64_fp32_nhwc_3x3_s1_output2x2_mla_depthfirst #41: AVG=0.3420 ms
SchedulerTimer/expanded_conv_7/expand/Conv2D+expanded_conv_7/expand/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_4x24 #40: AVG=0.9150 ms
SchedulerTimer/expanded_conv_7/expand/Conv2D+expanded_conv_7/expand/BatchNorm/CpuWeightsReshapeKernel #39: AVG=1.3650 ms
SchedulerTimer/expanded_conv_7/project/Conv2D+expanded_conv_7/project/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_6x16 #43: AVG=0.9360 ms
SchedulerTimer/expanded_conv_7/project/Conv2D+expanded_conv_7/project/BatchNorm/CpuWeightsReshapeKernel #42: AVG=0.9110 ms
SchedulerTimer/expanded_conv_8/add/CpuAddKernel/neon_fp32_add #50: AVG=0.1770 ms
SchedulerTimer/expanded_conv_8/depthwise/depthwise+expanded_conv_8/depthwise/BatchNorm/CpuDepthwiseConv2dAssemblyWrapperKernel/a64_fp32_nhwc_3x3_s1_output2x2_mla_depthfirst #47: AVG=0.3850 ms
SchedulerTimer/expanded_conv_8/expand/Conv2D+expanded_conv_8/expand/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_4x24 #46: AVG=0.9580 ms
SchedulerTimer/expanded_conv_8/expand/Conv2D+expanded_conv_8/expand/BatchNorm/CpuWeightsReshapeKernel #45: AVG=1.3400 ms
SchedulerTimer/expanded_conv_8/project/Conv2D+expanded_conv_8/project/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_6x16 #49: AVG=0.9050 ms
SchedulerTimer/expanded_conv_8/project/Conv2D+expanded_conv_8/project/BatchNorm/CpuWeightsReshapeKernel #48: AVG=0.8890 ms
SchedulerTimer/expanded_conv_9/add/CpuAddKernel/neon_fp32_add #56: AVG=0.2040 ms
SchedulerTimer/expanded_conv_9/depthwise/depthwise+expanded_conv_9/depthwise/BatchNorm/CpuDepthwiseConv2dAssemblyWrapperKernel/a64_fp32_nhwc_3x3_s1_output2x2_mla_depthfirst #53: AVG=0.3510 ms
SchedulerTimer/expanded_conv_9/expand/Conv2D+expanded_conv_9/expand/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_4x24 #52: AVG=0.9690 ms
SchedulerTimer/expanded_conv_9/expand/Conv2D+expanded_conv_9/expand/BatchNorm/CpuWeightsReshapeKernel #51: AVG=1.3690 ms
SchedulerTimer/expanded_conv_9/project/Conv2D+expanded_conv_9/project/BatchNorm/CpuGemmAssemblyWrapperKernel/a64_hybrid_fp32_mla_6x16 #55: AVG=0.9100 ms
SchedulerTimer/expanded_conv_9/project/Conv2D+expanded_conv_9/project/BatchNorm/CpuWeightsReshapeKernel #54: AVG=0.8850 ms
Executed 1 test(s) (1 passed, 0 expected failures, 0 failed, 0 crashed, 0 disabled) in 0 second(s)
You'll get the best performance if you use the data_layout NHWC
Another alternative is to use ArmNN ExecuteNetwork
to run tflite models, for more info about this please see #1077 (comment)
Hope this helps.