Unable to run inference with a custom Caffe model
madhu-korada opened this issue · 4 comments
Describe the problem
I was testing out MACE with a custom model from the Caffe model zoo. I was able to convert the model from Caffe to MACE, but inference fails with the following error:
A/MACE: runtime.cc:179 Check failed: buffer->memory<void>() != nullptr
D/OpenGLRenderer: HWUI GL Pipeline
A/MACE: runtime.cc:179 backtrace:
runtime.cc:179 pc 0x7e59baf028 _ZN4mace4port10AndroidEnv18GetBackTraceUnsafeEi
runtime.cc:179 pc 0x7e59bb0d0c _ZN4mace4port6Logger13DealWithFatalEv
runtime.cc:179 pc 0x7e59bb0cc0 _ZN4mace4port6Logger18GenerateLogMessageEv
runtime.cc:179 pc 0x7e59bb0e48 _ZN4mace4port6LoggerD2Ev
runtime.cc:179 pc 0x7e59bb0ea4 _ZN4mace4port6LoggerD1Ev
runtime.cc:179 pc 0x7e59b8bbcc
runtime.cc:179 pc 0x7e59b73afc
runtime.cc:179 pc 0x7e59b738d4 _ZN4mace8BaseFlow4InitEPKNS_6NetDefEPKhlPb
runtime.cc:179 pc 0x7e59a712f8 _ZN4mace10CpuRefFlow4InitEPKNS_6NetDefEPKhlPb
runtime.cc:179 pc 0x7e59a65938 _ZN4mace12SerialEngine18CreateAndInitFlowsERKNSt6__ndk13mapIiPKNS_6NetDefENS1_4lessIiEENS1_9allocatorINS1_4pairIKiS5_EEEEEERKNS1_13unordered_mapIS5_NS1_10shared_ptrINS_7RuntimeEEENS1_4hashIS5_EENS1_8equal_toIS5_EENS8_INS9_IKS5_SJ_EEEEEEPKhlPb
runtime.cc:179 pc 0x7e59a64628 _ZN4mace12SerialEngine6DoInitEPKNS_11MultiNetDefERKNSt6__ndk16vectorINS4_12basic_stringIcNS4_11char_traitsIcEENS4_9allocatorIcEEEENS9_ISB_EEEESF_PKhlPbPNS_10BaseEngineE
runtime.cc:179 pc 0x7e59a64858 _ZN4mace12SerialEngine4InitEPKNS_11MultiNetDefERKNSt6__ndk16vectorINS4_12basic_stringIcNS4_11char_traitsIcEENS4_9allocatorIcEEEENS9_ISB_EEEESF_PKhlPbPNS_10BaseEngineE
A/MACE: runtime.cc:179 pc 0x7e59a6cf30 _ZN4mace10MaceEngine4Impl4InitEPKNS_11MultiNetDefERKNSt6__ndk16vectorINS5_12basic_stringIcNS5_11char_traitsIcEENS5_9allocatorIcEEEENSA_ISC_EEEESG_PKhlPbPS1_b
runtime.cc:179 pc 0x7e59a6d660 _ZN4mace10MaceEngine4InitEPKNS_11MultiNetDefERKNSt6__ndk16vectorINS4_12basic_stringIcNS4_11char_traitsIcEENS4_9allocatorIcEEEENS9_ISB_EEEESF_PKhlPbPS0_b
runtime.cc:179 pc 0x7e597124d4 _ZN4mace24CreateMaceEngineFromCodeERKNSt6__ndk112basic_stringIcNS0_11char_traitsIcEENS0_9allocatorIcEEEEPKhmRKNS0_6vectorIS6_NS4_IS6_EEEESF_RKNS_16MaceEngineConfigEPNS0_10shared_ptrINS_10MaceEngineEEEPbPSK_b
runtime.cc:179 pc 0x7e59713010 Java_com_xiaomi_mace_JniMaceUtils_maceMobilenetCreateEngine
runtime.cc:179 pc 0x7e5a10c23c oatexec
A/libc: Fatal signal 6 (SIGABRT), code -6 in tid 25029 (jniThread), pid 24994 (iaomi.mace.demo)
Also, after conversion, the first dimension of each layer's output shape is -1, whereas for other models it was 1. Is this causing the issue?
Final ops:
conv1 (Conv2D, index:0): [[-1, 32, 32, 32]]
pool1 (Pooling, index:1): [[-1, 32, 16, 16]]
relu1 (Activation, index:2): [[-1, 32, 16, 16]]
relu2 (Conv2D, index:3): [[-1, 32, 16, 16]]
pool2 (Pooling, index:4): [[-1, 32, 8, 8]]
relu3 (Conv2D, index:5): [[-1, 64, 8, 8]]
pool3 (Pooling, index:6): [[-1, 64, 4, 4]]
ip1 (FullyConnected, index:7): [[-1, 64, 1, 1]]
ip2 (FullyConnected, index:8): [[-1, 10, 1, 1]]
prob (Softmax, index:9): [[-1, 10, 1, 1]]
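A plausible reading of the crash, given the -1 batch dimension above: if the runtime sizes a tensor buffer as the product of its dimensions, a -1 batch makes that size negative, the allocation never happens, and `buffer->memory<void>()` stays null, tripping the `Check failed` in runtime.cc. This is a sketch of that arithmetic, not MACE's actual allocation code:

```python
from functools import reduce
from operator import mul

def buffer_elems(shape):
    # Element count of a tensor buffer as the product of its dims;
    # a -1 batch dimension makes the result negative (invalid).
    return reduce(mul, shape, 1)

print(buffer_elems([1, 32, 32, 32]))   # 32768
print(buffer_elems([-1, 32, 32, 32]))  # -32768, cannot be allocated
```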
System information
- MACE version: v1.0.2-148-gc75cb35
Model deploy file (*.yml)
library_name: custom
target_abis: [arm64-v8a]
model_graph_format: code
model_data_format: code
models:
  caffee_mnist:
    platform: caffe
    model_file_path: /home/model_zoo/lenet.prototxt
    weight_file_path: /home/model_zoo/lenet_iter_200.caffemodel
    model_sha256_checksum: 357b3adb1f75ea6ba392c2fa2f76b3091e0e0ba7947985373341add6da3462f0
    weight_sha256_checksum: 6e046addbaa74b75c0972d100987a790d65882c6a4c76fac4e00cafe2e5f9e85
    subgraphs:
      - input_tensors:
          - data
        input_shapes:
          -1,28,28,1
        output_tensors:
          - prob
        output_shapes:
          - 1,10
    runtime: cpu+gpu
    limit_opencl_kernel_time: 0
    nnlib_graph_mode: 0
    obfuscate: 0
    winograd: 0
  caffee_cifar10:
    platform: caffe
    model_file_path: /home/model_zoo/cifar10_quick.prototxt
    weight_file_path: /home/model_zoo/cifar10_quick_iter_400.caffemodel
    model_sha256_checksum: 75f640b26a7d9d119dd68a215f1d598bfb19fc986d1e6a2c8ea97a6eb4d09eda
    weight_sha256_checksum: 18533a177ef55f495f1fd5058bfaf3cd19cead98c29886ad8cb55553f1a19c4b
    subgraphs:
      - input_tensors:
          - data
        input_shapes:
          -1,32,32,3
        output_tensors:
          - prob
        output_shapes:
          - 1,10
    runtime: cpu+gpu
    limit_opencl_kernel_time: 0
    nnlib_graph_mode: 0
    obfuscate: 0
    winograd: 0
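For reference, the model_sha256_checksum and weight_sha256_checksum values in the deploy file above can be regenerated with plain SHA-256 over the file contents, e.g.:

```python
import hashlib

def sha256_of_file(path: str) -> str:
    # Stream the file in chunks and return the hex SHA-256 digest,
    # the format the deploy file's checksum fields expect.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()
```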
Model prototxt file (Caffe)
name: "CIFAR10_quick_test"
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 1 dim: 3 dim: 32 dim: 32 } }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "pool1"
  top: "pool1"
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 64
    pad: 2
    kernel_size: 5
    stride: 1
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3"
  top: "pool3"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool3"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 64
  }
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "ip2"
  top: "prob"
}
Additional context
I am using the example Android app to run inference.
After conversion, the first dimension of each layer's output shape is -1, whereas for other models it was 1. Is this causing the issue?
Yes, please check your model with Netron: why is the batch size -1?
input_shapes:
-1,28,28,1
Add a space between the - and the number: - 1,28,28,1
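The space matters because of how the converter ends up seeing the value. Without it, the shape string carries a leading -1, which the converter reads as a -1 batch dimension; with the space, the - becomes a list marker and the dimension string starts at 1. A sketch of the resulting difference, using a hypothetical parse helper (not MACE's actual converter code):

```python
def parse_input_shape(shape_str: str) -> list:
    # Hypothetical stand-in for how a converter turns the
    # input_shapes value into a list of dimensions.
    return [int(dim) for dim in shape_str.split(",")]

# "-1,28,28,1" (no space): the converter sees a -1 batch dim.
print(parse_input_shape("-1,28,28,1"))  # [-1, 28, 28, 1]
# "- 1,28,28,1": the list item's value is "1,28,28,1",
# giving the intended batch size of 1.
print(parse_input_shape("1,28,28,1"))   # [1, 28, 28, 1]
```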
In Netron the batch size is 1, but after conversion MACE shows -1.
I am currently not using Caffe conversion; tf.keras conversion works fine for me. If I face this issue again, I will reopen the issue.
input_shapes:
-1,28,28,1
Add a space between the - and the number: - 1,28,28,1
Thanks, this might work. I didn't notice that.