XiaoMi/mace

Unable to run inference with a custom Caffe model

madhu-korada opened this issue · 4 comments

Describe the problem

I was testing MACE with a custom model from the Caffe model zoo. I was able to convert the model from Caffe to MACE, but inference fails.

The following error occurs during inference of the model:

A/MACE: runtime.cc:179 Check failed: buffer->memory<void>() != nullptr 
D/OpenGLRenderer: HWUI GL Pipeline
A/MACE: runtime.cc:179 backtrace:
    runtime.cc:179  pc 0x7e59baf028 _ZN4mace4port10AndroidEnv18GetBackTraceUnsafeEi
    runtime.cc:179  pc 0x7e59bb0d0c _ZN4mace4port6Logger13DealWithFatalEv
    runtime.cc:179  pc 0x7e59bb0cc0 _ZN4mace4port6Logger18GenerateLogMessageEv
    runtime.cc:179  pc 0x7e59bb0e48 _ZN4mace4port6LoggerD2Ev
    runtime.cc:179  pc 0x7e59bb0ea4 _ZN4mace4port6LoggerD1Ev
    runtime.cc:179  pc 0x7e59b8bbcc 
    runtime.cc:179  pc 0x7e59b73afc 
    runtime.cc:179  pc 0x7e59b738d4 _ZN4mace8BaseFlow4InitEPKNS_6NetDefEPKhlPb
    runtime.cc:179  pc 0x7e59a712f8 _ZN4mace10CpuRefFlow4InitEPKNS_6NetDefEPKhlPb
    runtime.cc:179  pc 0x7e59a65938 _ZN4mace12SerialEngine18CreateAndInitFlowsERKNSt6__ndk13mapIiPKNS_6NetDefENS1_4lessIiEENS1_9allocatorINS1_4pairIKiS5_EEEEEERKNS1_13unordered_mapIS5_NS1_10shared_ptrINS_7RuntimeEEENS1_4hashIS5_EENS1_8equal_toIS5_EENS8_INS9_IKS5_SJ_EEEEEEPKhlPb
    runtime.cc:179  pc 0x7e59a64628 _ZN4mace12SerialEngine6DoInitEPKNS_11MultiNetDefERKNSt6__ndk16vectorINS4_12basic_stringIcNS4_11char_traitsIcEENS4_9allocatorIcEEEENS9_ISB_EEEESF_PKhlPbPNS_10BaseEngineE
    runtime.cc:179  pc 0x7e59a64858 _ZN4mace12SerialEngine4InitEPKNS_11MultiNetDefERKNSt6__ndk16vectorINS4_12basic_stringIcNS4_11char_traitsIcEENS4_9allocatorIcEEEENS9_ISB_EEEESF_PKhlPbPNS_10BaseEngineE
A/MACE: runtime.cc:179  pc 0x7e59a6cf30 _ZN4mace10MaceEngine4Impl4InitEPKNS_11MultiNetDefERKNSt6__ndk16vectorINS5_12basic_stringIcNS5_11char_traitsIcEENS5_9allocatorIcEEEENSA_ISC_EEEESG_PKhlPbPS1_b
    runtime.cc:179  pc 0x7e59a6d660 _ZN4mace10MaceEngine4InitEPKNS_11MultiNetDefERKNSt6__ndk16vectorINS4_12basic_stringIcNS4_11char_traitsIcEENS4_9allocatorIcEEEENS9_ISB_EEEESF_PKhlPbPS0_b
    runtime.cc:179  pc 0x7e597124d4 _ZN4mace24CreateMaceEngineFromCodeERKNSt6__ndk112basic_stringIcNS0_11char_traitsIcEENS0_9allocatorIcEEEEPKhmRKNS0_6vectorIS6_NS4_IS6_EEEESF_RKNS_16MaceEngineConfigEPNS0_10shared_ptrINS_10MaceEngineEEEPbPSK_b
    runtime.cc:179  pc 0x7e59713010 Java_com_xiaomi_mace_JniMaceUtils_maceMobilenetCreateEngine
    runtime.cc:179  pc 0x7e5a10c23c oatexec
A/libc: Fatal signal 6 (SIGABRT), code -6 in tid 25029 (jniThread), pid 24994 (iaomi.mace.demo)

Also, after conversion, the first dimension of every layer's output shape is -1, whereas for other models it was 1. Could this be causing the issue?
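A -1 batch dimension would explain the `Check failed: buffer->memory<void>() != nullptr` abort above: a tensor buffer sized from a shape containing -1 ends up with a non-positive byte count, so nothing is allocated and the memory pointer stays null. A minimal sketch of that failure mode (illustrative only, not MACE's actual allocator; `buffer_bytes` is a hypothetical helper):

```python
from functools import reduce
import operator

def buffer_bytes(shape, dtype_size=4):
    """Byte count for a tensor of the given shape (float32 by default)."""
    return reduce(operator.mul, shape, 1) * dtype_size

# A valid batch dimension yields a positive allocation size.
print(buffer_bytes([1, 10, 1, 1]))    # 40

# A -1 batch dimension makes the size negative, so allocation
# fails and the buffer's memory pointer remains null.
print(buffer_bytes([-1, 10, 1, 1]))   # -40
```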

Final ops:
conv1 (Conv2D, index:0): [[-1, 32, 32, 32]]
pool1 (Pooling, index:1): [[-1, 32, 16, 16]]
relu1 (Activation, index:2): [[-1, 32, 16, 16]]
relu2 (Conv2D, index:3): [[-1, 32, 16, 16]]
pool2 (Pooling, index:4): [[-1, 32, 8, 8]]
relu3 (Conv2D, index:5): [[-1, 64, 8, 8]]
pool3 (Pooling, index:6): [[-1, 64, 4, 4]]
ip1 (FullyConnected, index:7): [[-1, 64, 1, 1]]
ip2 (FullyConnected, index:8): [[-1, 10, 1, 1]]
prob (Softmax, index:9): [[-1, 10, 1, 1]]

System information

  • MACE version: v1.0.2-148-gc75cb35

Model deploy file (*.yml)

library_name: custom
target_abis: [arm64-v8a]
model_graph_format: code
model_data_format: code
models:
  caffee_mnist:
    platform: caffe
    model_file_path: /home/model_zoo/lenet.prototxt
    weight_file_path: /home/model_zoo/lenet_iter_200.caffemodel
    model_sha256_checksum: 357b3adb1f75ea6ba392c2fa2f76b3091e0e0ba7947985373341add6da3462f0
    weight_sha256_checksum: 6e046addbaa74b75c0972d100987a790d65882c6a4c76fac4e00cafe2e5f9e85
    subgraphs:
      - input_tensors:
          - data
        input_shapes:
          -1,28,28,1
        output_tensors:
          - prob
        output_shapes:
          - 1,10
    runtime: cpu+gpu
    limit_opencl_kernel_time: 0
    nnlib_graph_mode: 0
    obfuscate: 0
    winograd: 0
  caffee_cifar10:
    platform: caffe
    model_file_path: /home/model_zoo/cifar10_quick.prototxt
    weight_file_path: /home/model_zoo/cifar10_quick_iter_400.caffemodel
    model_sha256_checksum: 75f640b26a7d9d119dd68a215f1d598bfb19fc986d1e6a2c8ea97a6eb4d09eda
    weight_sha256_checksum: 18533a177ef55f495f1fd5058bfaf3cd19cead98c29886ad8cb55553f1a19c4b
    subgraphs:
      - input_tensors:
          - data
        input_shapes:
          -1,32,32,3
        output_tensors:
          - prob
        output_shapes:
          - 1,10
    runtime: cpu+gpu
    limit_opencl_kernel_time: 0
    nnlib_graph_mode: 0
    obfuscate: 0
    winograd: 0

Model prototxt file (Caffe)

name: "CIFAR10_quick_test"
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 1 dim: 3 dim: 32 dim: 32 } }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "pool1"
  top: "pool1"
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 64
    pad: 2
    kernel_size: 5
    stride: 1
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3"
  top: "pool3"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool3"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 64
  }
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "ip2"
  top: "prob"
}

Additional context

I am using the example Android app to run inference.

lu229 commented

Also, after conversion, the first dimension of every layer's output shape is -1, whereas for other models it was 1. Could this be causing the issue?


Yes, please check your model with Netron to see why the batch size is -1.

input_shapes:
-1,28,28,1
Add a space between the - and the number: - 1,28,28,1. Without the space, YAML parses the whole line as the scalar -1,28,28,1 instead of a list item 1,28,28,1, so the leading - is read as a minus sign on the batch dimension.
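The effect of the missing space can be sketched without MACE at all. In YAML, `- 1,28,28,1` is a block-sequence entry whose value is the string `1,28,28,1`, while `-1,28,28,1` is a plain scalar that keeps the leading `-`. Once the converter splits that string on commas, the first dimension comes out as -1 (a minimal sketch; `parse_shape` is a hypothetical helper, not MACE's converter code):

```python
def parse_shape(shape_str):
    """Split a comma-separated shape string into integer dims."""
    return [int(dim) for dim in shape_str.split(",")]

# With the space, the YAML list item's value is "1,28,28,1":
print(parse_shape("1,28,28,1"))    # [1, 28, 28, 1]

# Without the space, the scalar keeps the "-", so the batch dim is -1:
print(parse_shape("-1,28,28,1"))   # [-1, 28, 28, 1]
```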

In Netron the batch size is 1, but after conversion MACE shows -1.

I am not using Caffe conversion at the moment; tf.keras conversion works fine for me. If I face this issue again, I will reopen it.

input_shapes:
-1,28,28,1
Add a space between the - and the number: - 1,28,28,1.

Thanks, this might be it; I hadn't noticed that.