Incorrect processing of small image by Memref container

Question

Incorrect processing of small image by Memref container

meshtag opened this issue 3 years ago · 5 comments

I am getting this error :

free(): corrupted unsorted chunks
Aborted (core dumped)

whenever I am trying to process a small 6x6 image using the modified Memref container
Minimum example to reproduce above mentioned error is :

#include <opencv2/imgcodecs.hpp>
#include <opencv2/opencv.hpp>

#include <iostream>

#include "Utils/Container.h"
#include "kernels.h"

using namespace cv;
using namespace std;

// Declare the Corr2D C interface.
extern "C" {
void _mlir_ciface_corr_2d(MemRef<float, 2> *input, MemRef<float, 2> *kernel,
                          MemRef<float, 2> *output, unsigned int centerX,
                          unsigned int centerY, int boundaryOption);
}

// Read input image
Mat inputImage = imread("../../test_6x6.png",
                             IMREAD_GRAYSCALE);

int main()
{
  float* kernelArray = prewittKernelAlign;
  int kernelRows = prewittKernelRows, kernelCols = prewittKernelCols;

  // Define allocated, sizes, and strides.
  intptr_t sizesInput[2] = {inputImage.rows, inputImage.cols};
  intptr_t sizesKernel[2] = {kernelRows, kernelCols};
  intptr_t sizesOutput[2] = {inputImage.rows, inputImage.cols};

  // Define input, kernel, and output.
  MemRef<float, 2> input(inputImage, sizesInput);
  MemRef<float, 2> kernel(kernelArray, sizesKernel);
  MemRef<float, 2> output(sizesOutput);

  for (int i = 0; i < inputImage.rows; i++)
    for (int j = 0; j < inputImage.cols; j++)
      output[i * inputImage.rows + j] = 0;

  _mlir_ciface_corr_2d(&input, &kernel, &output, 1, 2, 0);
}

The image I was trying to experiment with is here.

Note : I am able to properly process small images as well using the previous Memref container as specified in some files of buddy-mlir.

System specifications :
OS : Ubuntu 20.04 LTS
gcc version : 10.3.0
llvm submodule branch : eec312ee7f97

Answer 1 · 2022-01-04T15:27:42.000Z

I used the 6x6 test image to test Con2DBenchmark and Corr2DBenchmark, and I found that only Corr2DBenchmark has the issue. It will cause segmentation fault when running with the google benchmark and cause malloc(): invalid size (unsorted) when running with OpenCV imwrite.

Note : I am able to properly process small images as well using the previous Memref container as specified in some files of buddy-mlir.

I also encountered the malloc(): invalid size (unsorted) when I using the Memref container in buddy-mlir.

I haven't located the problem. I will have a deeper look at the MemRef container. And apart from that, is it possible that there is a problem with the pass itself?

Answer 2 · 2022-01-05T20:37:06.000Z

I also encountered the malloc(): invalid size (unsorted) when I using the Memref container in buddy-mlir.

What was the value of stride when you tried it?
FYI, I was able to process the 6x6 image correctly and used it extensively for debugging with stride = 6 when I was developing the lowering pass.

Answer 3 · 2022-01-06T00:13:57.000Z

What was the value of stride when you tried it?

I used the default 256 as the stride.

Answer 4 · 2022-01-26T14:27:31.000Z

@zhanghb97 I think I figured out the problem. We need to restrict the upper limit of stride as total no. of columns in input image. If stride is greater than that number, it causes some problems in the algorithm itself.

For example,
It is evident from this line that the algorithm will never detect correct threshold for right extrapolation if stride size exceeds image witdth(in case of stride = 256 and image dimension = 6x6).
Perhaps we should document this more clearly in buddy-mlir 's readme file so that users can get an idea about permissible values of stride.
What is your opinion on this?

Answer 5 · 2022-01-26T17:41:21.000Z

Good catch!

Perhaps we should document this more clearly in buddy-mlir 's readme file so that users can get an idea about permissible values of stride.

This can be used as a temporary solution. And please add this to the readme file.

I think we need to solve the problem in the algorithm itself. The purpose of the stride configuration is to use the register file efficiently. And our ultimate goal is to set the stride automatically in the pass, which means the stride is a compile-time value. So we cannot expect it to solve the run-time small image size problem, and we should improve the algorithm to unroll and fill the register file (at least to avoid the crash). I think the CBSM algorithm pass for the conv_2d operation will not crash, but it still needs to consider the efficiency of processing small input.