intel/ros2_openvino_toolkit

Incorrect recognition with Text recognition model

ximenesfel opened this issue · 0 comments

Hi,

I'm implementing a text recognition pipeline based on this demo (https://github.com/opencv/open_model_zoo/tree/master/demos/text_detection_demo).

I'm using the following pipeline:
TextDetection
TextRecognition

My first implementation of fetchResults() was:

bool dynamic_vino_lib::TextRecognition::fetchResults() {

  bool can_fetch = dynamic_vino_lib::BaseInference::fetchResults();
  if (!can_fetch) {return false;}
  bool found_result = false;

  InferenceEngine::InferRequest::Ptr request = getEngine()->getRequest();
  std::string output = valid_model_->getOutputName();
  InferenceEngine::BlobMap blobs;
  blobs[output] = request->GetBlob(output);

  std::string res = "";
  const char kPadSymbol = '#';
  double conf = 1.0;
  std::string kAlphabet = "0123456789abcdefghijklmnopqrstuvwxyz";
  kAlphabet.push_back(kPadSymbol);

  auto output_shape = blobs.begin()->second->getTensorDesc().getDims();

  for (int i = 0; i < getResultsLength(); i++) {
    
    auto ouput_data_pointer = blobs[output]->buffer().as<float *>();
    std::vector<float> output_data(ouput_data_pointer, ouput_data_pointer + output_shape[0] * output_shape[2]);
    res = CTCGreedyDecoder(output_data, kAlphabet, kPadSymbol, &conf);
    results_[i].result_value = res;

  }

  found_result = true;
  if (!found_result) results_.clear();
  return true;
}

With this method, the network output was:

image_4

So the pointer into the output data did not change between results as I expected. After that, I rewrote fetchResults() using the same structure as the head_pose_detection example:
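The identical results can be reproduced in isolation. The sketch below is not toolkit code: `sliceWithoutIndex` is a hypothetical stand-in for the loop above, with the blob replaced by a plain `std::vector<float>`. It shows that when the start pointer does not depend on `i`, every iteration copies the same span.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the loop in fetchResults(): for each of `n`
// results, copy `count` floats starting at the base of the buffer.
std::vector<std::vector<float>> sliceWithoutIndex(const std::vector<float>& buffer,
                                                  std::size_t n, std::size_t count)
{
  std::vector<std::vector<float>> slices;
  for (std::size_t i = 0; i < n; ++i) {
    const float* start = buffer.data();  // same address on every iteration
    slices.emplace_back(start, start + count);
  }
  return slices;
}
```

Every slice returned here has the same contents, so a decoder fed with these slices would produce the same string for every result.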

bool dynamic_vino_lib::TextRecognition::fetchResults() {

  bool can_fetch = dynamic_vino_lib::BaseInference::fetchResults();
  if (!can_fetch) {return false;}
  bool found_result = false;

  InferenceEngine::InferRequest::Ptr request = getEngine()->getRequest();

  std::string output = valid_model_->getOutputName();
 
  InferenceEngine::BlobMap blobs;

  blobs[output] = request->GetBlob(output);

  std::string res = "";
  const char kPadSymbol = '#';
  double conf = 1.0;
  std::string kAlphabet = "0123456789abcdefghijklmnopqrstuvwxyz";
  kAlphabet.push_back(kPadSymbol);

  auto output_shape = blobs.begin()->second->getTensorDesc().getDims();


  for (int i = 0; i < getResultsLength(); i++) {
   
    auto ouput_data_pointer = &blobs[output]->buffer().as<float *>()[i];

    std::vector<float> output_data(ouput_data_pointer, ouput_data_pointer + output_shape[0] * output_shape[2]);

    res = CTCGreedyDecoder(output_data, kAlphabet, kPadSymbol, &conf);

    results_[i].result_value = res;

    slog::info << "Result: " << res  << slog::endl;

    output_data.clear();

    ouput_data_pointer = 0;

    res = "";

  }

  found_result = true;

  if (!found_result) results_.clear();
  return true;
}

And the result:

image_5

The first pointer is correct, but the second is not. I have tried to understand why, without success. How can I correctly recognize the second text?
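A possible explanation, offered as an assumption rather than a confirmed diagnosis: if the recognition output blob is laid out as [W, B, L] (time steps, batch, alphabet size including the pad symbol), then `&buffer[i]` moves the start by a single float instead of moving to the i-th batch item. In that layout the scores of one item are not contiguous either, so they have to be gathered row by row. The helper below, `extractBatchItem`, is hypothetical and is not part of ros2_openvino_toolkit or the Inference Engine API. It only shows the indexing.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper. Assumes `data` points to W * B * L floats laid out
// as [W, B, L]: W time steps, B batch items, L scores per time step.
// Returns the W * L scores belonging to batch item `b`, in time order.
std::vector<float> extractBatchItem(const float* data, std::size_t W,
                                    std::size_t B, std::size_t L, std::size_t b)
{
  std::vector<float> item;
  item.reserve(W * L);
  for (std::size_t t = 0; t < W; ++t) {
    // Row for (time step t, batch item b) starts at (t * B + b) * L.
    const float* row = data + (t * B + b) * L;
    item.insert(item.end(), row, row + L);
  }
  return item;
}
```

Under the same assumption, the loop in fetchResults() would call this with `output_shape[0]`, `output_shape[1]`, `output_shape[2]` and `i`, and pass the returned vector to CTCGreedyDecoder. If the blob actually has batch size 1, a different cause is more likely, for example each region needing its own inference request.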

Thanks in advance.