Incorrect recognition with Text recognition model
ximenesfel opened this issue · 0 comments
ximenesfel commented
Hi,
I'm implementing a text recognition pipeline following this demo (https://github.com/opencv/open_model_zoo/tree/master/demos/text_detection_demo).
I'm using the following pipeline:
TextDetection
TextRecognition
My first implementation of the fetchResults() method was:
    bool dynamic_vino_lib::TextRecognition::fetchResults() {
        bool can_fetch = dynamic_vino_lib::BaseInference::fetchResults();
        if (!can_fetch) {return false;}
        bool found_result = false;
        InferenceEngine::InferRequest::Ptr request = getEngine()->getRequest();
        std::string output = valid_model_->getOutputName();
        InferenceEngine::BlobMap blobs;
        blobs[output] = request->GetBlob(output);
        std::string res = "";
        const char kPadSymbol = '#';
        double conf = 1.0;
        std::string kAlphabet = "0123456789abcdefghijklmnopqrstuvwxyz";
        kAlphabet.push_back(kPadSymbol);
        auto output_shape = blobs.begin()->second->getTensorDesc().getDims();
        for (int i = 0; i < getResultsLength(); i++) {
            auto ouput_data_pointer = blobs[output]->buffer().as<float *>();
            std::vector<float> output_data(ouput_data_pointer, ouput_data_pointer + output_shape[0] * output_shape[2]);
            res = CTCGreedyDecoder(output_data, kAlphabet, kPadSymbol, &conf);
            results_[i].result_value = res;
        }
        found_result = true;
        if (!found_result) results_.clear();
        return true;
    }
Using this method the network output was:
So my pointer into the output data did not change as expected. After that, I changed my fetchResults() method to use the same structure as the head_pose_detection example:
    bool dynamic_vino_lib::TextRecognition::fetchResults() {
        bool can_fetch = dynamic_vino_lib::BaseInference::fetchResults();
        if (!can_fetch) {return false;}
        bool found_result = false;
        InferenceEngine::InferRequest::Ptr request = getEngine()->getRequest();
        std::string output = valid_model_->getOutputName();
        InferenceEngine::BlobMap blobs;
        blobs[output] = request->GetBlob(output);
        std::string res = "";
        const char kPadSymbol = '#';
        double conf = 1.0;
        std::string kAlphabet = "0123456789abcdefghijklmnopqrstuvwxyz";
        kAlphabet.push_back(kPadSymbol);
        auto output_shape = blobs.begin()->second->getTensorDesc().getDims();
        for (int i = 0; i < getResultsLength(); i++) {
            auto ouput_data_pointer = &blobs[output]->buffer().as<float *>()[i];
            std::vector<float> output_data(ouput_data_pointer, ouput_data_pointer + output_shape[0] * output_shape[2]);
            res = CTCGreedyDecoder(output_data, kAlphabet, kPadSymbol, &conf);
            results_[i].result_value = res;
            slog::info << "Result: " << res << slog::endl;
            output_data.clear();
            ouput_data_pointer = 0;
            res = "";
        }
        found_result = true;
        if (!found_result) results_.clear();
        return true;
    }
And the result:
My first pointer is correct, but the second is not. I tried to understand why, but I was not able to. How can I correctly recognize the second text?
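For reference, one likely culprit is the pointer offset: in the second version, `&buffer[i]` moves the start of the slice by only `i` floats, so the slice for `i = 1` overlaps the first one almost entirely instead of starting at the next result. Below is a minimal sketch of per-item slicing. It assumes the output blob is a row-major tensor with dimensions [W, B, L] (matching the `output_shape[0] * output_shape[2]` size used above) and that each recognized region occupies one index along B. Both are assumptions about this setup, and `sliceBatchItem` is a hypothetical helper, not part of the Inference Engine API. If each region is instead run as a separate inference request, the blob would hold only one result and this sketch would not apply.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not an Inference Engine function).
// Copies the [W, L] score matrix of batch item `b` out of a flat,
// row-major buffer whose dimensions are ASSUMED to be [W, B, L]:
//   W = number of time steps, B = batch size, L = alphabet length.
std::vector<float> sliceBatchItem(const float* data,
                                  std::size_t W, std::size_t B, std::size_t L,
                                  std::size_t b) {
    if (data == nullptr) {
        throw std::invalid_argument("data must not be null");
    }
    if (b >= B) {
        throw std::out_of_range("batch index out of range");
    }
    std::vector<float> item;
    item.reserve(W * L);
    for (std::size_t t = 0; t < W; ++t) {
        // In a row-major [W, B, L] layout, the L scores of item b at
        // time step t start at offset (t * B + b) * L.
        const float* row = data + (t * B + b) * L;
        item.insert(item.end(), row, row + L);
    }
    return item;
}
```

Under those assumptions, calling it with `W = output_shape[0]`, `B = output_shape[1]`, `L = output_shape[2]` and `b = i` would give a vector that could be passed to CTCGreedyDecoder in place of `output_data`.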
Thanks in advance.