/Text_Recognition

Diploma Thesis - Nowadays, when people are more reliant on technology than ever, the capability to search information with the use of smart phones is becoming essential. For this reason, we believe that reverse image search and, especially, their text detection and recognition function is a search engine tool that needs to be further advanced. In the current Thesis we assess two existent NNs, one on text detection and another on text recognition, on multilingual scene text images. The first NN is the EAST text detector, which performs considerably well on locating text in scene text images. For text recognition, we decided to train an end-to-end trainable NN, the CRNN, first on a Greek and English alphabet and, then, on an alphabet containing letters from four Latin languages, digits and symbols. Specifically, we emphasized on all of the training parameters and hyperparameters to achieve the best possible accuracy of correct character prediction, which is 84.6%. Finally, we integrated the two NNs into one system that detects scene text on images and recognizes the words depicted. This system has, additionally, been optimized in order to execute in real time, explicitly at 0.32 seconds per image.

Primary LanguagePython

Text_Recognition

Diploma Thesis - Nowadays, when people are more reliant on technology than ever, the capability to search information with the use of smart phones is becoming essential. For this reason, we believe that reverse image search and, especially, their text detection and recognition function is a search engine tool that needs to be further advanced. In the current Thesis we assess two existent NNs, one on text detection and another on text recognition, on multilingual scene text images. The first NN is the EAST text detector, which performs considerably well on locating text in scene text images. For text recognition, we decided to train an end-to-end trainable NN, the CRNN, first on a Greek and English alphabet and, then, on an alphabet containing letters from four Latin languages, digits and symbols. Specifically, we emphasized on all of the training parameters and hyperparameters to achieve the best possible accuracy of correct character prediction, which is 84.6%. Finally, we integrated the two NNs into one system that detects scene text on images and recognizes the words depicted. This system has, additionally, been optimized in order to execute in real time, explicitly at 0.32 seconds per image.