/lstm-ctc-ocr

Using an RNN (LSTM or GRU) with CTC to convert line images into text, based on Torch7 and warp-ctc


LSTM-CTC-OCR: A Toy Experiment

This project is just a toy experiment applying CTC and LSTM to the OCR problem. However, I only succeeded at 20-digit recognition; longer text lines are still hard to train. I may or may not pick this project up again in the future, so this repository mainly serves as a summary.

The trend of line recognition

Recognizing lines of unconstrained text from images has long suffered from segmentation problems, which require carefully designed character segmentation methods and heuristic tuning of cost functions. However, thanks to the development of Recurrent Neural Networks, especially LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit), the trend is now to recognize a whole line at a time and output its text end to end.

CTC, Connectionist Temporal Classification

CTC, devised by Alex Graves in 2006, is essentially a loss function. For temporal classification and sequence labelling problems, the alignment between the inputs and outputs is unknown, so we need the CTC loss to measure the distance between the softmax activations and the ground-truth label, summing over all alignments that could produce that label.
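To make the "sum over all alignments" idea concrete, here is a minimal Python sketch (not the project's Torch/warp-ctc code): a brute-force version that enumerates every frame-wise path and keeps those that collapse to the target label, and the standard forward (alpha) recursion that computes the same quantity efficiently. All names here are illustrative, not part of any library API.

```python
import itertools
import math

def collapse(path, blank=0):
    """CTC collapsing rule: merge repeated symbols, then drop blanks."""
    out, prev = [], None
    for s in path:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return out

def ctc_nll_brute_force(probs, label, blank=0):
    """Sum the probability of every frame-wise path that collapses to `label`.
    probs is a T x K list of per-frame softmax distributions. Exponential in T,
    so only feasible for tiny examples."""
    T, K = len(probs), len(probs[0])
    total = 0.0
    for path in itertools.product(range(K), repeat=T):
        if collapse(path, blank) == label:
            p = 1.0
            for t, s in enumerate(path):
                p *= probs[t][s]
            total += p
    return -math.log(total)  # negative log-likelihood of the label

def ctc_nll_forward(probs, label, blank=0):
    """Same quantity via the standard CTC forward recursion, O(T * |label|)."""
    ext = [blank]
    for s in label:
        ext += [s, blank]          # extended label with blanks interleaved
    S, T = len(ext), len(probs)
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][ext[0]]
    if S > 1:
        alpha[0][1] = probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]
            if s >= 1:
                a += alpha[t - 1][s - 1]
            # skip transition is only allowed between distinct non-blank symbols
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1][s - 2]
            alpha[t][s] = a * probs[t][ext[s]]
    tail = alpha[T - 1][S - 1] + (alpha[T - 1][S - 2] if S > 1 else 0.0)
    return -math.log(tail)

# Tiny example: 2 frames, alphabet {blank, '1'}, target label [1].
# Paths 11, 01 and 10 all collapse to [1]: P = 0.42 + 0.28 + 0.18 = 0.88
probs = [[0.4, 0.6], [0.3, 0.7]]
loss = ctc_nll_forward(probs, [1])
```

In practice a softmax layer on top of the LSTM produces `probs` at every frame, and warp-ctc computes this forward-backward recursion (and its gradient) in parallel on the GPU.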

Baidu Research has implemented a fast parallel version of CTC, along with bindings for Torch; refer to its README for more information about CTC and warp-ctc.

Original References

Application of CTC

Alex Graves developed CTC and applied it to speech recognition and handwriting recognition. Other researchers have continued his work, e.g. the project ocropy, paragraph recognition, [this version](https://arxiv.org/abs/1604.08352), and online sequence learning.

You can also refer to [Recursive Recurrent Nets with Attention Modeling for OCR in the Wild](http://arxiv.org/abs/1603.03101) to compare these two different modern architectures.


You can ⭐ this project if you like it.