tmbdev/clstm

Batched OCR training?

jbaiter opened this issue · 4 comments

Currently, the CLSTMOCR class defined in clstmhl.h can only train on one line image at a time. As a result, optimizations like Eigen's multi-threaded tensor operations and the GPU support have little effect, since the workload of a single sample is too small for them to make a difference.

From a cursory reading of the code, I gather that batched training is supported by the lower-level API. So my question is: what would have to be done to get batched training in the high-level CLSTMOCR API (and ideally CLSTMText as well)?
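To make the idea concrete, here is a minimal sketch of the data preparation that batching line images usually requires: lines have different widths, so they are zero-padded to a common width and stacked, with the true widths kept so padded columns can be ignored later. This is only an illustration under my own assumptions. `LineImage`, `PaddedBatch`, and `pad_to_batch` are hypothetical names, not part of clstm, and the `[batch][height][width]` layout is chosen for readability rather than taken from the library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type: one line image with `height` rows and `width` columns
// (time steps), stored row-major.
struct LineImage {
  std::size_t height;
  std::size_t width;
  std::vector<float> pixels;  // size == height * width
};

// Hypothetical type: a zero-padded batch laid out as [batch][height][max_width],
// plus the true width of each sample so padding can be masked out of the loss.
struct PaddedBatch {
  std::size_t batch_size;
  std::size_t height;
  std::size_t max_width;
  std::vector<float> data;
  std::vector<std::size_t> widths;

  float at(std::size_t b, std::size_t y, std::size_t x) const {
    return data[(b * height + y) * max_width + x];
  }
};

// Stack line images of equal height into one zero-padded batch.
PaddedBatch pad_to_batch(const std::vector<LineImage>& lines) {
  PaddedBatch batch{lines.size(), 0, 0, {}, {}};
  if (lines.empty()) return batch;

  batch.height = lines.front().height;
  for (const LineImage& line : lines) {
    // All lines must already be normalized to the same height.
    assert(line.height == batch.height);
    assert(line.pixels.size() == line.height * line.width);
    batch.max_width = std::max(batch.max_width, line.width);
    batch.widths.push_back(line.width);
  }

  // Zero-fill, then copy each row; columns past a line's width stay zero.
  batch.data.assign(batch.batch_size * batch.height * batch.max_width, 0.0f);
  for (std::size_t b = 0; b < lines.size(); ++b) {
    const LineImage& line = lines[b];
    for (std::size_t y = 0; y < line.height; ++y) {
      const float* src = line.pixels.data() + y * line.width;
      float* dst = batch.data.data() + (b * batch.height + y) * batch.max_width;
      std::copy(src, src + line.width, dst);
    }
  }
  return batch;
}
```

The `widths` vector is what a batched trainer would need in order to stop each sequence at its real length, so that padding does not contribute to the alignment or the gradients.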

There are a bunch of different reasons. The code was ported from Python, where batching wouldn't have helped with speed, so it was easiest and safest to leave it as is. In addition, for other networks, batching tends to result in higher test set error rates, so there wasn't much motivation to add it (it seems to have no effect either way on error rates for LSTMs). Eigen Matrix also didn't support GPU, so there wasn't much motivation for that anyway.

Now that the code uses Eigen Tensor, batching would make more sense. But Eigen turns out not to be such a convenient framework for multicore or GPU computations anyway. In addition, LSTMs on GPUs are probably best implemented using fused kernels implementing multiple time steps at a time.

So, the upshot is that I've started working on a separate project for OCR similar to clstm, but with a focus on parallelization and GPU support, including support for batching.

I have been suffering from the slow training speed of clstm. How can the community contribute to the new project?

Give me a few weeks; I just moved from Google to NVIDIA. GPU support will be much better now :-)

Tom,
Good luck with the new job!