
Quick OCR that extracts perfectly formed text from PGM files. It is meant as an alternative to gocr that produces fewer errors when working on perfect files. It addresses the problems of the default pgm2txt command, whose output requires a lot of tedious manual fixes in the final document.


######################################################
#            PROJECT  : svpgm2txt                    #
#            VERSION  : 1.2                          #
#            DATE     : 08/2011                      #
#            AUTHOR   : Valat Sébastien              #
#            LICENSE  : CeCILL-C                     #
######################################################

To compile and install the project :

$> mkdir build
$> cd build
$> cmake ..
$> make
$> make install

                    -----------------------

You can change the installation prefix with the command :

$> cmake .. -DCMAKE_INSTALL_PREFIX=/usr/local/myprefix

                    -----------------------
                    
To execute the short test suite, you can use :

$> make test

                    -----------------------
                    
Some usage examples :

1) Extract text from a PGM file without using a character DB (you will be asked about every new character) :

$> svpgm2txt test-data-ascii.pgm

2) Save the DB generated on first use :

$> svpgm2txt -o db.txt test-data-ascii.pgm

3) Reuse the DB without saving new entries :

$> svpgm2txt -d db.txt test-data-ascii.pgm

4) Reuse the DB and complete it, saving after each image file :

$> svpgm2txt -s -o db.txt -d db.txt test-data-ascii.pgm

5) Convert multiple files :

$> svpgm2txt -d db.txt *.pgm

6) Use aspell to fix I/l confusion, since both letters can have an identical shape depending on the font :

$> svpgm2txt -U -i aspell -L en_US *.pgm
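
                    -----------------------

Usage cases 2 and 4 can be combined in a small wrapper. The sketch below is only an illustration and is not shipped with svpgm2txt : the function name ocr_with_db and the SVPGM2TXT variable are hypothetical, and it uses only the -o, -d and -s options shown above. It assumes it is safer not to pass -d while the DB file does not exist yet, since the behaviour in that case is not documented here.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of svpgm2txt.
# SVPGM2TXT can be overridden to point at another binary.
SVPGM2TXT="${SVPGM2TXT:-svpgm2txt}"

# ocr_with_db DB_FILE PGM_FILE...
# Create the character DB on first use, then reuse and complete it.
ocr_with_db() {
    db="$1"
    shift
    if [ -f "$db" ]; then
        # Usage case 4 : load the existing DB and save after each file.
        "$SVPGM2TXT" -s -o "$db" -d "$db" "$@"
    else
        # Usage case 2 : no DB yet, so only request that one be written.
        # ASSUMPTION : -d is skipped because the DB file is missing.
        "$SVPGM2TXT" -o "$db" "$@"
    fi
}
```

After sourcing the file, a possible call is :

$> ocr_with_db db.txt *.pgm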