This project was developed to build a neural network that recognizes tables inside documents. I needed an "intelligent" OCR for work, one that could automatically recognize tables in order to treat them separately.
The project uses a pre-trained neural network offered by TensorFlow, together with a config file matching the chosen pre-trained model, to train with the TensorFlow Object Detection API.
The dataset was taken from:
- ICDAR 2017 POD Competition.

Not yet implemented:
- UNLV dataset with its own ground truth;
- Marmot Dataset.
Before we go on, make sure you have everything installed to be able to use the project:
- Python 3
- Tensorflow (tested on r1.8)
- Its object-detection API (remember to install COCO API. If you are on Windows see at the bottom of the readme)
- Pillow
- opencv-python
- pandas
- pyprind (useful for progress bars)
The project is made up of different parts that act together as a pipeline.
I have prepared two "constants" files: `dataset_costants.py` and `inference_constants.py`.
The first contains all the constants used to create the dataset, the second those used to run
inference with the frozen graph. If you just want to run the project, these are the only two files you should modify.
Since colors are not useful for table detection, we can convert all the images to 8-bit single-channel `.jpeg` images. This transformation is still under testing.
Use `python dataset/img_to_jpeg.py` after setting `dataset_costants.py`:
- `DPI_EXTRACTION`: output quality of the images;
- `PATH_TO_IMAGES`: path/to/dataset/images;
- `IMAGES_EXTENSION`: extension of the extracted images. The only one tested is `.jpeg`.
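As a rough illustration of what this conversion step does, here is a minimal sketch using Pillow (already a project dependency). The function name and signature are my own, not the actual contents of `img_to_jpeg.py`:

```python
from PIL import Image

# Hypothetical helper mirroring the conversion step:
# turn any input image into an 8-bit single-channel JPEG.
def to_grayscale_jpeg(src_path, dst_path, dpi=300):
    img = Image.open(src_path).convert("L")  # "L" = 8-bit grayscale
    img.save(dst_path, "JPEG", dpi=(dpi, dpi))
```

In the real script, `dpi` would come from `DPI_EXTRACTION` and the paths from `PATH_TO_IMAGES`.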
The dataset was taken from the ICDAR 2017 POD Competition. It comes with an `xml` annotation file with formulas, images and tables per image.
TensorFlow instead can build its own TFRecord from CSV information, so we need to convert
the `xml` files into a `csv` one.
Use `python dataset/generate_database_csv.py` to do this conversion after setting `dataset_costants.py`:
- `TRAIN_CSV_NAME`: name for the `.csv` train output file;
- `TEST_CSV_NAME`: name for the `.csv` test output file;
- `TRAIN_CSV_TO_PATH`: folder path for `TRAIN_CSV_NAME`;
- `TEST_CSV_TO_PATH`: folder path for `TEST_CSV_NAME`;
- `ANNOTATIONS_EXTENSION`: extension of the annotations. In our case it is `.xml`;
- `TRAINING_PERCENTAGE`: percentage of images for training;
- `TEST_PERCENTAGE`: percentage of images for testing;
- `TABLE_DICT`: dictionary for data labels. For this project there is no reason to change it;
- `MIN_WIDTH_BOX`, `MIN_HEIGHT_BOX`: minimum dimensions for a box to be considered valid. Some networks don't digest small boxes well, so I added this check.
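To make the conversion concrete, here is a small sketch of the idea, using only the standard library. It assumes a Pascal VOC-style annotation layout; the real ICDAR 2017 POD schema and the actual logic in `generate_database_csv.py` may differ:

```python
import xml.etree.ElementTree as ET

# Stand-ins for the values configured in dataset_costants.py.
MIN_WIDTH_BOX = 10
MIN_HEIGHT_BOX = 10

def xml_to_rows(xml_text):
    """Return [filename, width, height, class, xmin, ymin, xmax, ymax] rows,
    skipping boxes smaller than the configured minimum dimensions."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        box = obj.find("bndbox")
        xmin, ymin = int(box.findtext("xmin")), int(box.findtext("ymin"))
        xmax, ymax = int(box.findtext("xmax")), int(box.findtext("ymax"))
        if xmax - xmin < MIN_WIDTH_BOX or ymax - ymin < MIN_HEIGHT_BOX:
            continue  # box too small to be a useful training example
        rows.append([filename, width, height, label, xmin, ymin, xmax, ymax])
    return rows
```

Each row matches the column layout TensorFlow's CSV-to-TFRecord tooling commonly expects.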
The `csv` files and images are ready: now we need to create the TFRecord files to feed TensorFlow.
Use `python generate_tf_records.py` to create the train and test `.record` files that we will need later. There is no need to configure `dataset_costants.py`.
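A TFRecord writer typically emits one example per image, so the CSV rows first have to be grouped by filename. The following stdlib-only sketch illustrates that grouping step (the column names and the grouping itself are assumptions about how `generate_tf_records.py` works, not its actual code):

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

def group_by_image(csv_text):
    """Group csv rows per image filename, as done before writing
    one TF Example per image into the .record file."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=itemgetter("filename"))  # groupby needs sorted input
    return {name: list(grp)
            for name, grp in groupby(rows, key=itemgetter("filename"))}
```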
Inside `trained_models` there are some folders. Each one contains two files: a `.config` file and a `.txt` file.
The first contains a TensorFlow configuration that has to be personalized:
- `fine_tune_checkpoint`: path to the frozen graph of the pre-trained TensorFlow model;
- `tf_record_input_reader`: paths to the `train.record` and `test.record` files we created before;
- `label_map_path`: path to the labels of your dataset.
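For orientation, the relevant fragments of such a pipeline config look roughly like this (paths are placeholders; the surrounding model and training options are omitted):

```
train_config {
  fine_tune_checkpoint: "path/to/pretrained/model.ckpt"
}
train_input_reader {
  tf_record_input_reader {
    input_path: "path/to/train.record"
  }
  label_map_path: "path/to/label_map.pbtxt"
}
eval_input_reader {
  tf_record_input_reader {
    input_path: "path/to/test.record"
  }
  label_map_path: "path/to/label_map.pbtxt"
}
```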
The latter contains the command to launch from `tensorflow/models/research/object-detection`, which follows this pattern:

```
python model_main.py \
    --pipeline_config_path=path/to/your_config_file.config \
    --model_dir=here/we/save/our/model \
    --num_train_steps=num_of_iterations \
    --alsologtostderr
```
Other options are inside `tensorflow/models/research/object-detection/model_main.py`.
When the net has finished training, you can export a frozen graph to make inference.
TensorFlow offers a utility for this: from `tensorflow/models/research/object-detection` run:

```
python export_inference_graph.py \
    --input_type=image_tensor \
    --pipeline_config_path=path/to/automatically/created/pipeline.config \
    --trained_checkpoint_prefix=path/to/last/model.ckpt-xxx \
    --output_directory=path/to/output/dir
```
Now that you have your graph, you can try it out:
run `inference_with_net.py` after setting `inference_costants.py`:
- `PATHS_TO_TEST_IMAGE`: path list of all the test images;
- `BMP_IMAGE_TEST_TO_PATH`: path where the test output files are saved;
- `PATHS_TO_LABELS`: path to the `.pbtxt` label file;
- `MAX_NUM_BOXES`: maximum number of boxes to be considered;
- `MIN_SCORE`: minimum score for a box to be considered;
A result image will then be generated for every combination of:
- `PATHS_TO_CKPTS`: path list of all the frozen graphs you want to test;

In addition, it will print a "merged" version of the boxes, in which
all the best vertically overlapping boxes are merged together to gain accuracy. `TEST_SCORES` is a list of
numbers that tells the program which scores must be merged together.
The procedure is described in more detail in `inference_with_net.py`.
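The merging idea described above can be sketched as follows. This is a simplified greedy version for illustration; the actual logic lives in `inference_with_net.py` and may differ:

```python
# Boxes are (ymin, xmin, ymax, xmax) tuples, as in the
# TensorFlow Object Detection API convention.
def vertical_overlap(a, b):
    """Height of the vertical intersection between two boxes."""
    return min(a[2], b[2]) - max(a[0], b[0])

def merge_overlapping_boxes(boxes):
    """Greedily merge boxes whose vertical ranges overlap,
    returning the union box for each overlapping run."""
    merged = []
    for box in sorted(boxes):  # top-to-bottom by ymin
        if merged and vertical_overlap(merged[-1], box) > 0:
            last = merged.pop()
            merged.append((min(last[0], box[0]), min(last[1], box[1]),
                           max(last[2], box[2]), max(last[3], box[3])))
        else:
            merged.append(box)
    return merged
```

For example, two detections covering the top and bottom halves of the same table would collapse into one box spanning both.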
A `.log` file will be produced for every execution.
This comment will probably solve your problem: the linked clone provides a working source for the COCO API on Windows with Python 3.