Deep Learning based Table Detection (LUMINOTH)

This project focuses on "Detection Tables in PDF and Extract contents" by Keras and ObjectTensorFlow Detection API.

The system shall work in 2 steps:

Step 1: Accept document input, read tables: System should have an input mechanism for accepting documents images (TIFF, JPEG). The document may have one or more tables.

Step 2: Step 2: As an output, system should return the table content in an excel format,same as that in the sample data sets

More Details can be found in PPT

Dataset used UNLV dataset

Quick Demo

https://youtu.be/cwIQlJRHuA4

THE DEVELOPING IS ON PROGRESS! THE REPO WILL BE UPDATED SOON, !

Architecture

Result 2

Installation

Luminoth currently supports Python 2.7 and 3.4–3.6.

Pre-requisites

To use Luminoth, TensorFlow must be installed beforehand. If you want GPU support, you should install the GPU version of TensorFlow with pip install tensorflow-gpu, or else you can use the CPU version using pip install tensorflow.

Installing Luminoth

Just install from PyPI:

pip install luminoth

Optionally, Luminoth can also install TensorFlow for you if you install it with pip install luminoth[tf] or pip install luminoth[tf-gpu], depending on the version of TensorFlow you wish to use.

Google Cloud

If you wish to train using Google Cloud ML Engine, the optional dependencies must be installed:

pip install luminoth[gcloud]

Installing from source

First, clone the repo on your machine and then install with pip:

git clone https://github.com/tryolabs/luminoth.git
cd luminoth
pip install -e .

Run "run.py"

License

This system is available under the MIT license. See the LICENSE file for more info.

dragie/Table-Detection-using-Deep-learning