Table Extractor From Image

This repository contains code that extracts a table from an image and exports it to an Excel file. The image is first "read" by an OCR, which produces JSON output. That JSON is the input to the program, which arranges the recognized cells into rows and columns.
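As a rough illustration of arranging cells row-wise and column-wise, here is a minimal sketch. It is not the repository's actual algorithm. It assumes an Azure-style OCR result (regions containing lines, each line carrying a "boundingBox" string of the form "left,top,width,height" and a list of words), treats each OCR line as one cell, and groups cells into a row when their vertical centres are close. The names extract_cells and group_into_rows and the sample data are hypothetical.

```python
import json

# Assumed input shape (Azure-style OCR): regions -> lines -> words.
# Each line has a "boundingBox" string "left,top,width,height".
SAMPLE_JSON = """
{"regions": [{"lines": [
  {"boundingBox": "10,10,40,12",  "words": [{"text": "Name"}]},
  {"boundingBox": "120,11,30,12", "words": [{"text": "Age"}]},
  {"boundingBox": "10,40,80,12",  "words": [{"text": "Asha"}, {"text": "Rao"}]},
  {"boundingBox": "120,42,20,12", "words": [{"text": "31"}]}
]}]}
"""


def extract_cells(ocr):
    """Flatten the OCR result into (left, top, height, text) tuples, one per line."""
    cells = []
    for region in ocr.get("regions", []):
        for line in region.get("lines", []):
            left, top, _width, height = map(int, line["boundingBox"].split(","))
            text = " ".join(word["text"] for word in line.get("words", []))
            cells.append((left, top, height, text))
    return cells


def group_into_rows(cells):
    """Group cells with nearby vertical centres, then order each row left to right."""
    rows = []  # each entry: [centre of the row's first cell, [(left, text), ...]]
    for left, top, height, text in sorted(cells, key=lambda c: c[1] + c[2] / 2):
        centre = top + height / 2
        if rows and abs(centre - rows[-1][0]) <= height / 2:
            rows[-1][1].append((left, text))
        else:
            rows.append([centre, [(left, text)]])
    return [[text for _, text in sorted(row_cells)] for _, row_cells in rows]


table = group_into_rows(extract_cells(json.loads(SAMPLE_JSON)))
print(table)  # [['Name', 'Age'], ['Asha Rao', '31']]
```

This sketch only orders cells within each row. It does not align columns when a cell is missing from a row, which a real table extractor would need to handle.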

NOTE: Only the cells that the OCR reads will appear in the Excel file.

Modules Required

os (standard library)
copy (standard library)
pandas==0.22.0
openpyxl==2.4.9
You can also install the packages from requirements.txt by running pip install -r requirements.txt.

Flow

Image -> JSON -> Excel

Steps

1. Install the packages listed in requirements.txt.
2. To "read" an image, use an OCR that returns its result as JSON.
   - You can use your own OCR or the Microsoft Azure Cognitive Services OCR API.
   - Alternatively, upload the image to their text reader demo, which returns the JSON for the image. Save the JSON to a text file and then run the program.
3. In the program, change the input path and output path to match your setup.
4. Run the program (JSON-to-Excel.py).
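The path configuration and Excel export described in the steps above might look roughly like the following. This is a sketch under assumptions, not the contents of JSON-to-Excel.py: the names INPUT_PATH, OUTPUT_PATH, load_ocr_json, pad_rows, and write_table are hypothetical, and the table is assumed to already be a list of rows of cell strings.

```python
import json

INPUT_PATH = "input.json"    # hypothetical: where the OCR JSON was saved
OUTPUT_PATH = "output.xlsx"  # hypothetical: where the Excel file should go


def load_ocr_json(path):
    """Read the saved OCR output from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def pad_rows(rows, fill=""):
    """Make every row the same length so the table is rectangular."""
    width = max((len(row) for row in rows), default=0)
    return [list(row) + [fill] * (width - len(row)) for row in rows]


def write_table(rows, path):
    """Write a list of rows to an Excel sheet without header or index columns."""
    # Imported here so the helpers above work without pandas installed;
    # writing .xlsx files also requires openpyxl.
    import pandas as pd

    pd.DataFrame(pad_rows(rows)).to_excel(path, header=False, index=False)
```

With a table already built from the OCR JSON, calling write_table(table, OUTPUT_PATH) would produce the workbook.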

Sample Test Case

Input Image:

Its corresponding JSON:

Excel Output:

