extractor is a tool that extracts tables from a PDF file and writes them to an Excel file.
Before installing, make sure these two system dependencies are present:
- Tkinter
- Ghostscript
For Ubuntu:
$ apt install python3-tk ghostscript
For CentOS:
$ yum install tkinter ghostscript
Then create a virtualenv, install extractor from source, and run it on a PDF:
$ virtualenv ENV
$ source ENV/bin/activate
$ git clone https://github.com/Ankit-rana/table_extractor.git
$ cd table_extractor
$ pip3 install -r requirements.txt
$ python3 setup.py install
$ extractor sample.pdf
$ ls
foo.xlsx
- extractor is configurable. Its settings live in /etc/extractor.conf.
- A sample configuration file looks like this:
[DEFAULT]
START_FIELD_NAME=Booking Date
DATEFIELDS=0,1,3
AMOUNTFIELDS=4,5,6
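
To get a feel for the format, the sample above can be read with Python's standard configparser module. This is only a sketch of one way to load such a file, not necessarily how extractor parses it internally. The helper names (parse_indices, load_settings) are hypothetical, and treating DATEFIELDS and AMOUNTFIELDS as comma-separated integer indices is an assumption based on the sample values.

```python
import configparser

# The sample configuration from above, inlined so the sketch is self-contained.
SAMPLE = """\
[DEFAULT]
START_FIELD_NAME=Booking Date
DATEFIELDS=0,1,3
AMOUNTFIELDS=4,5,6
"""


def parse_indices(raw):
    """Turn a comma-separated string such as "0,1,3" into [0, 1, 3].

    Assumption: the values are integer indices; the sample suggests this,
    but the meaning is not documented here.
    """
    return [int(part) for part in raw.split(",") if part.strip()]


def load_settings(text):
    """Parse configuration text and return the three sample options as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    defaults = parser["DEFAULT"]
    return {
        "start_field_name": defaults["START_FIELD_NAME"],
        "date_fields": parse_indices(defaults["DATEFIELDS"]),
        "amount_fields": parse_indices(defaults["AMOUNTFIELDS"]),
    }


settings = load_settings(SAMPLE)
print(settings)
```

To read the real file instead of the inlined sample, parser.read("/etc/extractor.conf") could replace the read_string call.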