This small utility app was created to help with the tedious task of extracting tabular data from vendor PDF product data sheets.
Tabula has been used previously and is highly recommended, but I needed something that I could customise to my needs a little more.
- Python3
- Used to create the main application functionality
- Flask
- Flask is a micro web framework written in Python.
- Camelot
- Camelot is a Python library that can help you extract tables from PDFs.
- VS Code
- Code Editor
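As a rough illustration of how Camelot is typically used, here is a minimal sketch and not necessarily how this app calls it. The function name `extract_tables` and its default arguments are hypothetical; `camelot.read_pdf`, its `pages` and `flavor` arguments, and the `.df` attribute on each table come from Camelot's documented API.

```python
from pathlib import Path


def extract_tables(pdf_path, pages="1", flavor="lattice"):
    """Return every table Camelot finds in the PDF as a pandas DataFrame.

    `extract_tables` is a hypothetical helper, not part of this repository.
    """
    pdf_path = Path(pdf_path)
    if not pdf_path.is_file():
        raise FileNotFoundError(f"No such PDF: {pdf_path}")

    # Imported lazily so the missing-file check above works even when
    # Camelot (and Ghostscript) are not installed yet.
    import camelot

    # "lattice" suits tables drawn with ruled lines;
    # "stream" suits tables separated only by whitespace.
    tables = camelot.read_pdf(str(pdf_path), pages=pages, flavor=flavor)

    # Each Camelot table exposes its contents as a DataFrame via `.df`.
    return [tables[i].df for i in range(len(tables))]
```

Calling `extract_tables("datasheet.pdf", pages="1-3")` (the file name is a placeholder) would return a list of DataFrames, each of which can then be saved with `DataFrame.to_csv`.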
The website was developed using VS Code, with changes committed using Git and pushed to GitHub, which hosts the repository. I took the following steps to deploy the site:
Ensure the following are installed locally on your computer:
- Python 3.6 or higher
- PIP3, the Python package installer
- Git version control
- Ghostscript, an interpreter for the PostScript® language and PDF files
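If you want a quick way to confirm the prerequisites above, the following sketch checks the Python version and looks for the other tools on your `PATH`. It is only an illustration: the function name `check_prerequisites` is hypothetical, and the executable names are assumptions that vary by platform (for example, Ghostscript is usually `gs` on Linux and macOS and `gswin64c` on 64-bit Windows).

```python
import shutil
import sys

# Executable names are assumptions and differ between platforms.
REQUIRED_TOOLS = {
    "pip": ("pip3", "pip"),
    "Git": ("git",),
    "Ghostscript": ("gs", "gswin64c", "gswin32c"),
}


def check_prerequisites(min_python=(3, 6)):
    """Return a dict mapping each prerequisite name to True or False."""
    results = {"Python": sys.version_info >= min_python}
    for name, candidates in REQUIRED_TOOLS.items():
        # shutil.which returns None when an executable is not on PATH.
        results[name] = any(shutil.which(exe) for exe in candidates)
    return results


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

A tool reported as missing may still be installed under a different executable name, so treat the output as a hint, not a verdict.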
- Navigate to the simonjvardy/python-pdf-table-extractor GitHub repository.
- Click the Code button.
- Copy the clone URL from the dropdown menu.
- Open your preferred terminal, for example the one built into your favourite IDE.
- Navigate to your desired file location.
Copy the following command and run it in your terminal to clone the python-pdf-table-extractor repository:
git clone https://github.com/simonjvardy/python-pdf-table-extractor.git
Note: The process may differ depending on your OS; please follow this Python help guide to learn how to create a virtual environment.
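One OS-independent option is Python's built-in `venv` module. The sketch below is an assumption about how you might set things up, not a step taken from this project: the helper name `create_virtualenv` and the directory name `venv` are hypothetical choices.

```python
import venv
from pathlib import Path


def create_virtualenv(target="venv", with_pip=True):
    """Create a virtual environment in `target` and return its absolute path.

    The default directory name "venv" is an arbitrary, hypothetical choice.
    """
    env_dir = Path(target).resolve()
    # EnvBuilder is the standard-library API behind `python -m venv`.
    venv.EnvBuilder(with_pip=with_pip).create(str(env_dir))
    return env_dir
```

Creating the environment does not activate it. Activation depends on your shell: typically `source venv/bin/activate` on Linux and macOS, or `venv\Scripts\activate` on Windows.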
Run the following command in your terminal window:
pip install -r requirements.txt
- TODO
python app.py
- freeCodeCamp's Python automation tutorial video on YouTube.