This project is to extract the necessary information from the pdf file and save them into the csv file. The PDFMiner and Pandas libraries are used in this project.
-
output
The directory to save the result csv files
-
src
The main source code to extract the necessary information and save into the csv file.
-
utils
The source code for management of files and folders of the project
-
app
The main execution file
-
requirements
All the dependencies for this project
-
settings
Several settings including the pdf file path
-
Environment
Ubuntu 18.04, Windows 10, Python 3.6
-
Dependency Installation
Please navigate to this project directory and run the following command in the terminal.
pip3 install -r requirements.txt
-
Please set PDF_FILE_PATH variable in settings file with the full path of pdf file to process.
-
Please run the following command in the terminal.
python3 app.py