evg-cv/PDFExtractor

Python

PDFExtractor

Overview

This project is to extract the necessary information from the pdf file and save them into the csv file. The PDFMiner and Pandas libraries are used in this project.

Structure

output

The directory to save the result csv files
src

The main source code to extract the necessary information and save into the csv file.
utils

The source code for management of files and folders of the project
app

The main execution file
requirements

All the dependencies for this project
settings

Several settings including the pdf file path

Installation

Environment

Ubuntu 18.04, Windows 10, Python 3.6
Dependency Installation

Please navigate to this project directory and run the following command in the terminal.
```
    pip3 install -r requirements.txt
```

Execution

Please set PDF_FILE_PATH variable in settings file with the full path of pdf file to process.
Please run the following command in the terminal.
```
    python3 app.py
```