Attempts to extract tables from well-formed PDF files and write them to CSV files.
PDF files that contain images or more complex internal structures may not convert properly.
Requirements:
- Python 3.11
- pipenv
# Clone
git clone git@github.com:rbento/pdf-table-to-csv.git
# or
git clone https://github.com/rbento/pdf-table-to-csv.git
# Change directory
cd pdf-table-to-csv
# Sync and activate the virtual environment
pipenv sync
pipenv shell
# Convert one file
python convert.py /path/to/file.pdf
# Convert multiple files
python convert.py /path/to/file1.pdf /path/to/file2.pdf
pipenv shell
(pdf-table-to-csv) ~/Workspace/pdf-table-to-csv $ python convert.py ~/Desktop/tax_slips.pdf ~/Desktop/stocks.py
Converting /Users/rbento/Desktop/tax_slips.pdf
> Converted to /Users/rbento/Desktop/tax_slips.csv
Skipping non-pdf source file /Users/rbento/Desktop/stocks.py
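
The session above suggests how inputs are handled: each PDF gets a CSV with the same name in the same directory, and non-PDF sources are skipped. The following is a minimal sketch of that path logic only, not the actual code in `convert.py`. The helper name `plan_conversions` is hypothetical, and the case-insensitive extension check is an assumption.

```python
from pathlib import Path


def plan_conversions(sources):
    """Split input paths into (pdf, csv) pairs and a list of skipped paths.

    Hypothetical helper for illustration; it performs no conversion.
    """
    to_convert, skipped = [], []
    for source in sources:
        path = Path(source).expanduser()  # resolve a leading "~"
        # Assumption: the extension is compared case-insensitively.
        if path.suffix.lower() != ".pdf":
            skipped.append(path)
            continue
        # The CSV sits next to the source PDF, with the same stem.
        to_convert.append((path, path.with_suffix(".csv")))
    return to_convert, skipped
```

Calling `plan_conversions(["/tmp/tax_slips.pdf", "/tmp/stocks.py"])` pairs the PDF with `/tmp/tax_slips.csv` and reports `/tmp/stocks.py` as skipped, which mirrors the messages in the session above.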