Have you ever had nightmares converting tables from PDF files of old literature into Excel files? If so, here is a small script that converts those nightmare tables to Excel files with little effort.
- Python > 3.7 (tested with Python 3.10, but should work with the latest tabula and tabulate packages)
- tabula package, which can be installed in a conda env via:
    conda install -c conda-forge tabula-py
- tabulate package, install via:
    conda install -c conda-forge tabulate
- pandas and numpy
Run from the example notebook, or run the modified script in your directory from the command line or an IDE:
python pdfToTable.py
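For orientation, the general PDF-to-Excel flow with tabula-py and pandas can be sketched as below. This is not the code of pdfToTable.py: the function names `pdf_tables_to_excel` and `sheet_name`, the sheet naming scheme, and the file names are hypothetical, and the sketch assumes Java is available for tabula-py and that an Excel engine such as openpyxl is installed for pandas.

```python
from pathlib import Path


def sheet_name(index: int) -> str:
    """Return a 1-based sheet name for the table at position `index`."""
    return f"table_{index + 1}"


def pdf_tables_to_excel(pdf_path: str, xlsx_path: str, pages="all") -> int:
    """Write every table tabula finds in `pdf_path` to its own sheet.

    Returns the number of tables written.
    """
    # Fail early with a clear error before loading the heavier dependencies.
    if not Path(pdf_path).is_file():
        raise FileNotFoundError(pdf_path)

    # Imported here so the check above works even without Java/tabula installed.
    import pandas as pd
    import tabula

    # With multiple_tables=True, read_pdf returns a list of DataFrames.
    tables = tabula.read_pdf(pdf_path, pages=pages, multiple_tables=True)
    if not tables:
        return 0  # avoid creating a workbook with no sheets

    with pd.ExcelWriter(xlsx_path) as writer:
        for i, df in enumerate(tables):
            df.to_excel(writer, sheet_name=sheet_name(i), index=False)
    return len(tables)
```

A call such as `pdf_tables_to_excel("paper.pdf", "tables.xlsx")` (hypothetical file names) would then produce one sheet per detected table. Scanned or heavily formatted PDFs often need manual clean-up afterwards.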
A built-in function converts string data to floats and also extracts standard errors from the table, if they are given.
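The kind of conversion described here can be illustrated with a minimal sketch. It is not the script's built-in function: the name `parse_cell` and the accepted cell formats ("value ± error" and "value +/- error", with commas treated as thousands separators) are assumptions made for the example.

```python
import re
from typing import Optional, Tuple

# A number such as 12, -0.5 or 1.2e-3 (commas are stripped before matching).
_NUM = r"[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?"
# "value ± error" or "value +/- error"; the error part is optional.
_PATTERN = re.compile(rf"^\s*({_NUM})\s*(?:(?:±|\+/-)\s*({_NUM}))?\s*$")


def parse_cell(cell) -> Tuple[Optional[float], Optional[float]]:
    """Split a table cell into (value, standard_error).

    Returns (None, None) for cells that are not numeric, e.g. "n.d." or "-".
    """
    # Assumption: commas are thousands separators, not decimal commas.
    # The Unicode minus sign (U+2212), common in PDFs, is normalized to "-".
    text = str(cell).replace(",", "").replace("\u2212", "-")
    match = _PATTERN.match(text)
    if match is None:
        return None, None
    value = float(match.group(1))
    error = float(match.group(2)) if match.group(2) else None
    return value, error


print(parse_cell("12.3 ± 0.4"))  # (12.3, 0.4)
print(parse_cell("1,234.5"))     # (1234.5, None)
print(parse_cell("n.d."))        # (None, None)
```

Returning `None` instead of raising keeps non-numeric cells such as "n.d." from interrupting the conversion of a whole column. Whether that suits a given table is a judgment call.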
A brief example notebook is provided in the example folder, with output files for benchmarking. The example test PDF can be downloaded from here or from Zhang et al. 2018, Lithos 300–301 (2018) 20–32.
Send me an email if you run into any problems.