pdf2complextable

Script to extract a complex table (here, one with multiple lines per cell) from a PDF

Primary language: R. License: MIT.

Feel free to use this as a template, and feel even freer to help me convert it into a package :) I'd like to make the functions more generic and add options for different heuristics to determine columns and rows. I'll add to it when I can, but it isn't my highest priority.
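To give a flavour of what a column and row heuristic could look like, here is a minimal sketch in base R. It is not the code from this repo: the function name `group_by_gap`, the gap thresholds, and the toy `words` data frame are all made up for illustration. It assumes you already have one row per word with its x and y position (something shaped like the per-page data frames from `pdftools::pdf_data()` would fit, but that is an assumption about your input). Columns are found by gaps in x, text lines by gaps in y, and a new table row is assumed to start on every text line that has a word in the first column.

```r
# Toy word positions (hypothetical data): one row per word, with its
# left x coordinate, top y coordinate, and text.
words <- data.frame(
  x    = c(10, 100, 100, 10, 100),
  y    = c(10, 10, 22, 40, 40),
  text = c("A", "first", "line2", "B", "only"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: assign each value to a group, starting a new group
# wherever the gap to the previous sorted value exceeds `gap`.
group_by_gap <- function(v, gap) {
  o <- order(v)
  id <- integer(length(v))
  id[o] <- cumsum(c(TRUE, diff(v[o]) > gap))
  id
}

words$col  <- group_by_gap(words$x, gap = 30)  # column heuristic (threshold is a guess)
words$line <- group_by_gap(words$y, gap = 3)   # physical text lines

# Row heuristic (an assumption): a table row starts on each text line
# that has a word in column 1; later lines belong to the same row.
anchor_lines <- sort(unique(words$line[words$col == 1]))
words$row <- findInterval(words$line, anchor_lines)

# Put words in reading order, then paste the lines of each cell together.
words <- words[order(words$line, words$x), ]
tab <- tapply(words$text, list(words$row, words$col), paste, collapse = "\n")
```

Here `tab` is a character matrix with one entry per table cell, and multi-line cells keep their line breaks. Text lines that sit above the first anchor line end up in row 0, so a real table would need a rule for headers, and the thresholds would need tuning per PDF.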

Hope you find it useful. If you do, please be nice and try to cite, in the main text of every manuscript you submit, all the R packages you use (or at the very least, the key ones contributed by plebs) :)

UPDATE: there are now two versions, a minor step towards building this as a package:

  1. pdf2complextable.R - full script with functions embedded into the script
  2. pdf2complextable_functions.R - functions to be sourced in pdf2complextable_example_script_sourcing_functions.R

Update 7th May 2021: added a more complex example and updated some of the functions. The original file (pdf2complextable.R) does NOT include these changes; only the version with sourced functions (pdf2complextable_example_script_sourcing_functions.R and the accompanying pdf2complextable_functions.R) does. Have fun!