Bash script to convert a bunch of PDFs in a folder into a bunch of concatenated JPGs of a given height
requires ImageMagick (available via Homebrew on OSX or apt-get on Linux)
cd /path/to/script
run the script and follow the prompts
Three folders are created in the original dataset location:
original pdfs:
pagewise jpgs:
concatenated jpgs:
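The steps above can be sketched in Bash roughly as follows. This is a minimal, hypothetical sketch, not the script in this repo. The function name pdfs_to_concat_jpgs, the folder names pdfs/, pages/ and concatenated/, the 150 dpi render density and the left-to-right joining of pages are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the repository's script: the function name, folder
# names, 150 dpi density and left-to-right (+append) joining are assumptions.

# Usage: pdfs_to_concat_jpgs <folder-with-pdfs> <height-in-px>
pdfs_to_concat_jpgs() {
  local src_dir=$1 height=$2
  local pdf_dir="$src_dir/pdfs" page_dir="$src_dir/pages" out_dir="$src_dir/concatenated"
  local im pdf name

  [[ $height =~ ^[1-9][0-9]*$ ]] || { echo "height must be a positive integer" >&2; return 1; }

  # ImageMagick 7 installs `magick`; ImageMagick 6 only provides `convert`.
  if command -v magick >/dev/null 2>&1; then im=magick; else im=convert; fi

  mkdir -p "$pdf_dir" "$page_dir" "$out_dir" || return 1

  for pdf in "$src_dir"/*.pdf; do
    [[ -e $pdf ]] || continue          # no PDFs: the glob stayed literal
    name=$(basename "$pdf" .pdf)
    mv "$pdf" "$pdf_dir/" || return 1  # keep the originals in their own folder

    # One JPG per page, each scaled to the requested height (width follows).
    "$im" -density 150 "$pdf_dir/$name.pdf" -resize "x$height" \
      "$page_dir/$name-%03d.jpg" || return 1

    # Zero-padded page numbers make the glob sort in page order (up to 999 pages).
    "$im" "$page_dir/$name"-[0-9][0-9][0-9].jpg +append "$out_dir/$name.jpg" || return 1
  done
}
```

Example: pdfs_to_concat_jpgs /path/to/dataset 800. ImageMagick hands PDF rendering to Ghostscript, so Ghostscript needs to be available too, and some Linux distributions disable PDF input in ImageMagick's policy.xml by default.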
Script to convert a bunch of PDFs into a bunch of extracted .txt files
requires pdftotext (brew install poppler on OSX)
cd /path/to/script
run the script and follow the prompts
One folder is created in the destination location:
converted txts:
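A similarly hedged Bash sketch of the text extraction, assuming poppler's pdftotext is on the PATH. The function name pdfs_to_txts and the txt/ output folder are hypothetical, and the -layout flag is a choice made for this sketch, not necessarily what the repo's script uses.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: extract the text layer of every PDF in one folder into a
# single output folder with poppler's `pdftotext`. Names are assumptions.

# Usage: pdfs_to_txts <folder-with-pdfs> <destination-folder>
pdfs_to_txts() {
  local src_dir=$1 dest_dir=$2
  local txt_dir="$dest_dir/txt" pdf name

  mkdir -p "$txt_dir" || return 1

  for pdf in "$src_dir"/*.pdf; do
    [[ -e $pdf ]] || continue   # no PDFs: the glob stayed literal
    name=$(basename "$pdf" .pdf)
    # -layout keeps the physical column layout; drop it for plain reading order.
    pdftotext -layout "$pdf" "$txt_dir/$name.txt" || return 1
  done
}
```

Example: pdfs_to_txts /path/to/pdfs /path/to/destination. Scanned PDFs without a text layer yield little or no text this way; they would need OCR instead.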
additional scripts if needed