Batch PDF Text Scraper

This is a simple Node.js application that extracts the text from every PDF file in the data directory and writes a corresponding text file for each one to the text output folder. It can be useful for collecting machine learning data, for example text that you plan to tokenize later.
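The conversion loop described above could be sketched roughly as follows. This is not the project's actual code. The helper names (listPdfFiles, textPathFor, convertAll) are hypothetical. The extractText callback is an assumed hook for whichever PDF text-extraction library you use. The "same base name, .txt extension" output naming is also an assumption.

```javascript
// Hypothetical sketch of a batch PDF-to-text converter (not the project's own code).
const fs = require('fs');
const path = require('path');

// Return the names of regular files directly inside dir whose extension is
// .pdf (case-insensitive), sorted for a deterministic processing order.
function listPdfFiles(dir) {
  return fs
    .readdirSync(dir, { withFileTypes: true })
    .filter((entry) => entry.isFile() && path.extname(entry.name).toLowerCase() === '.pdf')
    .map((entry) => entry.name)
    .sort();
}

// Map e.g. "report.pdf" to "<outDir>/report.txt" (assumed naming scheme).
function textPathFor(pdfName, outDir) {
  const base = path.basename(pdfName, path.extname(pdfName));
  return path.join(outDir, base + '.txt');
}

// Convert every PDF in dataDir and write one text file per PDF into outDir.
// extractText(buffer) is a hypothetical hook: it receives the raw PDF bytes
// and must return (or resolve to) the extracted text as a string.
async function convertAll(dataDir, outDir, extractText) {
  fs.mkdirSync(outDir, { recursive: true });
  const written = [];
  for (const name of listPdfFiles(dataDir)) {
    const buffer = fs.readFileSync(path.join(dataDir, name));
    const text = await extractText(buffer);
    const outPath = textPathFor(name, outDir);
    fs.writeFileSync(outPath, text, 'utf8');
    written.push(outPath);
  }
  return written; // paths of the text files that were created
}
```

With a real extractor you would call something like convertAll('data', 'text', extractText). The README does not name the output folder, so 'text' is only a placeholder. Passing the extractor in as a function keeps the file-handling logic independent of any particular PDF library and easy to test with fake input.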

How to use

  1. Put the .pdf files into the data directory.
  2. Run npm install to install the dependencies.
  3. Run npm start to convert the PDF files.

And that's it. I hope you find it helpful! :-)