This is a simple Node application that extracts the text from all the PDF files in the data
directory and creates a corresponding text file for each of them in the text
output folder. It can be useful for collecting machine learning data for later tokenization, etc.
- Put the .pdf files into the data directory.
- Run npm install to get your dependencies.
- Run npm start to convert the PDF files.
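
For readers curious about what the conversion step roughly involves, here is a minimal sketch of the directory loop. It is not the project's actual source: the function name `convertAll` and the `extractText` parameter are hypothetical, and because this README does not say which PDF library is used, the extractor is passed in as a function instead of being hard-coded.

```javascript
// Sketch only: names here are illustrative, not taken from the project.
const fs = require('fs/promises');
const path = require('path');

/**
 * Convert every .pdf file in inputDir into a .txt file in outputDir.
 * extractText is a caller-supplied async function (Buffer -> string);
 * plug in whichever PDF library the project actually depends on.
 * Returns the list of output file names that were written.
 */
async function convertAll(inputDir, outputDir, extractText) {
  // Create the output folder if it does not exist yet.
  await fs.mkdir(outputDir, { recursive: true });

  const entries = await fs.readdir(inputDir, { withFileTypes: true });
  const pdfNames = entries
    .filter((e) => e.isFile() && path.extname(e.name).toLowerCase() === '.pdf')
    .map((e) => e.name)
    .sort();

  const written = [];
  for (const name of pdfNames) {
    const buffer = await fs.readFile(path.join(inputDir, name));
    const text = await extractText(buffer);

    // report.pdf -> report.txt
    const outName = path.basename(name, path.extname(name)) + '.txt';
    await fs.writeFile(path.join(outputDir, outName), text, 'utf8');
    written.push(outName);
  }
  return written;
}
```

Assuming the folders are named data and text as described above, a call might look like `convertAll('data', 'text', extractText)`, where `extractText` wraps the PDF library of your choice. Keeping the extractor separate also makes the loop easy to try out with a stub function before any real PDFs are involved.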
And that's it, I hope you find it helpful! :-)