Please make sure you have installed:
- OpenCV (2.4), set up as node_opencv describes, with OPENCV_DIR set and %OPENCV_DIR%\bin added to PATH
- Tesseract (latest), with TESSDATA_PREFIX set to the /tessdata folder of your installation
- cpdf (latest)
- ImageMagick + Legacy Tools (convert)
- this package itself, using npm install from the private GitHub repository
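To check these prerequisites before a run, something like the following could be used. It is only a sketch and not part of paper-parser: missingPrerequisites and isOnPath are made-up names, and the executable names (tesseract, cpdf, convert) are assumptions that may not match your installation.

```javascript
// Hypothetical helper, not part of paper-parser: reports which of the
// prerequisites listed above look missing on this machine.
var fs = require('fs')
var path = require('path')

// Environment variables the list above asks you to set.
var REQUIRED_ENV = ['OPENCV_DIR', 'TESSDATA_PREFIX']
// Assumed executable names; adjust them to your installation.
// Note: on Windows a "convert" hit may be the system disk utility, not ImageMagick.
var REQUIRED_COMMANDS = ['tesseract', 'cpdf', 'convert']

// True if `command` (plain, or with a PATHEXT extension) exists in a PATH directory.
function isOnPath (command, env) {
  var dirs = (env.PATH || env.Path || '').split(path.delimiter).filter(Boolean)
  var exts = [''].concat((env.PATHEXT || '').split(';').filter(Boolean))
  return dirs.some(function (dir) {
    return exts.some(function (ext) {
      return fs.existsSync(path.join(dir, command + ext))
    })
  })
}

// Returns the names of unset variables and of commands that were not found.
// `hasCommand` can be swapped out, which keeps the function easy to test.
function missingPrerequisites (env, hasCommand) {
  env = env || process.env
  hasCommand = hasCommand || function (cmd) { return isOnPath(cmd, env) }
  return {
    env: REQUIRED_ENV.filter(function (name) { return !env[name] }),
    commands: REQUIRED_COMMANDS.filter(function (cmd) { return !hasCommand(cmd) })
  }
}
```

Calling missingPrerequisites() with no arguments checks the current process environment; two empty arrays mean nothing obvious is missing.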
Usage:
var paperparser = require('paper-parser')
paperparser()
Call paperparser() every time you want a complete rebuild; there is no check for existing files (sorry), so everything is regenerated.
It parses all PDFs in /input as IB Maths papers.
Dirty, I know.
It doesn't work on some questions.
Tested to work on basically nothing.
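The input step (every PDF in /input) might look roughly like this. It is a sketch under assumptions, not the package's actual code: listPdfs is a made-up name, and only the top level of the folder is scanned.

```javascript
var fs = require('fs')
var path = require('path')

// Hypothetical helper: returns the full paths of all .pdf files
// (case-insensitive extension) directly inside inputDir, sorted by name.
function listPdfs (inputDir) {
  return fs.readdirSync(inputDir)
    .filter(function (name) { return path.extname(name).toLowerCase() === '.pdf' })
    .sort()
    .map(function (name) { return path.join(inputDir, name) })
}
```

For example, listPdfs(path.join(__dirname, 'input')) would give the list of papers to feed into the parser.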
Images are output in the folder structure:
/static/year/group/subject/paper/language/timezone
Example:
__dirname/static/M13/5/MATME/SP2/ENG/TZ1
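The folder structure above can be put into code as follows. This is a minimal sketch: outputDir and parseOutputDir are made-up names for illustration, not functions exported by paper-parser.

```javascript
var path = require('path')

// Order of the folders under /static, as given in the structure above.
var LEVELS = ['year', 'group', 'subject', 'paper', 'language', 'timezone']

// Hypothetical helper: builds the output folder for one paper.
function outputDir (root, paper) {
  var parts = LEVELS.map(function (level) {
    if (paper[level] == null || paper[level] === '') {
      throw new Error('Missing "' + level + '"')
    }
    return String(paper[level])
  })
  return path.join.apply(path, [root, 'static'].concat(parts))
}

// Hypothetical inverse: reads the six levels back out of a folder path.
// Returns null if the path has no "static" folder followed by six levels.
function parseOutputDir (dir) {
  var segments = dir.split(/[\\/]+/).filter(Boolean)
  var start = segments.lastIndexOf('static')
  if (start === -1 || segments.length - start - 1 < LEVELS.length) return null
  var result = {}
  LEVELS.forEach(function (level, i) {
    result[level] = segments[start + 1 + i]
  })
  return result
}
```

With { year: 'M13', group: 5, subject: 'MATME', paper: 'SP2', language: 'ENG', timezone: 'TZ1' } and __dirname as root, outputDir gives the example folder above (using the platform's path separator).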