Convert a (badly) scanned PDF to formatted text.
- Use Automator to split the PDF (1 document per page)
- Use Automator to transform each PDF page into an image
- For each image, extract the text with tesseract.js and write whatever got extracted to a file:
  npm run detect
- Merge all the files together:
  npm run merge
- Format the merged text as HTML:
  npm run format
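
The scripts behind `npm run merge` and `npm run format` are not shown here, so the following is only a minimal sketch of what those two steps might look like in Node.js. The `page-<n>.txt` file naming, the function names, and the paragraph-per-blank-line rule are all assumptions made for illustration, not the actual implementation.

```javascript
// Sketch only: file naming (page-<n>.txt) and all function names are assumptions.
const fs = require("node:fs");
const path = require("node:path");

// Pull the page number out of a name such as "page-12.txt" (hypothetical naming).
// Files without a number sort last.
function pageNumber(fileName) {
  const match = /(\d+)/.exec(fileName);
  return match ? Number(match[1]) : Number.MAX_SAFE_INTEGER;
}

// Merge step: concatenate the per-page text files in numeric page order,
// so that page 10 comes after page 2 rather than after page 1.
function mergePages(dir) {
  return fs
    .readdirSync(dir)
    .filter((name) => name.endsWith(".txt"))
    .sort((a, b) => pageNumber(a) - pageNumber(b))
    .map((name) => fs.readFileSync(path.join(dir, name), "utf8").trim())
    .join("\n\n");
}

// Escape the characters that are significant in HTML text content.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Format step: treat each blank-line-separated block as one paragraph
// and join the lines inside a block with single spaces.
function toHtml(text) {
  return text
    .split(/\n\s*\n/)
    .map((block) => block.trim())
    .filter((block) => block.length > 0)
    .map((block) => `<p>${escapeHtml(block).replace(/\s*\n\s*/g, " ")}</p>`)
    .join("\n");
}
```

Sorting by the numeric part of the file name matters because a plain alphabetical sort would put `page-10.txt` before `page-2.txt`. Calling `toHtml(mergePages("out"))` would then produce the HTML for a hypothetical `out` directory of per-page files.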