A series of scripts to extract sentences from Glossika PDF course files.
Right now, it is custom-made for one triangulation package: English > German > Mandarin.
Install Poppler first. On Mac:
brew install poppler
In order, the scripts do the following:
- Extracts raw text from PDF
- Gets all sentences
- Extracts sentences by language
- Extracts IPA transcriptions
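
The steps above can be sketched in Python as follows. This is only an illustration, not the code in this repository: the function names (`pdf_to_text`, `classify_line`, `group_lines`) are hypothetical, and the sketch assumes that Poppler's `pdftotext` is on the PATH, that Mandarin lines contain Chinese characters, and that IPA lines contain at least one symbol from the Unicode IPA Extensions block. Telling English from German apart depends on the layout of the actual course file, so both end up under "latin" here.

```python
import re
import subprocess
from pathlib import Path

# Hypothetical helpers for illustration; the real scripts may work differently.

# Assumption: Mandarin lines contain at least one CJK Unified Ideograph.
CJK = re.compile(r"[\u4e00-\u9fff]")
# Assumption: IPA lines contain an IPA Extensions symbol, a stress mark or a length mark.
IPA = re.compile(r"[\u0250-\u02af\u02c8\u02cc\u02d0]")


def pdf_to_text(pdf_path: Path) -> str:
    """Run Poppler's pdftotext and return the extracted text."""
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", str(pdf_path), "-"],  # "-" writes to stdout
        check=True,
        capture_output=True,
    )
    return result.stdout.decode("utf-8")


def classify_line(line: str) -> str:
    """Label a line as 'mandarin', 'ipa' or 'latin' based on its characters."""
    if CJK.search(line):
        return "mandarin"
    if IPA.search(line):
        return "ipa"
    return "latin"  # English or German; not distinguished in this sketch


def group_lines(raw_text: str) -> dict:
    """Group the non-empty lines of the raw text by their label."""
    groups = {"mandarin": [], "ipa": [], "latin": []}
    for line in raw_text.splitlines():
        line = line.strip()
        if line:
            groups[classify_line(line)].append(line)
    return groups
```

With a course file at hand, `group_lines(pdf_to_text(Path("course.pdf")))` would return the grouped lines; `course.pdf` is a placeholder name.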