Turn PDF documents into simple annotated XML for further processing in a corpus preparation pipeline.
Primary language: R