ckorzen/pdf-text-extraction-benchmark

A project that benchmarks and evaluates existing PDF extraction tools on their semantic ability to extract body text from PDF documents, especially scientific articles.

TeXMIT