pkp/ots

Switch to Grobid for front matter parsing of Word docs

axfelix opened this issue · 1 comment

Recent tests performed by eLife show that Grobid substantially outperforms Cermine for front-matter parsing on our Word document corpus. Our other decisions seem sound right now, and front-matter parsing is less important in the context of OJS integration, but we should make some changes to our Merge module to reflect this.

Actually, I've tested this, and the performance gain on real-world docs isn't as large as it is on our corpus. I'm going to close this for now but may revisit.