Switch to Grobid for front matter parsing of Word docs
axfelix opened this issue · 1 comment
axfelix commented
Recent tests performed by eLife show that Grobid substantially outperforms Cermine for front-matter parsing on our Word document corpus. Our other decisions seem sound right now, and front-matter parsing is less important in the context of OJS integration, but we should make some changes to our Merge module to reflect this.
axfelix commented
Actually, I've tested this, and the performance gain on real-world docs isn't as large as it is on our corpus. I'm going to close this for now but may revisit.