thundergolfer-old/PyGrobid

Investigate potential issue with highlighted PDF file


I had a PDF of a paper that I'd highlighted (Attention, Intentions, and the Structure of Discourse). When I tried to process it with Grobid through PyGrobid, I got the following exception:

Py4JJavaError: An error occurred while calling o5.processHeader.
: org.grobid.core.exceptions.GrobidException: [NO_BLOCKS] PDF parsing resulted in empty content

My guess is that this is related to the highlighting, because I then passed in an unhighlighted paper and it worked. That was a different paper, though, so the highlighting is not confirmed as the cause.

In any case, certain PDF files fail when passed in, so I should investigate.

The call from Python was:

g._grobid_engine.processHeader('mypdffile.pdf', False, bibItem)

with mypdffile.pdf being the Attention, Intentions, and the Structure of Discourse paper.
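
To narrow down which files trigger the error, something like the following could be run over a folder of PDFs. This is only a rough sketch: is_no_blocks_error and find_failing_pdfs are hypothetical helpers, not part of PyGrobid, and it assumes the Java message (including the [NO_BLOCKS] tag) shows up in str() of the raised exception, as it does in the traceback quoted above.

```python
def is_no_blocks_error(exc):
    """Return True if the exception text carries Grobid's [NO_BLOCKS] tag."""
    return "[NO_BLOCKS]" in str(exc)


def find_failing_pdfs(process_header, pdf_paths):
    """Call process_header(path) for each path and collect NO_BLOCKS failures.

    process_header is a hypothetical one-argument wrapper, for example:
        lambda p: g._grobid_engine.processHeader(p, False, bibItem)
    """
    failing = []
    for path in pdf_paths:
        try:
            process_header(path)
        except Exception as exc:  # Py4JJavaError in the real setup
            if not is_no_blocks_error(exc):
                raise  # unrelated failure: do not hide it
            failing.append(path)
    return failing
```

Running this on a highlighted and an unhighlighted copy of the same paper would show whether the highlighting itself is what makes the parse come back empty.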