Investigate potential issue with highlighted PDF file
I had a PDF of a paper that I'd highlighted (Attention, Intentions, and the Structure of Discourse). When I tried to process it with Grobid through PyGrobid, I got the following exception:
Py4JJavaError: An error occurred while calling o5.processHeader.
: org.grobid.core.exceptions.GrobidException: [NO_BLOCKS] PDF parsing resulted in empty content
My guess is that this is related to the highlighting, because I then passed in an unhighlighted paper and it worked.
In any case, there's some issue with passing in certain PDF files, so I should investigate it.
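One quick way to check the highlighting guess is to see whether a failing file carries highlight annotations at all. The sketch below is a rough heuristic, not a real PDF parser: it scans the raw bytes for a `/Subtype /Highlight` entry, which is how the PDF format marks a highlight annotation. The function name is made up for illustration, and the check can miss annotations stored inside compressed object streams, so a negative result proves nothing.

```python
import re
from pathlib import Path

# A highlight annotation dictionary contains an entry like "/Subtype /Highlight".
# This only finds it when that dictionary is stored uncompressed in the file.
_HIGHLIGHT = re.compile(rb"/Subtype\s*/Highlight\b")


def has_plain_highlight_marker(pdf_path):
    """Return True if the raw file bytes contain a /Subtype /Highlight entry.

    Hypothetical helper for triage only; it does not parse the PDF.
    """
    return _HIGHLIGHT.search(Path(pdf_path).read_bytes()) is not None
```

Running this on both the failing paper and the unhighlighted one that worked would show whether the two files actually differ in this respect.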
The call from Python was:
g._grobid_engine.processHeader('mypdffile.pdf', False, bibItem)
with mypdffile.pdf being the Attention, Intentions, and the Structure of Discourse paper.
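To narrow down which PDF files trigger the failure, the same call could be run over several files and the outcomes recorded. This is a minimal sketch: `triage` is a hypothetical helper, and it assumes the text of the raised exception contains the `[NO_BLOCKS]` marker, as it does in the traceback above. It catches `Exception` broadly so the sketch does not depend on importing Py4J.

```python
def triage(pdf_paths, process_header):
    """Call process_header(path) for each path and record the outcome.

    process_header is any callable taking a path; results map each path
    to "ok", "NO_BLOCKS", or "other error".
    """
    results = {}
    for path in pdf_paths:
        try:
            process_header(path)
            results[path] = "ok"
        except Exception as exc:  # assumption: the Py4J error text carries the Grobid message
            if "[NO_BLOCKS]" in str(exc):
                results[path] = "NO_BLOCKS"
            else:
                results[path] = "other error"
    return results


# Possible usage with the objects from the call above (g and bibItem must already exist):
# triage(['mypdffile.pdf', 'other.pdf'],
#        lambda p: g._grobid_engine.processHeader(p, False, bibItem))
```

Comparing the "NO_BLOCKS" files against the "ok" ones should make it easier to see whether highlighting is the common factor.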