LayoutPDFReader._parse_pdf returns an error when a PDF contains empty pages
aleksvercau opened this issue · 9 comments
I tried processing a PDF file using the LayoutPDFReader.read_pdf() method, but got a KeyError for response_json['return_dict']['result']['blocks'], since the response did not contain any results because an error had occurred. (On a side note: it would be nice to get a specific error in this case instead of a KeyError, clearly stating that the file could not be processed and why.)
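Here is a minimal sketch of the kind of specific error suggested above. It assumes only that the parsed JSON response is a dict with the return_dict / result / blocks nesting seen in the KeyError. The names extract_blocks and PDFParseError are hypothetical and not part of llmsherpa. The exact content the server puts in return_dict on failure is also an assumption, so the sketch simply reports whatever is there:

```python
from typing import Any


class PDFParseError(RuntimeError):
    """Hypothetical exception (not part of llmsherpa) for responses without blocks."""


def extract_blocks(response_json: dict[str, Any]) -> list[Any]:
    """Return response_json['return_dict']['result']['blocks'] or raise a clear error.

    Assumes the response nesting seen in the KeyError from this issue.
    """
    return_dict = response_json.get("return_dict")
    if not isinstance(return_dict, dict):
        raise PDFParseError(f"Parser response has no 'return_dict': {response_json!r}")
    result = return_dict.get("result")
    if not isinstance(result, dict) or "blocks" not in result:
        # No results: report the rest of return_dict (whatever it contains)
        # instead of letting a bare KeyError escape.
        details = {key: value for key, value in return_dict.items() if key != "result"}
        raise PDFParseError(
            f"The document could not be processed; server returned: {details!r}"
        )
    return result["blocks"]
```

Callers can then catch PDFParseError and log the message, which carries the server's details, rather than debugging a KeyError on a nested dictionary lookup.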
I split my PDF into pages and processed each page separately to understand what the issue was. It turned out that the error occurred every time an empty page was processed. I am not sure whether this happens for empty pages in all kinds of PDFs or only in some (text PDFs differ slightly depending on how they were created). It only occurred in one of the PDFs I was processing, but that was also the only PDF with empty pages.
Suggestion: rather than failing the whole document because it has one empty page, simply skip that page.
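The skip-the-page suggestion can be sketched as a per-page loop. This is not llmsherpa's API. It assumes the PDF has already been split into pages and that each page's text has been extracted with whatever PDF library you prefer. parse_pages_skipping_empty and parse_page are hypothetical names, and parse_page stands in for sending one page to the parser:

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def parse_pages_skipping_empty(
    pages: Iterable[str],
    parse_page: Callable[[str], T],
) -> tuple[list[T], list[int]]:
    """Hypothetical helper: parse pages one by one, skipping blank or failing pages.

    `pages` holds each page's extracted text (an assumption); `parse_page` is a
    hypothetical stand-in for whatever parses a single page.
    Returns the parsed results and the indices of the skipped pages.
    """
    results: list[T] = []
    skipped: list[int] = []
    for index, page_text in enumerate(pages):
        if not page_text.strip():
            # Empty page: skip it instead of failing the whole document.
            skipped.append(index)
            continue
        try:
            results.append(parse_page(page_text))
        except (KeyError, ValueError):
            # No usable result for this page (e.g. the KeyError from this issue).
            skipped.append(index)
    return results, skipped


pages = ["Introduction ...", "   ", "Results ..."]
results, skipped = parse_pages_skipping_empty(pages, lambda text: text.split()[0])
print(results, skipped)  # ['Introduction', 'Results'] [1]
```

Returning the skipped page indices keeps the problem visible to the caller instead of silently dropping content.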
I am facing this issue too.
Me too. Any good fixes so far?
I am facing the same issue
I also have this issue.
Same issue. Has anybody resolved this yet?
This error appeared when I was using the latest Docker image of the llmsherpa backend (nlm-ingestor). After changing the version, the error was resolved.
Use this Docker image version instead of latest and try again:
docker pull ghcr.io/nlmatics/nlm-ingestor:v0.1.6
To run it:
docker run -p 5010:5001 ghcr.io/nlmatics/nlm-ingestor:v0.1.6
Hello @madhuprakash19, I am facing the same issue. Are there any suggestions for solving it?
Using a different version of the Docker image (v0.1.6) resolved it for me.
Thanks. I tried every version from v0.1.5 to the latest, but none of them worked for me.