LayoutPDFReader._parse_pdf returns error when pdf contains empty pages

Question

aleksvercau opened this issue 10 months ago · 9 comments

I tried processing a PDF file with the LayoutPDFReader.read_pdf() method but got a KeyError on response_json['return_dict']['result']['blocks']: the response contained no results because an error had occurred during processing. (On a side note: it would be nice to get a specific error in this case instead of a KeyError, one that clearly states that the file could not be processed and why.)
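To illustrate the kind of specific error meant here, the following is a minimal sketch of a check on the parsed JSON response before the blocks are read. The key path is the one from the KeyError above; `PDFParseError` and `extract_blocks` are hypothetical names, not part of llmsherpa, and the shape of an error response is an assumption, which is why the whole payload is echoed back.

```python
class PDFParseError(Exception):
    """Hypothetical error type: the parser response carries no layout blocks."""


def extract_blocks(response_json):
    """Walk return_dict -> result -> blocks and fail with a readable message."""
    node = response_json
    for key in ("return_dict", "result", "blocks"):
        if not isinstance(node, dict) or key not in node:
            # The layout of an error response is an assumption, so the full
            # payload goes into the message to keep the real reason visible.
            raise PDFParseError(
                f"file could not be processed: missing {key!r} "
                f"in parser response {response_json!r}"
            )
        node = node[key]
    return node
```

A caller would then see a PDFParseError naming the missing key and showing the response, rather than a bare KeyError.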

I split my PDF into pages and processed each page separately to find the cause. It turns out the error occurred every time an empty page was processed. I am not sure whether this happens for empty pages in all types of PDFs or only in some (text PDFs differ slightly depending on how they were created). It only occurred with one of the PDFs I was processing, but that was also the only PDF with empty pages.

Better still: do not fail processing of a whole document because it has one empty page; simply skip that page.
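Until the library does this itself, the skip-the-page behavior can be approximated on the caller's side. This is only a sketch: it assumes the PDF has already been split into one file per page, and `parse_pages` and `parse_page` are hypothetical names. `parse_page` stands for any callable that parses a single page file and raises KeyError when the page cannot be processed, as described above.

```python
def parse_pages(page_files, parse_page):
    """Parse each page file, collecting failures instead of aborting."""
    parsed, skipped = [], []
    for path in page_files:
        try:
            parsed.append((path, parse_page(path)))
        except KeyError as exc:
            # In this thread a KeyError is the symptom of an unprocessable
            # (for example empty) page, so record it and move on.
            skipped.append((path, repr(exc)))
    return parsed, skipped
```

With llmsherpa, `parse_page` could be the `read_pdf` method of a LayoutPDFReader instance; the `skipped` list then shows which pages were left out and why.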

Answer 1 · 2024-03-25T16:10:48.000Z

I am facing this issue too.

Answer 2 · 2024-04-23T22:54:18.000Z

Me too. Any intelligent fixes so far?

Answer 3 · 2024-07-04T04:38:49.000Z

I am facing the same issue

Answer 4 · 2024-07-12T08:04:12.000Z

I also have this issue.

Answer 5 · 2024-07-16T13:58:02.000Z

Same issue. Has anybody resolved this yet?

Answer 6 · 2024-07-17T09:25:39.000Z

I got this error when using the latest Docker image for LLM Sherpa; changing the version resolved it.
Use this Docker image version instead of latest and try again:
docker pull ghcr.io/nlmatics/nlm-ingestor:v0.1.6
To run it:
docker run -p 5010:5001 ghcr.io/nlmatics/nlm-ingestor:v0.1.6

Answer 7 · 2024-07-19T11:48:13.000Z

Hello @madhuprakash19, I am facing the same issue. Is there any suggestion for solving it?

Answer 8 · 2024-07-19T11:51:20.000Z

> Hello @madhuprakash19, I am facing the same issue. Is there any suggestion for solving it?

Using a different version of the Docker image resolved it for me.

Answer 9 · 2024-07-19T12:43:54.000Z

> v0.1.6

Thanks. I tried versions from v0.1.5 up to latest, but none of them worked for me.