davendw49/sciparser

Unable to extract all images from PDF

Davidwhw opened this issue · 0 comments

Sorry to raise the same issue as pdf_parser/issues/1#issue-2307687422 again, but I have not received a reply there and urgently need a solution.

When I use the pdffigures2 backend to extract images from a PDF, some images are often missed. For example, pdf_parser extracts only 3 images from a PDF file that contains 5. (In my experience, pdffigures2 is still the best of the three image extraction backends; cermine, for instance, cuts a complete image into pieces.)
My guess is that the pdffigures2 backend filters images using default parameters such as "image size" or "resolution". Is that the case?
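
To make the mismatch easier to reproduce, the expected image count can be checked independently of any backend. The sketch below is only a rough heuristic and not part of pdf_parser or pdffigures2: the function name `count_embedded_images` is hypothetical, and it simply scans the raw PDF bytes for image XObject dictionaries. It does not see inline images or figures drawn as vector graphics, so its result may differ from the number of figures a reader sees on the page.

```python
import re

# Matches the "/Subtype /Image" entry of an image XObject dictionary.
# The trailing \b keeps longer names such as "/ImageMask" from matching.
_IMAGE_XOBJECT = re.compile(rb"/Subtype\s*/Image\b")


def count_embedded_images(pdf_bytes: bytes) -> int:
    """Heuristically count embedded raster images in raw PDF bytes.

    Assumption: each embedded image appears as an XObject whose
    dictionary contains "/Subtype /Image". Inline images and vector
    figures are not counted.
    """
    return len(_IMAGE_XOBJECT.findall(pdf_bytes))


# Hypothetical usage ("paper.pdf" is a placeholder path):
# with open("paper.pdf", "rb") as f:
#     print(count_embedded_images(f.read()))
```

If this count is higher than the number of images a backend returns, that at least confirms the images are present in the file and are being dropped somewhere during extraction.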
Could you give me some advice or pointers?
Thank you for your assistance.