absolute path Issue in read_pdf method

Question

Nishant-Bansal-777 opened this issue 7 months ago · 2 comments

Nishant-Bansal-777 commented 7 months ago

Hey,
I am using llmsherpa to parse PDFs. I have noticed that if I pass the full (absolute) path of a PDF, "is_url" evaluates to True, so read_pdf starts downloading the PDF instead of going to the else block.
is_url = urlparse(path_or_url).scheme != "" (in file_reader.py, line 63)
For example: pdf_path = r'E:\all_projects\pdf\first.pdf'
(urlparse treats the drive letter as the scheme, so scheme is 'e' and is_url is True)
To avoid this, I am keeping the PDFs in the project directory.
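
To make the behaviour easy to reproduce, here is a small sketch using only the standard library. The helper `looks_like_url` is hypothetical (it is not part of llmsherpa); it just shows one way a stricter check could avoid treating a Windows drive letter as a URL scheme, assuming only http and https URLs need to be supported.

```python
from urllib.parse import urlparse

# Raw string so that "\a" and "\f" are not read as escape sequences.
windows_path = r"E:\all_projects\pdf\first.pdf"

# urlparse sees "E:" as a URL scheme (lowercased to "e").
print(urlparse(windows_path).scheme)  # e


def looks_like_url(path_or_url: str) -> bool:
    """Hypothetical stricter check: only http(s) counts as a URL."""
    return urlparse(path_or_url).scheme in ("http", "https")


print(looks_like_url(windows_path))  # False
print(looks_like_url("https://example.com/first.pdf"))  # True
```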

Answer 1 · 2024-06-11T10:39:06.000Z

Do we have any solution for this yet?

Answer 2 · 2024-09-18T06:08:39.000Z

Do we have any solution for this yet?

I use this solution:

    # Workaround: read the file ourselves and pass the bytes as the second argument.
    with open(file_path, 'rb') as f:
        contents = f.read()
    sherpa_document = pdf_reader.read_pdf(file_path, contents)
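
If you need this in several places, the same workaround can be wrapped in a small helper. This is only a sketch: `read_local_pdf` is a name I made up, and it assumes `pdf_reader` is any object whose `read_pdf` accepts the path followed by the file contents, as in the call above.

```python
def read_local_pdf(pdf_reader, file_path):
    """Read a local PDF as bytes and hand both path and bytes to the reader.

    Hypothetical helper, not part of llmsherpa. It assumes
    pdf_reader.read_pdf(path, contents) is a valid call, as used above.
    """
    with open(file_path, "rb") as f:
        contents = f.read()
    # Pass the bytes positionally, exactly like the original workaround.
    return pdf_reader.read_pdf(str(file_path), contents)
```

Usage would then be something like `sherpa_document = read_local_pdf(pdf_reader, file_path)`, with `pdf_reader` created the same way as before.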