nlmatics/llmsherpa

Absolute path issue in read_pdf method

Nishant-Bansal-777 opened this issue · 2 comments

Hey,
I am using llmsherpa to parse PDFs. I have noticed that if I pass the full (absolute) path of a PDF to read_pdf, the "is_url" variable evaluates to True, so it starts downloading the PDF instead of going to the else block.
is_url = urlparse(path_or_url).scheme != "" (in file_reader.py, line 63)
For example: pdf_path = r'E:\all_projects\pdf\first.pdf'
(the parsed scheme is 'e', the drive letter, so is_url is True)
As a workaround, I am keeping the PDFs in the project directory.
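
The behavior can be reproduced with the standard library alone. The sketch below shows how the drive letter ends up as the URL scheme, and one possible stricter check. `looks_like_url` is a hypothetical helper for illustration, not part of llmsherpa, and accepting only http(s) is an assumption about which inputs should count as URLs.

```python
from urllib.parse import urlparse

# Raw string, so "\a" and "\f" are not interpreted as escape sequences.
pdf_path = r'E:\all_projects\pdf\first.pdf'

# urlparse treats everything before the first ":" as the scheme (lowercased),
# so the Windows drive letter "E" becomes the scheme "e".
scheme = urlparse(pdf_path).scheme

# The check quoted above: any non-empty scheme is taken to mean "this is a URL".
is_url = scheme != ""


def looks_like_url(path_or_url: str) -> bool:
    """Hypothetical stricter check: only http(s) inputs count as URLs."""
    return urlparse(path_or_url).scheme in ("http", "https")
```

With this check, `looks_like_url(pdf_path)` is False for the Windows path, while an `https://` address is still recognized as a URL.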

Do we have any solution for this yet?

I use this solution:

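    # Read the PDF bytes yourself and pass them as the second argument.
    # pdf_reader is assumed to be an existing LayoutPDFReader instance.
    with open(file_path, 'rb') as f:
        contents = f.read()
    sherpa_document = pdf_reader.read_pdf(file_path, contents)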
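
If the same code has to accept both URLs and local paths, the workaround can be wrapped in a small function. This is only a sketch: `read_pdf_safely` is a hypothetical name, it assumes `read_pdf` accepts the file contents as its second argument (as in the snippet above), and it assumes only http(s) inputs should be treated as URLs.

```python
from urllib.parse import urlparse


def read_pdf_safely(pdf_reader, path_or_url):
    """Hypothetical wrapper: treat only http(s) inputs as URLs."""
    if urlparse(path_or_url).scheme in ("http", "https"):
        # Remote file: hand the URL to the reader unchanged.
        return pdf_reader.read_pdf(path_or_url)
    # Local file (including Windows drive-letter paths): read the bytes
    # here and pass them along with the path.
    with open(path_or_url, "rb") as f:
        contents = f.read()
    return pdf_reader.read_pdf(path_or_url, contents)
```

Usage would then be `sherpa_document = read_pdf_safely(pdf_reader, file_path)` for either kind of input.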