Absolute path issue in read_pdf method
Nishant-Bansal-777 opened this issue · 2 comments
Nishant-Bansal-777 commented
Hey,
I am using llmsherpa to parse PDFs. I have noticed that if I provide the full (absolute) path of a PDF, the "is_url" variable evaluates to True, so read_pdf starts downloading the PDF instead of going to the else block.
is_url = urlparse(path_or_url).scheme != ""
(in file_reader.py line 63)
For example: pdf_path = r'E:\all_projects\pdf\first.pdf'
(the parsed scheme is 'e', the drive letter, so is_url is True)
To avoid this, I am keeping my PDFs in the project directory.
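For anyone who wants to reproduce this without llmsherpa, here is a minimal sketch that uses only the standard library. It shows how urlparse treats the Windows drive letter as a scheme. Note that looks_like_url is a hypothetical helper and not part of llmsherpa; it assumes that only http and https should count as URLs, which may not match what the library intends.

```python
from urllib.parse import urlparse

# Raw string so the backslashes stay literal ("\a" and "\f" are escapes otherwise).
pdf_path = r'E:\all_projects\pdf\first.pdf'

# The drive letter before the colon is parsed as a URL scheme.
scheme = urlparse(pdf_path).scheme   # 'e'
is_url = scheme != ""                # True, so the download branch is taken


def looks_like_url(path_or_url):
    """Hypothetical stricter check (an assumption, not llmsherpa code):
    treat the input as a URL only when the scheme is http or https."""
    return urlparse(path_or_url).scheme in ("http", "https")
```

With a check like this, looks_like_url(pdf_path) is False for the Windows path, while an https link is still recognized as a URL.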
MeghaWalia-eco commented
Do we have any solution for this yet?
fiksii-copilot commented
> Do we have any solution for this yet?
I use this solution:
with open(file_path, 'rb') as f:
    contents = f.read()
# Passing the file contents explicitly avoids the URL check on file_path.
sherpa_document = pdf_reader.read_pdf(file_path, contents)
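If you call this in several places, the workaround can be wrapped in a small function. This is only a sketch: read_local_pdf is a hypothetical name, and it assumes pdf_reader is an object whose read_pdf method accepts a path followed by the file contents, as in the call above.

```python
from pathlib import Path


def read_local_pdf(pdf_reader, file_path):
    """Hypothetical wrapper: read a local PDF as bytes and hand both the
    path and the bytes to pdf_reader.read_pdf, so the reader does not
    have to decide whether the path is a URL."""
    path = Path(file_path)
    contents = path.read_bytes()  # same as open(file_path, 'rb').read()
    return pdf_reader.read_pdf(str(path), contents)
```

Usage would then be sherpa_document = read_local_pdf(pdf_reader, file_path), with file_path given as an absolute or relative path.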