nlmatics/llmsherpa

Bug in load_data when using full path

yoeldk opened this issue · 2 comments

This code would fail:

full_path = 'C:\\temp\\A\\test.pdf'
documents = pdf_loader.load_data(full_path)

However, if a relative path is given, it works fine.

It looks like the issue is in file_reader.py:63
is_url = urlparse(path_or_url).scheme != ""

In the case of a full Windows path, the scheme will be the drive letter (C in this case), which makes the code treat the input as a URL instead of a local path.
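A quick way to confirm this is to print the scheme that `urlparse` reports for each kind of input. This is only a minimal sketch using the standard library; `scheme_of` is a hypothetical helper name and the `example.com` URL is a placeholder, neither comes from llmsherpa.

```python
from urllib.parse import urlparse


def scheme_of(path_or_url: str) -> str:
    """Return the scheme urlparse detects (hypothetical helper, for illustration)."""
    return urlparse(path_or_url).scheme


windows_path = "C:\\temp\\A\\test.pdf"      # full path from the report above
relative_path = "A\\test.pdf"               # assumed relative form of the same file
url = "https://example.com/test.pdf"        # placeholder URL

# urlparse lowercases the scheme, so the drive letter comes back as "c".
print(repr(scheme_of(windows_path)))   # 'c'  -> non-empty, so is_url becomes True
print(repr(scheme_of(relative_path)))  # ''   -> treated as a local path
print(repr(scheme_of(url)))            # 'https'
```

Since the drive letter yields a non-empty scheme, the `scheme != ""` test cannot tell a Windows absolute path from a real URL.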

I am facing the same problem. Did you find a workaround?

You could just change the code to:

        is_url = urlparse(path_or_url).scheme != "" and len(urlparse(path_or_url).scheme) > 2
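Another option, instead of checking the scheme length, is to accept only the schemes you actually expect. The following is a rough sketch and not the library's code: `looks_like_url` is a hypothetical helper, and limiting it to `http` and `https` is my assumption about which URLs the loader needs to handle.

```python
from urllib.parse import urlparse


def looks_like_url(path_or_url: str) -> bool:
    """Hypothetical check: treat the input as a URL only for http(s) with a host."""
    parsed = urlparse(path_or_url)
    # A drive letter such as "c" is not in the allowed set, so Windows
    # absolute paths fall through to the local-file branch.
    return parsed.scheme in ("http", "https") and parsed.netloc != ""


print(looks_like_url("C:\\temp\\A\\test.pdf"))           # False
print(looks_like_url("https://example.com/test.pdf"))    # True
print(looks_like_url("docs/test.pdf"))                   # False
```

An allow-list is stricter than the length check, so it would also reject other schemes such as `ftp` or `file` unless they are added explicitly.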