Bug in load_data when using full path

Question

yoeldk opened this issue 10 months ago · 2 comments

This code would fail:

full_path = 'C:\\temp\\A\\test.pdf'
documents = pdf_loader.load_data(full_path)

However, if a relative path is given, it works fine.

It looks like the issue is in file_reader.py:63
is_url = urlparse(path_or_url).scheme != ""

With a full Windows path, urlparse returns the drive letter (C in this case) as the scheme, so the path is treated as a URL instead of a file path.
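
To make the cause easier to see, here is a small standalone check using only the standard library. The example URL and the relative path are made up for illustration; only the full path comes from the report above. Note that urlparse lowercases the scheme it extracts.

```python
from urllib.parse import urlparse

# Full Windows path from the report above.
windows_path = "C:\\temp\\A\\test.pdf"
# Illustrative values (assumptions, not from the original report).
relative_path = "A\\test.pdf"
web_url = "https://example.com/test.pdf"

# Everything before the first ":" is taken as the scheme when it looks like one,
# so the drive letter ends up in .scheme.
windows_scheme = urlparse(windows_path).scheme
relative_scheme = urlparse(relative_path).scheme
url_scheme = urlparse(web_url).scheme

print(repr(windows_scheme), repr(relative_scheme), repr(url_scheme))
```

The relative path has no colon, so its scheme is empty and the `!= ""` test correctly treats it as a file path.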

Answer 1 · 2024-03-19T10:26:08.000Z

I am facing the same problem. Did you find a workaround?

Answer 2 · 2024-07-23T08:54:47.000Z

You could just change the code to:

        is_url = urlparse(path_or_url).scheme != "" and len(urlparse(path_or_url).scheme) > 2
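
A variation on this idea, as a rough sketch rather than the library's actual fix: parse once and treat a single-character scheme as a drive letter. The helper name `looks_like_url` is hypothetical, and the threshold of one character is an assumption (a Windows drive letter is always a single letter, so it may be enough).

```python
from urllib.parse import urlparse


def looks_like_url(path_or_url: str) -> bool:
    """Hypothetical helper: guess whether the input is a URL or a local path."""
    scheme = urlparse(path_or_url).scheme
    # An empty scheme means a plain path; a one-letter scheme is assumed
    # to be a Windows drive letter such as "C:".
    return len(scheme) > 1


print(looks_like_url("C:\\temp\\A\\test.pdf"))         # Windows full path
print(looks_like_url("A/test.pdf"))                     # relative path
print(looks_like_url("https://example.com/test.pdf"))  # real URL
```

If only web URLs need to be supported, checking the scheme against an explicit set such as {"http", "https"} might be stricter and safer than a length test.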