cldf/csvw

Link.resolve has issues

Closed this issue · 6 comments

The functionality of Link.resolve duplicates to some extent functionality that could be provided by urllib.parse.urljoin in the standard library. (cf. cldf/cookbook#11).

The important addition is the handling of Path objects, but

if isinstance(base, pathlib.Path):

is an explicit instance check and the pathlib.Path is likely to be imported from pathlib2, which leads to huge confusion when using a pathlib.Path object instead. Some duck typing solution might be more useful and less confusing.

AttributeError: 'PosixPath' object has no attribute 'endswith'
>>> import pdb
>>> pdb.pm()
> /home/gereon/devel/csvw/src/csvw/metadata.py(102)resolve()
-> if not base.endswith('/'):
(Pdb) l
 97  	    def resolve(self, base):
 98  	        if not base:
 99  	            return self.string
100  	        if isinstance(base, pathlib.Path):
101  	            return base / self.string
102  ->	        if not base.endswith('/'):
103  	            base += '/'
104  	        return base + self.string
105  	
106  	
107  	def link_property():
(Pdb) base
PosixPath('/home/gereon/databases/lexirumah-data/cldf')
(Pdb) isinstance(base, pathlib.Path)
False
(Pdb) isinstance(base, pathlib.PosixPath)
False
(Pdb) type(base)
<class 'pathlib.PosixPath'>

is just very confusing.

Hm. Do you have any suggestions how to best detect a Path? Maybe we should just handle the isinstance(base, str) case first and for the remainder just assume it's a Path?

Duck typing? Asking for forgiveness? I chose .joinpath() instead of / because it might be more transparent in case something really strange accidentally becomes base.

try:
  return base.joinpath(self.string)
except AttributeError:
  urllib.parse.urljoin(base, self.string)

Ok, joinpath should be innocent enough to discriminate between strings and paths.

xflr6 commented

In Python 3.6 there is fspath and os.PathLike:

Libraries wishing to support path objects and a version of Python prior to Python 3.6 and the existence of os.fspath() can use the idiom of path.fspath() if hasattr(path, "fspath") else path.

On PY2 though only pathlib2 and not pathlib seems to have __fspath__

Whatever happened to my suggestion to use clean existing URL logic in the form of urllib.parse.urljoin(base, self.string)? Did that get explicitly rejected for good reason, or did it just get neglected?

Just neglected, I guess; or rather shadowed by the issue of how to detect Path objects.