Support cloudpathlib
ejm714 opened this issue · 3 comments
Make pandas_path play nicely with cloudpathlib so we can do things like:
bucket = S3Path("s3://drivendata-competition-sfp-public/")
df['s3_path'] = bucket / df.filename.path.with_suffix('.tif')
Thoughts on the API?
We could add a .cloud_path (or .cloud or .cpath or something else) accessor to explicitly do CloudPath things.
Or, we can overload the .path accessor to just handle CloudPaths. We'd do this by trying to instantiate everything as CloudPath first and, if that fails because the protocol isn't recognized (i.e., it doesn't start with s3://, az://, etc.), instantiating it as a Path object.
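A minimal sketch of the "try CloudPath first, fall back to Path" idea. To stay runnable without cloudpathlib installed, it uses a hypothetical `StandInCloudPath` class that rejects unknown protocols; the class, its `PREFIXES` list, and the `to_path_object` helper are all assumptions for illustration, not code from pandas-path or cloudpathlib.

```python
from pathlib import Path


class StandInCloudPath:
    """Hypothetical stand-in for a cloud path class; not cloudpathlib's API."""

    PREFIXES = ("s3://", "az://")

    def __init__(self, url):
        # Reject anything that does not start with a known protocol.
        if not str(url).startswith(self.PREFIXES):
            raise ValueError(f"unsupported protocol: {url}")
        self.url = str(url)


def to_path_object(value, cloud_cls=StandInCloudPath):
    """Try the cloud class first; fall back to pathlib.Path on failure."""
    try:
        return cloud_cls(value)
    except ValueError:
        # Assumption: the cloud class signals an unknown protocol by raising.
        # A real integration would catch that library's specific error type.
        return Path(value)


cloud = to_path_object("s3://bucket/image.tif")  # handled by the cloud class
local = to_path_object("data/image.tif")         # falls back to pathlib.Path
```

With this shape, the existing .path accessor would only need to swap its Path constructor for something like `to_path_object`.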
In order not to muck with the underlying data types, we convert to Path when the accessor is hit (this is where we'd need some changes):
pandas-path/pandas_path/accessor.py
Line 35 in 8b7d147
We also check isinstance against Path in places, and should use os.PathLike instead, which both objects will support.
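To illustrate the os.PathLike check: any object that defines `__fspath__` passes `isinstance(obj, os.PathLike)`, while plain strings do not. The `StandInCloudPath` class below is a hypothetical placeholder for a cloud path type, and `is_path_like` is an illustrative helper name, not something in pandas-path.

```python
import os
from pathlib import Path


class StandInCloudPath:
    """Hypothetical cloud path type that implements the os.PathLike protocol."""

    def __init__(self, url):
        self.url = url

    def __fspath__(self):
        # os.PathLike only requires this method to exist and return str/bytes.
        return self.url


def is_path_like(obj):
    """Return True for any object implementing __fspath__, not just Path."""
    return isinstance(obj, os.PathLike)


local_ok = is_path_like(Path("image.tif"))
cloud_ok = is_path_like(StandInCloudPath("s3://bucket/image.tif"))
string_ok = is_path_like("image.tif")  # plain strings are not os.PathLike
```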
And then we return everything as a str so you can continue to do normal pandas things with a string type. This should be ok, but worth thinking about:
pandas-path/pandas_path/accessor.py
Line 68 in 8b7d147
pandas-path/pandas_path/accessor.py
Line 95 in 8b7d147
Dependency-wise, it might be easier to think about if it went the other way: cloudpathlib has an (optional?) implementation of pandas-path.
Ah, something like from cloudpathlib.pandas import cloud_path to register the accessor?
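A rough sketch of what "importing a module registers the accessor" could look like, using pandas' `register_series_accessor` decorator. The accessor name `cloud_path`, the `CloudPathAccessor` class, and its string-based `with_suffix` are assumptions for illustration; this is not the actual cloudpathlib or pandas-path implementation, and it assumes every value has the form `scheme://...`.

```python
from pathlib import PurePosixPath

import pandas as pd


# Registration happens as a side effect of defining (importing) this class.
@pd.api.extensions.register_series_accessor("cloud_path")
class CloudPathAccessor:
    """Hypothetical accessor that operates on cloud URLs stored as strings."""

    def __init__(self, series):
        self._series = series

    @staticmethod
    def _swap_suffix(url, suffix):
        # Split off the scheme so PurePosixPath does not collapse the "//".
        scheme, sep, rest = url.partition("://")
        return f"{scheme}{sep}{PurePosixPath(rest).with_suffix(suffix)}"

    def with_suffix(self, suffix):
        # Return plain strings so normal pandas string operations keep working.
        return self._series.map(lambda url: self._swap_suffix(url, suffix))


paths = pd.Series(["s3://bucket/a.jpg", "s3://bucket/dir/b.png"])
tifs = paths.cloud_path.with_suffix(".tif")
```

Putting the decorator in a separate module keeps the dependency optional: users who never import that module never need pandas.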