hechth/fs-irods

object's full path should include zone name component

Opened this issue · 4 comments

Objects in the catalog are known and queried by logical path, i.e. the name:

ICAT=> select * from r_coll_main;
ICAT=> select coll_name from r_coll_main;
          coll_name          
-----------------------------
 /
 /tempZone
 /tempZone/home
 /tempZone/trash
 /tempZone/home/public
 /tempZone/trash/home/public
 /tempZone/trash/home/rods
 [...]

It seems, in keeping with this, that getting the path of a given collection, etc., via an iRODSFS should yield the same.
However, we see this instead:

>>> irfs = fs_irods.iRODSFS(s)
>>> for path,_,_ in irfs.walk(): print(path)
... 
/
/home
/trash
/home/alice
/home/public
/home/rods
/trash/home
/home/rods/dir
   [...]

The zone-named leading components of the paths are missing.

@hechth - I see this as centering on the interpretation of the path parameter in the getinfo method. Is there a compelling reason to choose the /<zone> collection as the relative reference point for that parameter? Especially because, once you authenticate yourself via the session parameter to iRODSFS.__init__, data objects from other zones might be accessible through that iRODS server session object, but unreachable from the iRODSFS object using the current scheme.

If you find that your application "prefers" the zone-relative interpretation you've chosen for paths, then perhaps, rather than providing wrap(), which currently takes us from a relative path to an absolute one, we should have an unwrap() method instead. Its function would be to query for any zone that matches the leading component of its absolute_logical_path parameter, and to remove that component from the output if a matching zone is found. It could also take a zone parameter, defaulting to the session's home zone.
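A minimal sketch of what such an unwrap() might look like, to make the proposal concrete. Everything here is an assumption for illustration: the `known_zones` argument stands in for a real catalog query, the `(zone, path)` return shape is one possible design, and none of this is existing fs-irods API.

```python
import posixpath


def unwrap(absolute_logical_path, known_zones, zone=None):
    """Split an absolute logical path into (zone, zone-relative path).

    known_zones is a hypothetical stand-in for querying the catalog for
    zone names. If zone is given, only that zone name is stripped.
    Returns (None, path) when the leading component is not a matching zone.
    """
    if not absolute_logical_path.startswith("/"):
        raise ValueError("expected an absolute logical path")
    # Normalize, then drop empty components left by leading or repeated slashes.
    parts = [p for p in posixpath.normpath(absolute_logical_path).split("/") if p]
    candidates = {zone} if zone is not None else set(known_zones)
    if parts and parts[0] in candidates:
        # Only the first component is removed, so /tempZone/tempZone/x
        # becomes the zone-relative path /tempZone/x.
        return parts[0], "/" + "/".join(parts[1:])
    return None, "/" + "/".join(parts)


zones = ["tempZone", "otherZone"]
home = unwrap("/tempZone/home/rods", zones)
nested = unwrap("/tempZone/tempZone/subDir", zones)
foreign = unwrap("/otherZone/home", zones, zone="tempZone")
```

Returning the matched zone alongside the stripped path keeps the result unambiguous: a caller can tell whether a component was actually removed, which matters when a collection shares its name with a zone.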

@trel, does this seem logical to you as it does to me?

Agreed, not using the zone name in the path means visibility across federation is off the table.

There is also this line which actually examines the path's content (its leading element in particular) in order to tell whether or not it should be considered absolute. That probably shouldn't be so. It is, after all, technically possible for a /tempZone/tempZone collection to exist. So the true meaning of "/tempZone" is ambiguous at best, and in some cases it could resolve incorrectly.

Because of this, we get, e.g., the following erroneous result (in a default iRODS install with zone name tempZone):

$ imkdir -p /tempZone/{someDir,tempZone}/subDir
$ ils  /tempZone/{someDir,tempZone}/subDir
/tempZone/someDir/subDir:
/tempZone/tempZone/subDir:
$ python
>>> import fs_irods, irods.test.helpers
>>> ses = irods.test.helpers.make_session()
>>> f = fs_irods.iRODSFS(ses)
>>> f
<fs_irods.iRODSFS.iRODSFS object at 0x7fe2d86bfb50>
>>> f.isdir('/someDir/subDir')
True
>>> f.isdir('/tempZone/subDir')
False

Yet another reason why I see it as desirable to unify the fs-irods notion of what a logical path is, with that of iRODS itself. Any opinion on this, @hechth ?
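The misresolution in the session above can be reproduced without an iRODS server. The sketch below is a hypothetical stand-in for a "does the path already start with the zone name?" check, not the actual fs-irods code; the function names and the in-memory `catalog` set are assumptions made for illustration.

```python
def wrap_always(path, zone):
    """Treat every path as zone-relative and always prepend /<zone>."""
    return ("/" + zone + "/" + path.lstrip("/")).rstrip("/")


def wrap_by_leading_component(path, zone):
    """Hypothetical heuristic: assume the path is already absolute
    whenever its leading component equals the zone name."""
    if path == "/" + zone or path.startswith("/" + zone + "/"):
        return path
    return wrap_always(path, zone)


# Collections created by: imkdir -p /tempZone/{someDir,tempZone}/subDir
catalog = {
    "/tempZone",
    "/tempZone/someDir",
    "/tempZone/someDir/subDir",
    "/tempZone/tempZone",
    "/tempZone/tempZone/subDir",
}

# The zone-relative path /tempZone/subDir is meant to name
# /tempZone/tempZone/subDir, but the heuristic leaves it unchanged.
guessed = wrap_by_leading_component("/tempZone/subDir", "tempZone")
intended = wrap_always("/tempZone/subDir", "tempZone")
print(guessed, guessed in catalog)
print(intended, intended in catalog)
```

Either interpretation can be made consistent on its own; the problem arises only when the path's content is used to guess which one applies.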

Taking a fresh look at this issue, I'm wondering if it's been partly misguided, mainly because I think the spirit of PyFilesystem2 is to present something like a mount point, at least when taking the example of its use with a plain POSIX OS. In other words, the fs.osfs.OSFS class takes a starting directory as its first initializer argument, and from then on its internal "filesystem" is relative to that point in the overall Linux filesystem. So if a file ~/dust/wind/dude exists, then OSFS("~/dust").isfile("/wind/dude") evaluates as True, right?
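To illustrate the mount-point reading, here is a stdlib-only sketch of the rooted-path behavior described above. The `RootedView` class is a hypothetical stand-in and not the PyFilesystem2 API; it omits details a real implementation needs, such as refusing `..` components that would escape the root.

```python
import os
import tempfile


class RootedView:
    """Hypothetical stand-in for a filesystem object rooted at a directory."""

    def __init__(self, root):
        self.root = os.path.abspath(os.path.expanduser(root))

    def _resolve(self, path):
        # A leading "/" refers to the view's root, not the OS root.
        return os.path.join(self.root, path.lstrip("/"))

    def isfile(self, path):
        return os.path.isfile(self._resolve(path))


with tempfile.TemporaryDirectory() as home:
    # Build <home>/dust/wind/dude, mirroring ~/dust/wind/dude.
    os.makedirs(os.path.join(home, "dust", "wind"))
    open(os.path.join(home, "dust", "wind", "dude"), "w").close()

    view = RootedView(os.path.join(home, "dust"))
    inside = view.isfile("/wind/dude")        # resolved under the root
    outside = view.isfile("/dust/wind/dude")  # root name repeated: no such file

print(inside, outside)
```

Under this reading, an iRODSFS rooted at /tempZone would be the analogue of a view rooted at ~/dust, and the zone name would be absent from its paths in the same way "dust" is absent here.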