Improve `storage.*:/path` description

Question

Improve `storage.*:/path` description

Closed this issue 6 months ago · 6 comments

From current profile it is not entirely clear how to thread $PATH restriction from storage.*: scopes. Documentation should more precisely describe if e.g. storage.create:/foo is sufficient to create file or directory with a name /foobar or we should consider that defined path can mean exactly /foo or any file or directory below /foo base path like /foo/whatever but not /foobar.

Personally I would treat $PATH as simple string prefix, but in that case policies defined for storage scope should always use trailing /, e.g. scope.create:/atlasdatadisk/ and to avoid mistakes documentation should make it clear trailing / is recommended in policy definitions.

It would be also nice to make more clear which directories can be created with storage.create capability to avoid different behavior we observe with SE-tokens used for HTTP-TPC, see DMC-1282

Answer 1 · 2022-08-19T13:50:34.000Z

I think it MUST be interpreted as the whole path component or we'll run into all kinds of problems. Why would the capability to create a file /somepath/foo also allow one to create a file /somepath/foobar ?! There would be no way to restrict the capability to the former only. For directories you could indeed still add a trailing / but for files that won't work. I think it's much clearer and intuitively understandable if it is a full path component.
For a directory I think it means the capability to run the equivalent of a mkdir -p PATH plus the capability to create subdirectories if the target is a directory (the latter is reasonable to have implicitly as it matches common POSIX behaviour. the former makes live a lot easier).
I think the profile is indeed not correct in case the target is a file: in that case subdirectories obviously don't make sense.
Concerning DMC-1282: I think the behaviour of dCache is reasonable and matches the profile.

Answer 2 · 2022-08-26T11:15:54.000Z

@vokac Personally, I do not see anything in the document to support your interpretation that $PATH is a simple substring; e.g., that the authz statement storage.create:/foo allows the client to create /foobar. The "simple substring" idea also doesn't match my understanding of the document's intention. It also doesn't match the provided examples (something like storage.create:/foo allows creation of an object in /foo/bar)

That said, there is (AFAIK) nothing that prohibits that interpretation and I agree that this is something that should be clarified.

Concerning DMC-1282, the problem was due to a separate ambiguity in the current document and how dCache tackled that ambiguity. To give a concrete example, consider the authz statement storage.create:/foo/bar where the directory /foo exists but has no child element bar. Does the authz statement allow the client to:

create a file bar in directory /foo?
create a subdirectory bar in directory /foo (and subsequently create unlimited content within that directory)?
create either a file bar or a directory bar in /foo?

In dCache's case, given the ambiguity, it opted for the "safer" (more limited) option: if /foo exists but /foo/bar does not exist then the statement storage.create:/foo/bar authorises the client to create a file bar in the directory /foo. An attempt to create the directory /foo/bar will fail.

On a related note: if /foo does not exist then the authz statement storage.create:/foo/bar authorises the client to create the parent directory (i.e., /foo), but not a file /foo. This is true for any ancestor directory of the final path element. So, something like mkdir -p works as expected, if the path doesn't exist and one interprets the path as pointing to a file.

The problem DMC-1282 arose because, at that time, FTS would create a token authorising each operation that was limited to that operation. For example, when attempting to upload a file /foo/bar/baz, FTS would create:

a token with storage.create:/foo for the MKCOL /foo request,
a token with storage.create:/foo/bar for the MKCOL /foo/bar request,
a token with storage.create:/foo/bar/baz" for the HTTP-TPC transfer that creates the file /foo/bar/baz`.

The problem was resolved by FTS creating a single token with storage.create:/foo/bar/baz and using this for all operations. In addition to working with dCache, it was safer (no unnecessarily broad tokens) and placed less load on the OP.

So, it may be reasonable to update the specification so it allows authz statements with a trailing slash (e.g., storage.create:/foo/bar/). This would allow storage to distinguish the intention, should the final path element not exist -- provided this is a use-case we need to support.

Answer 3 · 2022-08-28T22:21:45.000Z

@paulmillar unfortunately dCache and XRootD/StoRM implementations differs

implemented simple string matching without taking into account additional logic related to the directory structure
make usage of different storage implementations complex / prone to mistakes with security implications
leads to mistakes in client implementations and developers spend time to be compatible with different storage backends

Even though dCache comes with clever / "safer" solutions this doesn't make me completely happy, because it is difficult to operate distributed infrastructure with storage backends where each provides their own "features". I suggested simple string matching, because whole concept of organizing files in the directories doesn't seems to me very useful while our data use multiple metadata attributes. May be I'm too biased by our Rucio usage and default directory structure like rucio/scope/ff/ff/filename which is used just as a workaround to deal with poor storage implementation that can't deal with millions of files in a directory (directories can even make thing more complicated). Object storages seems to be quite popular even though they don't natively support concept of directories ... why should we put logic for directories in our tokens while this concept is not even natively supported by some storage implementations?

Despite of my preferences it is not really important if we decide to use XRootD/StoRM or dCache rules for storage path matching, but all implementation that supports WLCG JWT profile should provide same behavior which should be tested by our storage compliance testsuite.

Answer 4 · 2022-08-29T09:03:00.000Z

Hi @vokac

Many of the points you raise provide good examples of why we have specifications: that clients can code against well-defined expected behaviour. Server implementations must implement the expected behaviour, or they are simply non-conforming (i.e., "broken").

At the risk of pointing out what you already know: there is a balance here. In general, the profile should leave unspecified behaviour that the client doesn't care about (e.g., internal details, unnecessary use-cases). The JWT profile must not dictate how server code is written, but focus on client-server interactions.

My understanding is that there are two main use-cases for storage.create authorisation: a token could allow a client to upload a single file (at $PATH), or it could allow the client to upload multiple files within some common parent directory, which might be /. There is no use-case (that I am aware of) where the token should authorize a client to upload multiple files with the same prefix; e.g., a token that authorizes the upload of /foo/bar-1.root, /foo/bar-2.root, /foo/bar-3.root but not /foo/baz.root.

In the first case, $PATH is the path of the target file. In the second case $PATH is the common parent directory.

To be honest, I find the current description of storage.create is a bit of a mess (perhaps suffering from "design by committee"). I think this section should be rewritten to make it clear what operations are permitted.

On your specific points:

Rucio: yes, the default strategy for deterministic RSEs use a hash structure. Under this, I think is only makes sense if the token authorises uploading a single file, or it allows uploading any set of files within the RSE. If the future sees the currently anticipated developments, then the single-file upload token will be used, with Rucio providing the client with that token (but this is not yet certain). Please also bear in mind that Rucio supports other algorithms for deterministic RSE (e.g., how DIRAC uses Rucio), and also supports non-deterministic RSEs. Both of these would make it meaningful to have tokens with storage.create limited by a directory path.
Path vs strings. For the most part, this is an implementation detail. In fact, if you look into how dCache actually implements these, then it is doing a string comparison internally. It just does it very carefully. You can get (more or less) the same behaviour by the storage appending a / character to the string before doing the comparison. The important point is to a) identify the use-cases, b) describe the server's expected behaviour, given these use-cases.
Object stores: yes, these are interesting use-case. They are intended to store a collection of files, without any directory structures, which (to me) points to the first use-case above (upload a single file). Some object stores provide a kind of pseudo directory structure, but is this something WLCG needs? If Rucio is used with the storage configured to use a deterministic RSE (with something similar to the default algorithm) then directory structure really doesn't matter. Object stores are certainly a valid consideration; however, I don't understand what would be the use-case for a $PATH targeting anything other than the whole identity of the file to be uploaded. This should be explained somewhere.

In summary: yes, should update the document. No, a simple substring isn't a required behaviour (at least, I don't see any use-case requiring this). Compatibility is something we (try to) achieve by covering the use-cases in the JWT profile, ensuring there isn't ambiguity in the expected behaviour, and testing servers that claim to implement the profile.

In terms of way forward, I would suggest that we come up with a pull request that replaces the current text of storage.create, that is associated with (and closes) this issue.

Answer 5 · 2022-11-01T15:29:54.000Z

Just to summarize:

$PATH must be treated as a directory or file, partial matches for last component are not acceptable
implementation can't ignore trailing slash in the storage scopes

With storage.create:/foo/bar scope

allow to create directory /foo
deny to create file /foo
allow to create directory /foo/bar
allow to create file /foo/bar
deny to create files&directories like /foo/barbaz
allow to create files&directories like /foo/bar/baz

Same behavior for storage.create:/foo/bar/ except

deny to create file /foo/bar

Update discussed in WLCG DOMA BDT meeting

Could somebody with more experience in writing standards prepare an updated text for WLCG JWT profile? (even this contribution was not originally written by me clearly).

Answer 6 · 2024-01-06T23:37:51.000Z

PR #48 proposes changes to address this issue: what do people think?