solid/authorization-panel

Scalability of writing ACP to every resource

Opened this issue · 8 comments

I'd like to understand the scalability of the decision to write access controls along with every resource. I'm in pretty deep waters here, my intuition may be completely off, but at the very least, we should possibly provide some implementation advice, so here goes:

If we write an access policy to all resources, I assume that we need to block all access until all these are written. If not, we might have a situation where a controller writes an access policy, only to see it take effect much later. So, we have to make sure this operation can be done quickly.

Intuitively, a URI space is represented by a trie, and searching a trie has an average time complexity of O(log n) and a worst case of O(n). Since we know nothing on how people will organize their URI space, I think we should assume the worst case.

My thinking is further that n is far from a constant, I certainly hope that Solid takes a lion's share of an exponential future, so n(t) ~ exp(t). If the trie assumption is correct, then the write speeds will therefore get exponentially slower in the worst case or linearly slower in the average case when we get bigger data. In addition, for every resource, there's an insert into a set, but I suppose that's a constant time operation these days.

I suppose that volatile memory is getting exponentially faster, and SSDs seems to be going that way too, but HDDs aren't, they have had a linear performance development. And then, there's the price/performance. I haven't been following that too closely.

It had me worried to read that every change of permissions would have to be propagated this way, could anyone please comfort me on this one?

Can you give an example of the policy you are thinking of and what your interpretation of its working is meant to be, and which documents you are basing that on?
(I ask as ot is not clear to me what the current state of ACP is, as it is being developed on the side without changes being discussed or communicated to us here. We are told there will be a version out to look at in 4-6 weeks)

Thanks. So I think that does not apply to the minimal extension to WAC described here (part of PR201) which allows the sharing of "policies" or partial access control rules, by editing the ACR to point to an axisting "Rule".
The problem you see is related to the advanced policy management which requires the server to manage the "propagation of policies", which I guess means adding a link to those policies in each affected ACR.

Yes, AFAICS, WAC references one and only one relevant ACL resource, which may sit in a hierarchy. Its algorithm for locating the resource is more complex than ACPs, ACPs algo for finding the ACR is very simple since it follows every resource, but ACR algo for propagation is then more complex. Not complex as in hard to do, complex as in that I fear its runtime for Big Data[tm].

I find acl:default to be quite problematic, especially if one takes clients into account, which one has to in a protocol issue 259. The idea of each resource having its own ACR is I think very good.

The propagation problem could be solved by using acl:include and just adding

<> :include <../.acl>

to each newly created ACR. Any client that wanted to break the inheritance link could remove that, and add extra Rules individually.

In order to enforce in a coherent way that the owner of the POD always has access all that is needed is for each resource to contain the header Link: </.acl>; rel="acl".

I must admit that I do not understand how your comment is relevant to the scalability problem I try to address here, @bblfish . It is not that it can't be done with the tools that we have, it is even pretty easy. We're probably talking past each other.

The problem is what the time complexity of the operation is; even if it is easy to do programmatically and the semantics is straightforward, it can have high time complexity. Now, I fully admit that I'm not particularly knowledgeable about the available algorithms to use to implement this stuff, it may well be that there exists some implementation that solves this problem, but I feel that the intuition that a URI space is a trie, and that a trie is worst case O(n) is fairly well established. If that intuition is correct, then the propagation mechanism won't scale for non-trivial datasets. 

The relevant text you are referring to is this one I think:

When a policy is added to an ACR for a container, it is possible to add it so that it applies to the container (apply predicate) or that it applies to all children of the container (applyMembers predicate).
The server will manage the propagation of the policies to children as they are created.

I think we agree that this entails that a change to a policy higher up the tree (closer to the root) would require the server to go through all child, grandchildren, ... ACRs, check them and add links to the new policy or even delete some links to no longer existing ones.

In the Issue 210: add :include relation proposal I just opened, a new resource would just need to add an :include link in the new ACR to the parent ACR (which could later be removed if needed). The inclusions would be declarative, so no data would need to be moved. As a result if say the root ACR added a new Policy, nothing would need to be propagated. That is the power of monotonic reasoning.
True Resource Guards and clients wishing to find out if they can have access, would need to follow the include relations, but presumably in your scenario there are a lot less requests than there are resources.

So with :include there would be no need to change anything else after an edit of a policy. It would also help overcome problems with acl:default behavior.

Right, that's a good point!

Yeah, you'd have to examine the entire branch up to the root to determine access right, but that is likely to be more manageable, especially with caching strategies (which would also be easier).

I tend to think that WAC's easily identifiable single access control is a reasonable starting point (on the balance of Einstein's "Everything should be as simple as possible, but not simpler"), and that inheritance should be added later once we know the scalability properties, but I can see how your proposal makes this more feasible.