solid/specification

Implement/ensure Portability of user data

Opened this issue · 24 comments

This issue is to discuss possible ways to enable portability of user data.

(Originally opened by @nicola)
(Moved from solid/solid-spec#72)

Portability specifically means that "the user can take their data elsewhere", whenever they wish. It is a combination of the following features:

Perhaps the phrasing should be about interoperability? Two apps that use different RDF vocabularies are not interoperable by default. With Solid, there are other things that break interop too, like container layout, data shapes, permissions, and inbox use.

Interoperability is for a different issue. (In general, interop is a much broader issue, and is an overall aim of our project.)

Portability specifically means that "the user can take their data elsewhere", whenever they wish. I updated the issue description to clarify.

Ah, portability between pods instead of portability between applications. Sure.

I'd suggest phrasing the issue in terms of user experience, not technologies. I think it will emerge from that framing that the heart of this issue is actually the redirection mechanism (of which 301 is only one option). Saying that's optional is a bit like saying freedom of the press is optional. It's only a problem when you need it.

Saying it's optional recognizes the fact that maintaining 301 redirects by the old server, when the user is no longer a paying customer and has moved on, is a policy decision on the part of the service provider.
Maintaining those redirects indefinitely for free is a non-zero (albeit small) cost to the provider.

Hence a strong recommendation, and not a spec-level requirement.

As for phrasing -- the specific phrasing here is in terms of server capabilities. As in, this is a placeholder/reminder for us to implement those features.

I don't think this is a feature that should be advertised too heavily, as it creates a false expectation. Moving links on the web is always going to be hard. We can help a bit, but never fully solve the problem, nor should we try to. If you want to guarantee portability, don't use the web. Use another scheme. The web's strength isn't portability, it's stability, which leads to a giant network effect. You can't have both. Users should instead be encouraged to choose URIs well, with long-term expectations. I've seen too many projects get hung up on this issue, worry about redirects, invent new URI schemes, and end up losing their value proposition for marginal or negligible gain. Let's do what we can, but bear in mind the diminishing returns that are inevitable with this difficult issue.

If you want to guarantee portability, don't use the web.

From what I understand, one of the core goals of both Solid and the Crosscloud Project is to enable portability of user data on the web.
So, I will have to disagree here.

Use case: Alice uses databox.me as her pod provider for a couple years, at alice.databox.me. Then she decides she wants to switch to her own domain. Maybe databox.me gets sold to some company she doesn't like. Maybe they have a security breach that concerns her. How can she move without significant pain? The current designs for solid basically leave her stuck, as far as I can tell.

At absolute minimum, she has to be able to set up 301 redirects from everywhere in alice.databox.me. But that still allows databox's owners to see and influence lots about her visitors. On the current web, no one actually changes links because of a 301 so software might be relying on those redirects for many years, and every time giving the now-untrustworthy databox.me all their request headers, possibly including their webid. And what if it goes down?

For some years, I've been calling this problem "subdomain portability".

Are there other data portability issues?

@sandhawke

It's quite interesting that Alice expects a company that she doesn't like to give her a free service after she leaves.

As an aside: that company she doesn't like can also track her activity.

In general, links out are relatively portable, and links in (when not relative, i.e. cross-origin) are relatively hard to change. I think we could mathematically prove that the open world assumption makes this an intractable problem. It's important to note that you'll never solve every use case, though we can cover some.

(I removed my previous comments, given that they make no sense now that the initial post has been filled up.)

  • 301s are the most flexible solution, as they can be set up for each resource
  • if all of Joey's data was on the subdomain joey.data.fm then data.fm could just point the DNS to the new server wholly owned by Joey, who could manage his own redirects
  • best of all is to buy your own domain name.
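
As a rough illustration of the first bullet, here is a minimal sketch of a per-resource redirect table on the old server. The paths, the `alice.example` target, and the `respond` helper are all hypothetical names chosen for illustration, not part of any Solid server.

```python
from typing import Dict, Tuple

# Hypothetical redirect table: old path on the former pod -> new absolute URL.
REDIRECTS: Dict[str, str] = {
    "/profile/card": "https://alice.example/profile/card",
    "/photos/": "https://alice.example/photos/",
}


def respond(path: str, redirects: Dict[str, str]) -> Tuple[int, Dict[str, str]]:
    """Return the (status, headers) the old server would send for `path`."""
    target = redirects.get(path)
    if target is None:
        # No redirect registered for this resource.
        return 404, {}
    # 301 Moved Permanently, with the new URL in the Location header.
    return 301, {"Location": target}
```

Because the table is keyed per resource, the owner could move some resources and leave others where they are, which is the flexibility the bullet refers to.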

This does not look like the right way to do things but as a data point: WebDAV Redirect Reference Resources

It seems to me that if one is to edit the headers of a resource it may be better to have a Link header to a resource that contains those headers that it makes sense to edit, and allow those to be edited. Perhaps the meta rel?

Link: <doc,meta>; rel="meta"

Then with an ontology of headers a client could just PUT or PATCH that resource with the right vocab, so that it read:

<doc> redirect <http://other.server/doc> .

This clearly needs a lot more thinking about.

I think there's a privacy issue here. The user experience may appear seamless in many cases, and transparent to the user. But should the user be informed that their previous provider will continue to be able to track aspects of their social activity, even after they have moved (and in some cases indefinitely)?

Longer term the key is to have solid software implement permanent redirects properly, changing the referrer. And probably checking all links at least once every 30 days. That way, traffic to the old URLs should drop off quickly to just the URLs humans are still using. Maybe after 30 days or so html requests can switch to a page which explains the old URL is going away soon.

Thanks @bblfish it was looking pretty good until the XML :-)

I assume you mean rel=describedBy when you say rel=meta

I don't think rel=describedBy works for linking to a graph of link header triples. The describedBy graph is clearly under the control of applications. It's where you put the metadata about a JPEG, for instance, in LDP. The headers, on the other hand, are meant to be communication to/from the server.

I can't find any specs about the meaning of link headers on PUT or PATCH. How about we go with the semantics that PUT resets (removes) any links the client can control and PATCH leaves them intact; in both cases the provided ones are then added.

That doesn't work for my redirect suggestion though, since the PUT would be redirected, not clear the redirection header.

Hmm. Acl is a link rel the client is not allowed to affect. Type is a link rel the client is allowed to affect.

The use cases are pretty varied and I don't know some of them.

So, my revised proposal is that links are added by using them on POST/PUT/PATCH and removed by pointing them at some kind of magic flag value, or something like that.
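
To make the revised proposal concrete, here is a minimal sketch of how a server might merge client-supplied links into a resource's existing ones. Everything here is an assumption for illustration: the `REMOVE_FLAG` value, the `SERVER_CONTROLLED` set, and the `apply_links` helper are hypothetical, and the thread has not settled on any of them.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical "magic flag value": pointing a rel at it removes that rel.
REMOVE_FLAG = "urn:example:remove-link"

# Link relations the client is not allowed to affect (e.g. acl).
SERVER_CONTROLLED = {"acl"}


def apply_links(
    current: Dict[str, Set[str]],
    provided: List[Tuple[str, str]],
) -> Dict[str, Set[str]]:
    """Merge (rel, target) pairs from a POST/PUT/PATCH into the stored links."""
    # Copy so the stored links are not mutated in place.
    result = {rel: set(targets) for rel, targets in current.items()}
    for rel, target in provided:
        if rel in SERVER_CONTROLLED:
            # Ignore attempts to change server-controlled relations.
            continue
        if target == REMOVE_FLAG:
            # The flag value removes every link with this relation.
            result.pop(rel, None)
        else:
            result.setdefault(rel, set()).add(target)
    return result
```

In this sketch a request carrying `("type", REMOVE_FLAG)` drops the resource's `type` links, while a request carrying `("acl", ...)` is silently ignored.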

Longer term the key is to have solid software implement permanent redirects properly, changing the referrer

So are you proposing that the protocol will potentially change my Turtle files if I link somewhere? Will the linked-to resource need to have an entry in the ACL to do this?

The specs for the permanent redirects, 301 and 308, say that when you get them, you're supposed to change the referring URL if you can. I'm suggesting:

  1. Whenever solid app code gets a 301 or 308 when following a link, it should consider whether it learned that link from a graph to which it has write access. If so it changes that URL in the graph.
  2. If it doesn't have write access, it uses a protocol for telling the server the link is stale. This is the only design part of this proposal. I guess resources could include a link to an endpoint to notify.... Or you could send a patch even though you don't have write access and the server could consider it, especially if you tag the patch as coming from a redirect.
  3. Every solid server tries to dereference all the URLs in its data, seeing if it gets a permanent redirect, at least once every thirty days. Maybe it even looks at non-RDF, like emails, where it shouldn't change the URL, and makes a note for the user somewhere.
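
The three steps above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: a graph is modelled as a set of plain string triples, and `notify_stale` and `probe` are hypothetical callbacks standing in for the notification protocol and the HTTP check, neither of which is specified yet.

```python
from typing import Callable, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def rewrite_url(triples: Set[Triple], old: str, new: str) -> Set[Triple]:
    """Return a copy of the graph with every occurrence of `old` replaced."""
    return {tuple(new if term == old else term for term in t) for t in triples}


def handle_redirect(
    status: int,
    old: str,
    new: str,
    triples: Set[Triple],
    can_write: bool,
    notify_stale: Callable[[str, str], None],
) -> Set[Triple]:
    """Steps 1 and 2: react to a permanent redirect seen while following a link."""
    if status not in (301, 308):
        return triples  # Not a permanent redirect; leave the graph alone.
    if can_write:
        return rewrite_url(triples, old, new)  # Step 1: fix the link ourselves.
    notify_stale(old, new)  # Step 2: tell the graph's server the link is stale.
    return triples


def sweep(triples: Set[Triple], probe: Callable[[str], Optional[str]]) -> Set[Triple]:
    """Step 3: periodic pass over every HTTP(S) URL in the server's data.

    `probe(url)` returns the new URL if `url` permanently redirects, else None.
    """
    urls = {term for t in triples for term in t
            if term.startswith(("http://", "https://"))}
    for url in sorted(urls):
        new = probe(url)
        if new is not None:
            triples = rewrite_url(triples, url, new)
    return triples
```

A server could run `sweep` on whatever schedule it chooses (the proposal suggests at least every thirty days), passing a `probe` that issues the real HTTP request.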

I think this works. It relies on the idea that when you put something in RDF, you're actually stating what the RDF states, so you shouldn't mind if it gets rewritten into another form that means the same thing. This is why Skolemization and triple-reordering are okay.

(I.e. This is something we can do with RDF but not other data formats)

Whenever solid app code gets a 301 or 308 when following a link, it should consider whether it learned that link from a graph to which it has write access

Thanks for pointing this out. Good info.

If it doesn't have write access, it uses a protocol for telling the server the link is stale

Is it worth sketching this protocol out as a proposal?

Every solid server tries to dereference all the URLs in its data, seeing if it gets a permanent redirect, at least once every thirty days.

I see advantages, particularly in the general case of giving users more control over return codes. But I'm cautious about adding this complexity at this point to the servers, which are supposed to be as dumb as possible (though I'm not saying it's wrong). Anecdotal evidence over the years has shown redirects to be a pain and to add complexity (to both client and server).

Ultimately it's a problem that can never be fully solved, because a provider has no obligation to offer a service that costs them resources if a customer has terminated their relationship with them. Nor do inbound links have to change automatically; some will not want to do that, and can never be forced to. It's an interesting problem to think about, though.

I'm leaning toward the notification protocol just being a PATCH. I'd like to add a flag suggesting it's because of redirection, but I can't think of a good way to do that. Ideas? A weird hack might be you set the webid to https://www.w3.org/2016/anonymous-link-corrector .
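
For discussion, here is a minimal sketch of what such a PATCH might carry, assuming the target server accepts SPARQL Update bodies. The `build_stale_link_patch` helper and the example URLs are hypothetical, and the sketch deliberately leaves out the "because of redirection" flag, since how to express that is still an open question.

```python
from typing import Dict


def build_stale_link_patch(subject: str, predicate: str, old: str, new: str) -> Dict[str, object]:
    """Describe a PATCH that replaces one stale object URL with its new location."""
    # Two SPARQL Update operations separated by ';': drop the old triple, add the new.
    body = (
        f"DELETE DATA {{ <{subject}> <{predicate}> <{old}> . }} ;\n"
        f"INSERT DATA {{ <{subject}> <{predicate}> <{new}> . }}"
    )
    return {
        "method": "PATCH",
        "headers": {"Content-Type": "application/sparql-update"},
        "body": body,
    }


request = build_stale_link_patch(
    "https://bob.example/card#me",
    "http://xmlns.com/foaf/0.1/knows",
    "https://old.example/alice",
    "https://new.example/alice",
)
```

The receiving server remains free to accept or ignore the patch, for instance when the sender has no write access.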

I agree it's complexity, and I don't think we need it right away, but I think we should have it in mind as possibly necessary for the ecosystem.

True, the problem can't be 100% solved, but I think it can be sufficiently mitigated, just like when people move to another house. Once in a while they'll lose an important connection, but no one feels trapped, unable to relocate, because they might miss some mail.

This proposal doesn't rely on providers being nice. You keep paying your old provider during the redirection period, which is likely to be one month, although you watch your traffic dropping off and decide for yourself when it's low enough to stop paying.

In fact, I think the baseline RTOS should allow customers to take subdomains with them (paying a nominal service fee), so this stuff isn't needed just because you want to switch providers.

So this isn't about service providers, it's about branding and image. Maybe you got alice.databox.me and you don't want to be associated with databox.me any more, in the minds of your contacts. Or maybe you've been using cleverkid251.com and decided that name doesn't fit you any more. People need to be able to change names.

In the proposal I think there's one obvious method overlooked that is actually optimal.

As part of your storage you can either

  1. buy a domain name
  2. point an existing domain name to your storage

Many web hosters do this already.

The web has a well deployed redirection system, namely, DNS. History has shown us that almost all projects that try and replicate this in some way, have failed to get traction. Let's not be one of them!

If someone cares a lot about moving, this is the tried and tested way to do it. As soon as an HTTP URI is shared in the global namespace, it starts to accrue reputation and value. Under an open world assumption, moving it will destroy some of that reputation and value. We should not be trying to tell the user otherwise, as I feel that is misleading.

I wouldn't want to use a system where I was stuck with one public persistent identification string for the rest of my life, or longer.

Rebranding will never be trivial, as you point out, but it needs to be seen as possible, and it would be great if we can make our part of it relatively painless.