solid/specification

Should Solid (storage) servers support "RDF documents" containing multiple subjects (or quads)?

lecoqlibre opened this issue · 18 comments

Definitions

The following definitions are used in the message below.

  • RDF named graph: an RDF graph that is assigned to a URL in a Solid POD. Example: the file "example.ttl" containing an RDF graph in the LDP container "/example" is an RDF named graph.

  • RDF resource: Interchangeable with "RDF named graph".

  • RDF document: An RDF resource which can contain triples with different subjects, like:

<ex:subject1> <ex:p> <ex:o>.

<ex:subject2> <ex:p> <ex:o>.

<ex:subject3> <ex:p> <ex:o>.

Problem

I'm asking the question in the title because some current storage server implementations (1) can't manage RDF graphs containing more than one RDF subject! This could be an issue, as people won't be able to express different perspectives in their POD (see this message from H. Story).

For instance, some servers like those listed in (1) are not able to handle the following example requests (I've only tried with SemApps so far, and I'm pretty sure it's the same for DjangoLDP):

  1. POST:
curl -X POST -H "Content-Type: text/turtle" \
  -d "<ex:subject1> <ex:p> <ex:o>. <ex:subject2> <ex:p> <ex:o>." \
  http://localhost:3000/

Response: ERROR

  2. or PUT:
curl -X PUT -H "Content-Type: text/turtle" \
  -d "<ex:subject1> <ex:p> <ex:o>. <ex:subject2> <ex:p> <ex:o>." \
  http://localhost:3000/myfile.ttl

Response: ERROR

But they are able to handle the previous requests when only one RDF subject is passed, like:

  1. POST:
curl -X POST -H "Content-Type: text/turtle" \
  -d "<ex:subject1> <ex:p> <ex:o>. <ex:subject1> <ex:p2> <ex:o>." \
  http://localhost:3000/

Response: OK

  2. or PUT:
curl -X PUT -H "Content-Type: text/turtle" \
  -d "<ex:subject1> <ex:p> <ex:o>. <ex:subject1> <ex:p2> <ex:o>." \
  http://localhost:3000/myfile.ttl

Response: OK

It is as if these servers were designed with a 1-1 relationship between the URL of the RDF resource and the (single) subject contained in the graph of that resource.

Using the previous definitions we can say that these servers are supporting RDF named graphs but not RDF documents.
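
To make the observed behaviour concrete, here is a minimal Python sketch of the kind of check such a server might effectively be applying. This is an assumption about the behaviour, not code from SemApps or DjangoLDP: `naive_subjects` and `accepts_single_subject_only` are hypothetical helpers, and the regular expression only handles simple `<s> <p> <o>.` statements like the ones in the curl examples above, not full Turtle.

```python
import re

# Naive pattern for statements of the form "<s> <p> <o>." (IRIs only).
# This is NOT a Turtle parser; it only covers the payloads used above.
TRIPLE = re.compile(r"<([^>]*)>\s*<([^>]*)>\s*<([^>]*)>\s*\.")


def naive_subjects(payload):
    """Return the set of distinct subject IRIs found in a simple payload."""
    return {s for s, _p, _o in TRIPLE.findall(payload)}


def accepts_single_subject_only(payload):
    """Hypothetical constraint: accept only payloads with exactly one subject."""
    return len(naive_subjects(payload)) == 1


# The two payloads from the curl examples above.
multi = "<ex:subject1> <ex:p> <ex:o>. <ex:subject2> <ex:p> <ex:o>."
single = "<ex:subject1> <ex:p> <ex:o>. <ex:subject1> <ex:p2> <ex:o>."

print(accepts_single_subject_only(multi))   # the payload that got ERROR
print(accepts_single_subject_only(single))  # the payload that got OK
```

Under this assumed constraint, the first payload is rejected only because it mentions two subjects, which matches the ERROR/OK pattern reported above.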

Questions

  1. So I'm wondering if these servers are correctly implementing the Solid specifications? Is this requirement, of supporting multiple subjects when passing RDF graphs, clearly expressed somewhere in the specs?

Because this is breaking interoperability, I think. Indeed, some apps could want to write RDF graphs talking about different subjects, right?

  2. Moreover, is supporting "RDF documents" a clear requirement of Solid?

Because even if these servers accepted RDF graphs with multiple subjects, they could just decide to create one RDF resource per subject. Doing so would also be a problem, because the different RDF resources would not be linked to each other (the notion of "document" would still be missing). While it is hard to conceive (what Location header to return when POSTing, the unnecessary resource URL when PUTting, which is btw already unnecessary in their case), I guess it could happen if this is not clearly specified.


(1) like SemApps, DjangoLDP, maybe TrinPod?

So I'm wondering if these servers are correctly implementing the Solid specifications?

It is within a server's right to reject, even if that may seem strange in some cases. That said, servers should be technically capable of successfully accepting them too. A server may process the payload and apply its own constraints. Some of those constraints may be storage-wide and advertised so that a client can discover them and adapt. The URI owner has the responsibility to manage the representations of a resource based on what they allocated the URI for. This is typically delegated to the server. What a resource describes should persist; this is a social expectation.

It is not breaking interoperability. This level of spec variability is expected and, all things considered, it is an important quality of an evolvable system. Supporting RDF documents is already part of Solid Protocol's Conformance. That is not at odds with which request semantics, with certain content, can be applied to a particular resource.

Hum, WAC has very frequently more than one subject in the turtle content.

@csarven I'm not sure I understand your answer. It's as if it's OK for servers to reject graphs with multiple subjects, but at the same time they should be "technically capable of successfully accepting them too"?

With SemApps you can never ever store an RDF graph with multiple subjects in one resource. Any RDF resource stored in SemApps has a URL pointing to a graph with only one subject, which is the URL of the resource (the resource URL = the subject of the graph). It's a deliberate choice enforced with a technical constraint. I'm questioning this choice: can Solid server implementers choose not to support quads?

Some of those constraints may be storage-wide and advertised so that a client can discover them and adapt.

How can they be advertised to clients?

It is not breaking interoperability.

For instance, in SemApps, we can't store and retrieve a TypeIndex properly, as detailed in solid/type-indexes#29 (comment).

Is that OK? Will every client-client standard have to define alternatives for storage servers that don't support quads? For example, provide an alternative to TypeIndex when the server can't support them?

Supporting RDF documents is already part of Solid Protocol's Conformance.

I suppose you are referring to RFC 7230 and RFC 7231 listed in the Conformance section.

SemApps supports RDF documents but with only one subject in the graph. Is it forbidden? Should not the specs say that storage servers MUST support quads? In other words, should using one URL to store graphs with multiple subjects be possible?


Hum, WAC has very frequently more than one subject in the turtle content.

@bourgeoa SemApps uses the Jena Fuseki triple store and provides a custom-made version which supports ACL, see here.

As far as I can see, this would break any self-describing document. For example:

<>  a foaf:PersonalProfileDocument; foaf:maker :me; foaf:primaryTopic :me.
:me ...

It is within a server's right to reject. ... It is not breaking interoperability.

AFAIK, client-to-client specs all call for multi-subject graphs - WebID-Profile, Type Index, SAI ... So I really don't understand how this would not break interoperability.

A related issue - is there any talk of a way for a server to signal which client-to-client spec it supports?

The name of this issue seems to blur concerns, and the initial comment doesn't clarify matters well for me.

Multiple subjects commonly occur in collections of Subject→Predicate→Object triples; they do not require Subject→Predicate→Object→Graph quads.

The test POST and PUT calls in the initial comment that result in errors suggest to me that the target server may indeed have some bug(s), but they are not about Solid support. These POST and PUT calls only involve triples, as they are in text/turtle; there are no named graphs nor any other quads (which are not supported by Turtle).

I find Ted's assessment to be correct and in line with my earlier response.

I'll try to clarify my comments with respect to why servers may accept or reject a request for various reasons.

In order for servers to conform to the Solid Protocol with respect to RDF documents, when they accept an RDF document, they need to be able to provide representations in Turtle or JSON-LD when requested by a client. That is one of the ground requirements.

From there, there are two (possibly more) categories of constraints:

One category is imposed by the server involving behaviours and varying integrity checks for certain kinds of resources, such as containers, the root container (storage), auxiliary resources (in particular the ACL resource), notifications, but generally it could be as wide as any resource type, e.g., discussed in #191 .

Another category (or two) is implementation-specific or ultimately driven by the URI owner's decision. So a server can accept or reject a request depending on the resource semantics and on ensuring its integrity. That can be accomplished by processing the payload. Servers can also choose to advertise these constraints to clients ( https://solidproject.org/ED/protocol#constraints-problem-details ). Similarly, a server could advertise, for example, the Shape of a resource that's used for validation, or possibly even the ODRL Policies that may be applied.
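
A minimal Python sketch of what advertising such a constraint on a rejected request might look like. It assumes the server signals the constraint with a `Link` header using the LDP `constrainedBy` relation and an RFC 9457 style problem details body; the linked protocol section is the normative source. The `build_rejection` helper, the 422 status, and the constraints URL are hypothetical choices made for illustration.

```python
import json

# Link relation defined by LDP for pointing at a description of constraints.
CONSTRAINED_BY = "http://www.w3.org/ns/ldp#constrainedBy"


def build_rejection(constraints_url, detail):
    """Build a hypothetical 'rejected because of a constraint' response.

    Returns a plain dict (status, headers, body) rather than using any
    particular web framework.
    """
    problem = {
        "type": constraints_url,  # identifies the kind of problem
        "title": "Payload violates a storage constraint",
        "status": 422,  # assumed status; a server may pick another 4xx
        "detail": detail,
    }
    return {
        "status": 422,
        "headers": {
            "Content-Type": "application/problem+json",
            "Link": '<%s>; rel="%s"' % (constraints_url, CONSTRAINED_BY),
        },
        "body": json.dumps(problem),
    }


# Hypothetical constraints document for a single-subject storage.
response = build_rejection(
    "https://storage.example/constraints#single-subject",
    "This storage accepts only one subject per RDF document.",
)
print(response["headers"]["Link"])
```

A client that receives such a response can follow the `Link` target to learn what the storage accepts instead of guessing from a bare error.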

If a server only accepts RDF documents with a single subject (across the whole storage, for any resource), it would be due to its own constraint (read: the URI owner's). In such cases, those servers would be limited in what they can store and share. A server could conform to the Solid Protocol minimally or in a limited way, but may be limited in some respects or non-conforming with respect to other specifications due to the data models it uses.


SemApps supports RDF documents but with only one subject in the graph. Is it forbidden?

No. At least not strictly for RDF documents that the client wants to create. As mentioned above, it would mean some limitations for the server. Let me put it another way. If the URI owner wants a resource (not conflicting with other resource semantics) to have only one subject, that'd be totally valid, and it is within the server's right to reject any request that might conflict with that view in order to uphold what the URI owner wanted. So, a server could in fact allow an ACL resource to have only one Authorization, but you get the idea as to what it can't enable either.

Should not the specs say that storage servers MUST support quads? In other words, should using one URL to store graphs with multiple subjects be possible?

I may not be interpreting your comments around one subject and Quads correctly. And, it would be easier to stick to definitions from specifications (instead of new definitions like the one in your first comment). Can you paraphrase?

Solid Protocol doesn't need to require all servers to support Quads or even RDF documents with one subject. A server may support TriG or N-Quads but not all servers need to. A server may reject requests with multiple subjects, but not all of them (for obvious reasons like conforming to data models that are required from elsewhere).

Is there any talk of a way for a server to signal which client-to-client spec it supports?

I think that kind of information could be available in the Storage Description Resource (it generally fits under the communication options use case: #355 (comment)). We don't yet have a particular way to signal that, but there are surely different ways to go about it. I think servers should generally publish their DOAP (implements), Service Descriptions if any, ODRL Policies, and so on.

These POST and PUT calls only involve triples, as they are in text/turtle; there are no named graphs nor any other quads (which are not supported by Turtle).

I would phrase that as "quads are not made explicit in Turtle statements". POST and PUT calls either contain or let the server create a target resource URI and the URI functions in every way as a named graph. Rdflib and other libraries are quite capable of querying quads on a group of Turtle documents. So I think it is too strong to say that quads are not supported by Turtle.

It seems to me that Turtle has "quads without names". See RDF-Concepts which says "Although a quad without a graph name consists of the same three components as a triple, it is a distinct concept, as it specifically captures the notion of a triple within the default graph of an RDF dataset."

@lecoqlibre I think by the letter of the spec, your server is allowed to reject anything it wants to reject, but in practice Solid apps will not function correctly when you reject all documents that have multiple subjects inside them. Does that answer your question, and can this be closed?

Please let me know if you want this issue to be put on the agenda for the 17 January CG Call.

Thank you very much @csarven for your detailed response and the pointers, it's much clearer now!

I guess this question has been discussed and agreed in the group before?

On this topic, I'm just afraid that end users might get confused. They would sign up for an account with a POD provider, but when they want to use an app and store data on their POD, there will be an error. So they might just end up thinking "Solid is not working". Could this be an issue in terms of adoption that we should consider?

I suppose it's the responsibility of the POD providers to clearly advertise the limitations of their implementation. Here I'm afraid once again that people will get confused. But I guess it's like anything else in life: we should be aware of possible limitations and always check the compatibility of the products we choose, even if it's time- and energy-consuming. Maybe in the future there will be some marks of quality assurance to help see what a Solid server supports. Is that something you would expect?


Solid Protocol doesn't need to require all servers to support Quads or even RDF documents with one subject.

Although for the case where servers support RDF documents, I would expect the documents can at least contain multiple subjects! Is it really too strict?

These POST and PUT calls only involve triples, as they are in text/turtle; there are no named graphs nor any other quads (which are not supported by Turtle).

@TallTed Indeed, what I'm trying to say is that the Turtle triples make the graph and the URL of the resource is like the name of the graph. From the POD perspective we could say that the resource looks like a named graph.

I think by the letter of the spec, your server is allowed to reject anything it wants to reject, but in practice Solid apps will not function correctly when you reject all documents that have multiple subjects inside them.

@michielbdejong Right. I'm wondering if the spec should not ensure that the Solid ecosystem remains working when exchanging RDF data (containing any number of subjects).

[@jeff-zucker] It seems to me that Turtle has "quads without names".

According to the Turtle spec, it has TRIPLES, end of sentence. "Quads without names" implies that you could include those names in the serialization, which you cannot in Turtle nor N-Triples; this requires N-Quads.

Treating a Turtle file as a named graph is fine, but such a named graph does not put a name on each triple therein in Turtle serialization; again, this requires N-Quads or TriG.

Note that your quote from RDF Concepts says "Although a quad without a graph name consists of the same three components as a triple, it is a distinct concept", which can be lightly rephrased to "a quad without a graph name ... is a distinct concept [from a triple]".
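
The distinction can be illustrated with a minimal Python sketch. It models a triple as a 3-tuple and shows that a graph name only appears in the serialization once a fourth term is written out, as N-Quads does. The `to_nquads` helper is hypothetical, handles IRI terms only, and uses the resource URL from the PUT example as an assumed graph name.

```python
def to_nquads(triples, graph_iri=None):
    """Serialize (s, p, o) IRI triples as N-Quads lines.

    Without graph_iri the lines carry three terms, exactly as in N-Triples.
    With graph_iri, every statement carries the graph name as a fourth term,
    which Turtle has no syntax for.
    """
    lines = []
    for s, p, o in triples:
        terms = ["<%s>" % s, "<%s>" % p, "<%s>" % o]
        if graph_iri is not None:
            terms.append("<%s>" % graph_iri)
        lines.append(" ".join(terms) + " .")
    return lines


triples = [
    ("ex:subject1", "ex:p", "ex:o"),
    ("ex:subject2", "ex:p", "ex:o"),
]

# Triples only: what a text/turtle payload can express.
print("\n".join(to_nquads(triples)))

# Quads: the same statements with the document URL used as graph name.
print("\n".join(to_nquads(triples, "http://localhost:3000/myfile.ttl")))
```

Note that both subjects appear in both outputs: having several subjects is independent of whether a graph name is attached.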

@lecoqlibre — I think that any server/system which claims to support Turtle, RDF, or Solid, MUST accept uploads/inserts of documents/data which include statements with different subject values. There is NOTHING in any of these systems/specifications which mandates a single subject value across any given datafile, named graph, or otherwise.

I might suggest you raise your questions about the failing POSTs to the relevant server vendors/providers, as they are the only people who can answer why they aren't accepting your attempted POSTs. (I would also suggest including the full text of any ERROR, as that single word provides no hints about WHY the error was raised, which might share nothing but coincidence with the differing subject values. For all we know, you just happened to hit server overloads with those POSTs, while the POSTs that succeeded were attempted when the server was not overloaded.)

There is NOTHING in any of these systems/specifications which mandates a single subject value across any given datafile, named graph, or otherwise.

@TallTed It seems there is also nothing which forbids having a single subject... so servers can do it, right? That is the question I'm trying to ask: shouldn't it be written in the Solid spec that a server which claims to support Solid and RDF "MUST accept uploads/inserts of documents/data which include statements with different subject values"?

If we want to leave servers the possibility of refusing documents with multiple subjects, maybe we should have some kind of distinction between generic POD servers and specific POD servers? Then a user would know what to expect when choosing a POD provider.

I don't think that the question of "how many subjects will a server accept?" is really relevant, nor that it is the core of the problem @lecoqlibre encountered with SemApps or DjangoLDP. I think a better characterization of the problem would be "Should Solid (storage) servers agree to store any triple in any location?".

I don't think that the current spec allows one to answer that question with a definite "yes", but I guess that many app developers assume that the answer is "yes". I don't have a final opinion on what the right answer should be, but I think that assumptions of this kind need to be identified, discussed, and either endorsed (by including them in the spec) or debunked (with explanations of how to work without them).

Should Solid (storage) servers agree to store any triple in any location?

That seems like a clearer question. I'm not sure if that entirely hits the spot but I'll give it a shot. Breaking down assumptions and spells =)

As mentioned earlier, aside from particular request semantics applied to specific resources in the Solid Protocol (including referenced dependencies), there are no limitations preventing the "storage of any triple in any location." Servers should possess the capability to adhere to the requirements. One scenario has to do with providing certain representations with any triple, and another is whether any server will accept requests with any triple. The former is required; without it, the server is severely limited. Anything beyond that can vary from one server to another, and the reason for it ultimately boils down to a server (URI owner) imposing resource semantics. A resource has a "shape", whether that's the software's internal logic or something public and concrete. There is, though, the option to communicate the constraints to the client.

The server has no additional or overarching constraints for processing HTTP messages. Again, beyond particular request semantics applied to specific resources. More on processing at #394 .

So, in order for some servers to reject requests when a payload contains 'any triple', they would need to have the ability to process the message for 'any resource' to some degree. These servers should "do the nice thing" (tm) by advertising resource-specific constraints (see Constraints and Problem Details and reference source / understanding behind it #185 (comment) ). Surely servers have their reasons to do all this (and that's a quality on its own), but this path is not necessarily simpler to implement than not processing the payload in the first place. It has more to do with the architecture behind the interface and whether certain resources need to be single-subject (or whatever). Again, it ultimately comes down to whatever the URI owner wishes.

The app developer should assume that it is possible to store any triple anywhere, but they should also handle rejections or proactively adapt their future requests based on the constraints that the server advertises.
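
A minimal Python sketch of that client-side behaviour, assuming the server advertises constraints through a `Link` header with the LDP `constrainedBy` relation. The helpers `constraint_links` and `plan_after_response` are hypothetical, and the header parsing is deliberately naive (it does not handle commas inside quoted parameters).

```python
import re

# Link relation defined by LDP for pointing at a description of constraints.
CONSTRAINED_BY = "http://www.w3.org/ns/ldp#constrainedBy"


def constraint_links(link_header):
    """Return target URIs whose rel value includes ldp:constrainedBy."""
    targets = []
    # Naive split into "<target>; params" entries separated by commas.
    for match in re.finditer(r"<([^>]*)>\s*;([^,]*)", link_header):
        target, params = match.group(1), match.group(2)
        rel = re.search(r'rel\s*=\s*"([^"]*)"', params)
        if rel and CONSTRAINED_BY in rel.group(1).split():
            targets.append(target)
    return targets


def plan_after_response(status, headers):
    """Decide what a client might do after a write attempt."""
    if 200 <= status < 300:
        return "stored"
    links = constraint_links(headers.get("Link", ""))
    if 400 <= status < 500 and links:
        # The server told us where its constraints are described.
        return "adapt: read constraints at " + links[0]
    return "report error to user"


headers = {
    "Link": '<https://storage.example/constraints>; rel="%s", '
            '<http://www.w3.org/ns/ldp#Resource>; rel="type"' % CONSTRAINED_BY
}
print(plan_after_response(201, {}))
print(plan_after_response(422, headers))
print(plan_after_response(500, {}))
```

The point of the sketch is the order of checks: attempt the write first, and only change strategy when the server rejects it and says why.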

I believe the specification already incorporates these considerations and ways of working with it or around it.

Closing this as per agreement in CG meeting. We can reopen when needed and/or when more information is made available.