HydraCG/Specifications

Indicate a partial collection view is ordered

pietercolpaert opened this issue · 21 comments

Opening a difficult discussion here, yet I hope given a limited scope, we could set a good standard here:

Currently there is no way to indicate how a partial collection view is ordered. I therefore propose a way to add this, as in the following example:

<collection1> a hydra:Collection ;
              hydra:view [
                           hydra:orderedBy ( [
                                 a hydra:OrderDescription ;
                                 sh:path dbpedia-owl:birthDate ;
                                 hydra:orderOption hydra:ascending ;
                                 hydra:castTo xsd:Year
                           ]) ;
                           hydra:first <?page=2014>;
                           hydra:last <?page=2020>;
                           hydra:next <?page=2019>
                      ] .

This allows for:

  • Multiple predicates on which the collection is ordered through an rdf:List
  • Being able to order on a part of a literal via SPARQL literal casting: https://www.w3.org/TR/sparql11-query/#FunctionMapping
  • Being able to indicate ascending or descending order via hydra:orderOption
  • Being able to handle deeper ordering via sh:path (e.g., sh:path ( ex:parent dbpedia-owl:birthDate ))

Note that the goal is not to specify the order of objects within the JSON-LD documents. A client should always order the triples or objects itself once the document has been retrieved. The description only says how the next page relates to the previous page.

Related issue: #6 — However, I do not want to introduce a hypermedia control to order a partial collection view, I just want a way to indicate how a partial collection view is ordered.

Are there other proposals of how to do this? Are there proposals that can overcome this limitation?

Hi, what is the purpose of hydra:manages[] ?

> Hi, what is the purpose of hydra:manages[] ?

It’s an unstable draft proposal for partially indicating the contents of a hydra:Collection. See https://github.com/HydraCG/Specifications/blob/master/drafts/collection-representation.md

How can we proceed to get this as a draft proposal? Can someone show me the way?

@pietercolpaert probably the most effective way is to participate in the Monday conf-call (there are reminders every week on Slack) and/or directly open a PR on the spec. Usually the first step is to gather consensus, so talking about your idea in the conf-call will help for sure.

Sorry for the late reminder, but this topic is on the agenda for today's call.

@alien-mcl I see the issue is put on the agenda of the 11th of June again. Will try to join that confcall!

I’ve updated the issue’s example with sh:path and corrected the manages block

Didn't we agree in #150 that manages should be renamed to memberAssertion?

I’m just using the latest spec as published here: https://www.hydra-cg.com/spec/latest/core/#manages-block

> Didn't we agree that manages should be renamed

Well, only kind of. Care to submit a PR?

@pietercolpaert Let's not have the "manages block" distract us here ;)

I think it's good in general although introducing SHACL may be controversial even if it's inevitable.

Unclear how to handle deeper ordering

This may be tricky but looks like SHACL has rich support for paths.

(e.g., ordering on the birthDate of a parent of a foaf:Person managed by the collection)

sh:path ( ex:parent dbpedia-owl:birthDate )

Also let me ask you some questions:

  1. why orderOption? In most cases I'm aware of, this would be called order direction
  2. what is the significance/usefulness of castTo?

> @pietercolpaert Let's not have the "manages block" distract us here ;)

Yes, totally agree! For the sake of it, I’ll just remove it from the example!

> I think it's good in general although introducing SHACL may be controversial even if it's inevitable.

> Unclear how to handle deeper ordering

> This may be tricky but looks like SHACL has rich support for paths.

> (e.g., ordering on the birthDate of a parent of a foaf:Person managed by the collection)

> sh:path ( ex:parent dbpedia-owl:birthDate )

Yes! That’s why I started using shacl:path. I think it’s the perfect solution to describe a property path!
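
To make the idea of "deeper ordering" concrete, here is a minimal Python sketch of how a client might resolve a sequence path such as ( ex:parent dbpedia-owl:birthDate ) to obtain a sort key. Everything in it is an assumption for illustration: the `resolve` helper, the nested-dict representation of members, and the sample people are hypothetical. A real SHACL path is evaluated over an RDF graph and may yield several values per member, which this sketch ignores.

```python
def resolve(node, path):
    """Follow a sequence of property names through nested dicts (hypothetical helper)."""
    for prop in path:
        node = node[prop]
    return node


# Hypothetical members; dates are ISO 8601 strings, which sort chronologically as text.
people = [
    {"name": "Ann", "ex:parent": {"dbpedia-owl:birthDate": "1960-05-01"}},
    {"name": "Bob", "ex:parent": {"dbpedia-owl:birthDate": "1948-11-23"}},
]

# Stand-in for sh:path ( ex:parent dbpedia-owl:birthDate )
path = ["ex:parent", "dbpedia-owl:birthDate"]

# Ascending order on the birth date of each person's parent.
ordered = sorted(people, key=lambda person: resolve(person, path))
ordered_names = [person["name"] for person in ordered]
```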

> Also let me ask you some questions:

> 1. why orderOption? In most cases I'm aware of, this would be called order direction

I must have been thinking about other options that are not just ascending or descending. For example, it could be that all the next pages are “east of” the current page.

> 2. what is the significance/usefulness of castTo?

To express that, given a date literal, the ordering is on the year only, so you cannot count on any ordering by month. This would also allow ordering by quarter, for example.

@alien-mcl three points I should respond to:

East of as an orderOption: keep it simple and one-dimensional

Ok, I don’t have a strong opinion on that. I’m OK with orderDirection, but we need to elaborate on how to order specific literals as well (such as strings: we might want to add the specific locale for example).

castTo predicate

Imagine you have 5 pages with members with dates in 2 years (the first 2 in 2015, the next 3 in 2016). Within the first 2 pages, you do not have any ordering, but you do know that the first 2 pages are lower in year than the next 3 pages. Important for clients looking for all members in January 2015: they will need to download 2 pages, not 1.
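
A minimal Python sketch of that scenario, under stated assumptions: the page contents and the `find_members` helper are hypothetical, pages are ordered ascending by year only, and nothing is guaranteed about order within a year. The client walks the pages in sequence (as if following hydra:next) and stops at the first page that contains only later years.

```python
from datetime import date

# Hypothetical collection: 2 pages for 2015, then 3 pages for 2016.
# Only the year is ordered between pages; months are in arbitrary order.
pages = [
    [date(2015, 7, 3), date(2015, 1, 20)],
    [date(2015, 1, 5), date(2015, 11, 9)],
    [date(2016, 3, 1), date(2016, 1, 2)],
    [date(2016, 8, 8), date(2016, 2, 14)],
    [date(2016, 12, 25), date(2016, 6, 30)],
]


def find_members(pages, year, month):
    """Collect members in the given year and month, counting the pages read."""
    matches = []
    fetched = 0
    for members in pages:  # stands in for following hydra:next
        fetched += 1
        if all(d.year > year for d in members):
            break  # ascending by year: later pages cannot match any more
        matches.extend(d for d in members if d.year == year and d.month == month)
    return matches, fetched


matches, fetched = find_members(pages, 2015, 1)
```

In this sketch both 2015 pages contribute a January member, so neither can be skipped. The walk reads one further page only to discover that 2016 has started, and the remaining two pages are never requested.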

For the castTo predicate, we could start from just the SPARQL specification on that, but we can leave it open for more options.

If the design of castTo is a blocker, we can leave it a suggestion for now.

sh:path

If it’s not part of the core spec/lib, we do need a fall-back to indicate what exactly the ordering means. Do you have a suggestion for that?

We had an interesting discussion on today's call regarding this issue. In general, there are two different approaches here:

  • client knows what he asks for and server obeys (this is the one I personally support)
  • server knows what he is sending and client may discover it (similar to your @pietercolpaert snippet at the beginning)

We agreed that we should come up with some cookbook examples and deliberate on the pros and cons of each approach. Both approaches will still need that querying_for_specific_order mechanism, but that is another part of the story.

@alien-mcl those are not exclusive approaches. We need both.

> client knows what he asks for and server obeys (this is the one I personally support)

There is no way to enforce that the "server obeys". If the client requests to order by "first name" and "last name", the server has full authority to ignore either or both. The client would want to know that it did not get exactly what it asked for.

Same with a request without explicit order like plain GET /people. If the server applied an order the client will not know about it without response metadata.

One more example is implicit order which is more specific than what the client requested. Say the query was /events?orderBy=year. The server will do that but to keep a stable order it may implicitly add a second order by month name to keep a stable sorting for those items which have the same value for year. Otherwise those members would "jump around" their section of the data. Which can produce really weird results when a given number of elements is greater than the page size. Subsequent requests for the same page+order may even return completely different results.

So, the client would want to know that they requested /events?orderBy=year but effectively got /events?orderBy=year,month,name,id.
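
The "jumping around" effect can be sketched in a few lines of Python. This is only an illustration under assumptions: the `page` helper, the event records, and the numeric `month` field are hypothetical, and the two input lists stand in for the same rows arriving from storage in a different order on two requests.

```python
def page(events, keys, page_size, page_number):
    """Sort events on the given keys and return the ids on one page (hypothetical helper)."""
    ordered = sorted(events, key=lambda e: tuple(e[k] for k in keys))
    start = page_number * page_size
    return [e["id"] for e in ordered[start:start + page_size]]


events = [
    {"id": 1, "year": 2015, "month": 3, "name": "b"},
    {"id": 2, "year": 2015, "month": 1, "name": "a"},
    {"id": 3, "year": 2015, "month": 2, "name": "c"},
    {"id": 4, "year": 2016, "month": 1, "name": "d"},
]
# Same rows, different arrival order on a later request.
shuffled = [events[2], events[3], events[0], events[1]]

# Ordering on year alone: ties keep their arrival order, so page 0 differs.
first_request = page(events, ["year"], 2, 0)
second_request = page(shuffled, ["year"], 2, 0)

# Ordering with tie-breakers: page 0 is the same for both requests.
full_keys = ["year", "month", "name", "id"]
stable_first = page(events, full_keys, 2, 0)
stable_second = page(shuffled, full_keys, 2, 0)
```

With three events in 2015 and a page size of two, the year-only ordering puts different members on the first page depending on arrival order, while the ordering with tie-breakers does not.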


Too bad you could not join @pietercolpaert. Here are my additional thoughts, some of which have been discussed during the call, some of which have not:

  1. We discussed mandating the use of rdf:List when the server decides that the order is important for the client. This is the only way to remove the need for the client to deeply understand the proposed hydra:orderedBy description.
  2. Still, I'm convinced that we need this description but it can be simplified when the members are actually an ordered rdf List.

What I mean by the second point is that once the response triples are actually ordered, then the client should not need to know how to actually perform the ordering. Nor should they have to. This completely eliminates the necessity for castTo and

@tpluscode I think order within the page is not interesting. The overhead for a client to do the ordering is minimal. I mainly want to describe ordering between pages: I want a client to know that it will not find more items in the next pages, as it won’t match its filter any longer.

@alien-mcl For client-initiated ordering, please discuss in #6. This issue is particularly focused on helping clients to prune their search space.

> I think order within the page is not interesting

You really think so? It's not about runtime overhead but implementation complexity. The spec and a generic client will have to be very sophisticated to support a universal order description. Start adding paths and castTo and it's almost certainly going to become a nightmare.
Not to mention that the particular field used to sort may not even be part of the response itself, which will prevent any kind of in-memory sorting. rdf:List is the easiest way to avoid all of those issues and simplify the order description.

> I want a client to know that it will not find more items in the next pages, as it won’t match its filter any longer.

Maybe it's not the best approach to try to standardise? Given the year example, the first design choice would be to filter and not have the client figure out the contents of further pages by analysing how the page got sorted and what are the contents.

> > I think order within the page is not interesting

> You really think so?

Yes, and looking at how the Linked Data Platform is doing it, I’m not the only one with that opinion. I quote from their spec:

There are many cases where an ordering of the members of a container is important. LDP does not provide any particular support for server ordering of members in containers, because any client can order the members in any way it chooses based on the value of any available property of the members.

> > I want a client to know that it will not find more items in the next pages, as it won’t match its filter any longer.

> Maybe it's not the best approach to try to standardise? Given the year example, the first design choice would be to filter and not have the client figure out the contents of further pages by analysing how the page got sorted and what are the contents.

My use case is for Web APIs where supporting any kind of dynamic client-side ordering/filtering on the server-side would become too expensive (cfr. the Linked Data Fragments axis where we want to find a trade-off between server and client effort). In order to optimize caching, I want to minimize the number of orderings I support, and describe these in-band.

I think indeed that we could hold a long debate on whether the design would be better or worse, but that is not an argument for not supporting this small addition to the spec in Hydra.

> looking at how the Linked Data Platform is doing it

The LDP quote seems to suggest that the client should be responsible for sorting the container members in-memory? As in, fetch 1 million members and sort in the browser?

Maybe I'm misreading it. Is it placed out of context?

> that is not an argument for not supporting this small addition to the spec in Hydra.

It is because Hydra should not just be a set of various (potentially impractical) descriptions but also the way for generic clients to consume them. It is a bad idea to standardise descriptions which cannot be easily coded against IMO. That's why I think we should first consider the desired result and also different scenarios. And only then figure out the standard solution.

Discussion moved to HydraCG/extensions#4