w3c/wpub

Is PublicationLink a misnomer?

iherman opened this issue · 31 comments

Using the right terminology is important, because it also suggests some sort of "mental model" of what we are doing. In this sense, I would suggest that "PublicationLink" is not the right term, especially with the additional features proposed (for good reasons!) for audio or video books. I would suggest we use the term PublicationResource (we can bike shed on a better term); some of the constituent terms may also have to be changed.

If we agree, we should also close #235 ("Using LinkRole instead of PublicationLink") as being overtaken by events.

@BigBlueHat @HadrienGardeur @laudrain @TzviyaSiegman @wareid @llemeurfr

We started using PublicationLink as a way of characterizing a link, essentially following the pattern set by HTML's <link> element. Hence the usage of rel, description (standing in for alt) and encodingFormat.

However, the latest proposals around audio and video books brought to the fore that we are not characterizing a link; we are in fact characterizing the resource at the "end" of the link. Hence my feeling that referring to a link is a misnomer. Consider the term description (which describes the resource, e.g., an image, rather than the link), or encodingFormat, which clearly characterizes the format of the resource, not the link. And we may get additional terms like duration or sync-media that certainly do not feel right (at least to me) when related to a "link". However, if we consider the structure as characterizing the resource, I would suggest that things fall into place.

There are two properties that may have to change if we look at that structure and consider it as a description of the resource:

  • We should use id instead of href: we are characterizing a resource identified by a URL, which is exactly the meaning of id
  • To use the right terminology, we may want to use role instead of rel, again for the same reasons.

Looking at one of the examples for media sync it becomes way cleaner. If we say:

   "readingOrder": [
        "...",
        {
            "type": "PublicationResource",
            "id": "chapter1.html",
            "encodingFormat": "text/html",
            "sync-media": {
                "type": "PublicationResource",
                "id": "sync-media/chapter1.json",
                "encodingFormat": "application/vnd.wp-sync-media+json",
                "duration": 123.45
            }
        },
        "..."
    ]

What this says is: "we have an HTML resource identified by 'chapter1.html' which is overlaid by another resource, identified by 'sync-media/chapter1.json', which happens to be of the 'vnd.wp-sync-media+json' format and has a duration of 123.45." Just try to describe the same structure when talking about links...
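
To make the resource-centric reading concrete, here is a minimal Python sketch that walks the entry above and prints one statement per resource, using id as the subject. The helper name `describe` and the wording of its output are illustrative assumptions, not anything defined by the draft.

```python
import json

# The readingOrder entry from the example above ("..." placeholders omitted).
ENTRY = json.loads("""
{
    "type": "PublicationResource",
    "id": "chapter1.html",
    "encodingFormat": "text/html",
    "sync-media": {
        "type": "PublicationResource",
        "id": "sync-media/chapter1.json",
        "encodingFormat": "application/vnd.wp-sync-media+json",
        "duration": 123.45
    }
}
""")


def describe(resource):
    """Return one statement per resource, with `id` as the subject.

    Hypothetical helper for illustration only.
    """
    subject = resource["id"]
    facts = []
    nested = []
    for key, value in resource.items():
        if key == "id":
            continue
        if isinstance(value, dict):
            # A nested object is itself a resource: point at it, then describe it.
            facts.append(f"{key} -> {value['id']}")
            nested.extend(describe(value))
        else:
            facts.append(f"{key} = {value}")
    return [f"{subject}: " + ", ".join(facts)] + nested


for line in describe(ENTRY):
    print(line)
```

Every property ends up attached to the resource it describes, which is the reading the term PublicationResource is meant to suggest.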

@iherman your semantic web experience "talks" there. I agree that in RDF-speak we have here a resource with properties.
Therefore +1 to your proposal.

Overall I agree but I have two comments regarding that proposal:

  1. We're using the same structure in links as well, where we can hardly say that the linked resources are part of the publication (they're simply referenced by the publication).
  2. Do we really need to repeat type everywhere? Isn't that something that we could handle directly in our JSON-LD context instead?

I am fine with changing to PublicationResource (instead of PublicationLink). If I am understanding this correctly, role values are only those that are defined in ARIA or DPUB-ARIA? This doesn't really provide any new information. Is this information valuable? Will it be more reliable than properties was in EPUB?

@TzviyaSiegman

If I am understanding this correctly, role values are only those that are defined in ARIA or DPUB-ARIA? This doesn't really provide any new information. Is this information valuable? Will it be more reliable than properties was in EPUB?

This shows that the term role may not be the right one... because this is not related to ARIA. This would simply be a replacement for the current rel term. Any other term that would not be misleading?

@HadrienGardeur

  1. We're using the same structure in links as well, where we can hardly say that the linked resources are part of the publication (they're simply referenced by the publication).

If I had the freedom I would say the term should be Resource, but that is a term already used in RDF, so we should avoid possible clashes. Yes, with regard to links the term PublicationResource is indeed misleading but... I am not sure what else to use, to be honest.

  1. Do we really need to repeat type everywhere? Isn't that something that we could handle directly in our JSON-LD context instead?

That is a separate issue, regardless of the term we use. Let us move that into a separate issue if we want to address it.

We're using the same structure in links as well, where we can hardly say that the linked resources are part of the publication

Yes, this bothers me as well. It'll be confusing to say that the reading order and resource list are the definitive source of publication resources while we use something called PublicationResource to describe links. Can't we just call it LinkedResource and avoid the implications of whether it belongs to the publication or not?

I like LinkedResource.

@iherman

This shows that the term role may not be the right one... because this is not related to ARIA. This would simply be a replacement for the current rel term. Any other term that would not be misleading?

Why not purpose ?

rel is the right term for what we're expressing: a relationship between the publication and a linked resource.

Also, we're not using href but url which is just fine and IMO shouldn't be replaced by id.

href and id mean different things, afaik. id is the one that, formally, identifies the resource we are talking about and characterizing. Hence my preference to use id.

(In ugly RDF terms, the current structure creates a blank node for the whole structure, whereas using id means making statements directly on the resource.)

rel is the right term for what we're expressing: a relationship between the publication and a linked resource.

Is this what we are expressing? When I say rel="content" then we are saying that the resource is playing the role of a table of content...

I like purpose, by the way.

When I say rel="content" then we are saying that the resource is playing the role of a table of content.

That's how rel is generally understood, yes. How is that document related to this one. I wouldn't suggest we mint something new unless there's a good case why it doesn't apply.

I personally have always disliked "contents" as the identifier for the table of contents, though. I know it's the typical title for the table of contents within a work, and there's some unadopted precedent for it in HTML, but it's also ambiguous with the content or contents of the publication itself. (But I'm not expecting to win that argument here.)

When I say rel="content" then we are saying that the resource is playing the role of a table of content.

That's how rel is generally understood, yes. How is that document related to this one. I wouldn't suggest we mint something new unless there's a good case why it doesn't apply.

O.k. If everybody is fine with keeping rel, it is o.k. with me.

(The issue whether 'contents' is a good name for what we use it for is a separate issue, let us not discuss it here.)

Also, we're not using href but url which is just fine and IMO shouldn't be replaced by id.

To explain a bit what I said in #356 (comment) (sorry about href, indeed we use url): as I said, url and id are very different things. Sorry to be a bit too technical, but if I look at the details of the generated statements, then

"id":"http://mybook",
"resources" : {
   "type": "LinkedResource",
   "url": "http://ex.org",
   "duration":1234
}

Translates into:

<http://mybook> resources _:b .
_:b url <http://ex.org> .
_:b duration 1234 .
_:b type LinkedResource .

whereas if I replace url with id above, the set of statements is:

<http://mybook> resources <http://ex.org> .
<http://ex.org> duration 1234 .
<http://ex.org> type LinkedResource .

One could argue that the second set of statements more naturally translates our model into syntax. But it may be a difference in modeling style.
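
The difference between the two sets of statements can be reproduced with a toy flattener. This is a minimal sketch, not a JSON-LD processor: it applies no context, does no IRI expansion, and leaves plain values such as the url string as literals. The function name `to_triples` and the blank-node labels are assumptions made for illustration.

```python
from itertools import count


def to_triples(node, blank_ids=None):
    """Flatten a nested dict into (subject, predicate, object) triples.

    A node with an "id" key becomes a named subject; a node without one
    gets a fresh blank-node label such as "_:b0". Toy model only.
    """
    if blank_ids is None:
        blank_ids = count()
    subject = f"<{node['id']}>" if "id" in node else f"_:b{next(blank_ids)}"
    triples = []
    for key, value in node.items():
        if key == "id":
            continue
        if isinstance(value, dict):
            child_subject, child_triples = to_triples(value, blank_ids)
            triples.append((subject, key, child_subject))
            triples.extend(child_triples)
        else:
            triples.append((subject, key, value))
    return subject, triples


WITH_URL = {
    "id": "http://mybook",
    "resources": {"type": "LinkedResource", "url": "http://ex.org", "duration": 1234},
}
WITH_ID = {
    "id": "http://mybook",
    "resources": {"type": "LinkedResource", "id": "http://ex.org", "duration": 1234},
}

for manifest in (WITH_URL, WITH_ID):
    for triple in to_triples(manifest)[1]:
        print(*triple)
    print()
```

With url, the duration and type hang off a blank node; with id, they are stated directly about <http://ex.org>.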

(Trying to get this issue solved) I think where we are at this point is:

  1. We have an agreement to replace the term PublicationLink with LinkedResource
  2. We seem to be fine continuing to use rel in spite of what I originally proposed (I am not 100% sure I like it, but let us not get into bike shedding on that)
  3. #235 should be closed without further action (overtaken by events)
  4. We have no agreement on whether we should use id instead of url

In fact... (I swear I am not doing this just to get closure no matter what :-) we may want to keep both id and url, with the restriction that at least one of the two MUST be present. After all, the two may be different; just as we allow both in the manifest itself, and just as we have both (the way schema.org defines it) for a Person or an Organization, we can have both here. The typical case may be

"resources": {
  "url" : "https://www.w3.org/TR/2018/WD-wpub-20180814/",
  "id" : "https://www.w3.org/TR/wpub/",
  ...
}

Using id is probably cleaner (and that is a matter for a primer), but there may be cases (as above) when a url is better than the identifier.
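
As a rough illustration of the "at least one of the two MUST be present" idea, the check could look like the sketch below. Both helper names are hypothetical, and the fallback order in `locator` (prefer url, else id) is an assumption made here, not something agreed in this thread.

```python
def check_linked_resource(resource):
    """Return a list of problems; an empty list means the resource passes.

    Encodes only the rule floated above: at least one of `id` or `url`
    must be present. Hypothetical helper, not taken from any spec text.
    """
    problems = []
    has_id = bool(resource.get("id"))
    has_url = bool(resource.get("url"))
    if not (has_id or has_url):
        problems.append("at least one of 'id' or 'url' must be present")
    return problems


def locator(resource):
    """Pick an address to fetch: `url` if given, else fall back to `id`.

    The fallback order is an assumption for this sketch.
    """
    return resource.get("url") or resource.get("id")


dated = {
    "url": "https://www.w3.org/TR/2018/WD-wpub-20180814/",
    "id": "https://www.w3.org/TR/wpub/",
}
print(check_linked_resource(dated))
print(locator(dated))
print(check_linked_resource({"duration": 1234}))
```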

@iherman I understand your comment from an RDF perspective, but IMO it doesn't necessarily matter that much in our case since we're mostly targeting schema.org processors rather than generic RDF ones.
Such processors would definitely know how to handle url.

One could also argue that while you always know the url of a resource, the same can't be said about an id. I'm fine having both at a publication level where the author of the publication is also the author of the manifest, but the same is not necessarily true for individual resources.

I am not sure what schema.org does, actually. I could imagine they do handle id and url differently, otherwise the two would not necessarily exist side-by-side...

But, I believe, my comment in #356 (comment) holds. At the moment LinkedResource (to use the new name) is the only type we use that does not refer to id, only to url. I think we should allow for both, and let the author decide. This is in line with what schema.org seems to do.

@iherman from an author or a UA perspective, I really don't see any benefit to that approach, it just makes things more confusing.

The only benefit is the RDF output as I've stated in my previous comment.

This is already the case, isn't it? We have id and url for a person...

This is already the case, isn't it? We have id and url for a person...

Actually, any object in Schema.org has this dual feature. Restricting it on our side may be seen as a mistake...

@iherman this is turning into a completely different discussion, can we close this one and open a separate issue?

I would be perfectly fine restricting our spec to:

  • id for Person/Company
  • url for LinkedResource

Given our extensibility model (JSON-LD + schema.org), we would not restrict the use of whatever's allowed in schema.org for such models, but we wouldn't give false expectations regarding the behavior of the UA either.

+1 to Hadrien, intellectual purity in such a case can easily work against interoperability: a UA will not process the id on linked resources (resp. a url on a person/company), even if an author believes so because the spec allows for it.

I would definitely be opposed to making a restriction on Person/Company. There are cases when the id and the url are different and we should not mess with that.

I will not lie down in the road over this, although I do feel uncomfortable with using url exclusively. But I guess we all have features that we are not really happy with; this is what is called consensus...

That issue being put aside, are we fine with #356 (comment)? I am happy to do a PR around it (unless @mattgarrish prefers to wait until we resolve and merge #359 to avoid merge hell).

unless @mattgarrish prefers to wait until we resolve and merge #359 to avoid merge hell

Yes, very much... :)

@iherman what's the use case for url on Person/Company in WP?

For id it can be used as an identifier to uniquely identify an author, but I'm not aware of any use case for providing links to an author or a publisher's website.

If you have time, you can dive into https://en.wikipedia.org/wiki/HTTPRange-14 :-)

I am not saying we should decide on this, certainly not. If we explicitly disallow the usage of url, for example, that is exactly what we would do.

Yes, this may be a purist's standpoint. But we should not put ourselves in the crosshairs of this.

Once again, disallowing is not something that anyone has suggested but there's a difference between putting the spotlight on it in our spec and keeping things possible as part of our extensibility model.

@HadrienGardeur you are right, this is a separate issue. If you want to raise it separately, go ahead; at this moment we are not discussing going back to some of the definitions that have been around in the manifest for a while. (I am partially to blame for making reference to it.)