IIIF/trc

Approve Content State 1.0

Closed this issue · 15 comments

Background and Summary

The Discovery TSG believe that the Content State API is now ready to be approved as version 1.0. The current version of the specification is 0.9 and we have previously taken this version to the TRC. Changes since then are editorial amendments and a fix for an error in the encoding example.

We have seen the following implementations (this section tbc):

Getty

University of Durham

Biblissima

OCLC

  • ???

Digirati

  • Consumer and publisher as part of Muya (Multimedia Yasna project, an in-progress project for SOAS)
  • A related IIIF Discovery platform with cross-collection/institution search
  • Publisher as part of Madoc (crowdsourcing environment)

Concerns

The model and protocol approaches seem to have wide support and already have multiple implementations (beyond the above list). The editors and the Discovery TSG are less sure whether the encoding approach is the right one. It works, it is robust in the HTTP transmission scenarios where a value is in danger of being corrupted through unintentional re-encoding, and it works in browsers... but could it be simpler and/or more concise? What better approach have we missed here?

Proposed Solution

The Discovery TSG would like the TRC to support the move to a full 1.0 release of the Content State API.

Encoding Implementation Concerns

I tried to implement the content encoding in Ruby, expecting it to be trivial (mostly so I could benchmark Base64 encoding), but it turns out there's no built in way to encodeURI in the same way ECMAScript does it without either rolling my own implementation - which seems pretty unlikely to be safe - or trying to get Addressable to work. URI.encode is gone, CGI.escape uses a different spec, URI.encode_www_form_component doesn't take arguments for which letters are safe, and you could probably use the Addressable gem but you'd have to also pass A-z and who knows what else.
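For comparison, Python's urllib.parse.quote does accept a set of safe characters, so an approximation of encodeURI is short there. This is only a sketch: the helper name encode_uri_like is made up for illustration, and it assumes that passing the characters ECMAScript's encodeURI leaves unescaped as the safe set is a close enough stand-in (edge cases such as lone surrogates are not considered).

```python
from urllib.parse import quote

# Characters that ECMAScript's encodeURI leaves unescaped, besides ASCII
# letters and digits: the reserved set plus the unreserved marks.
ENCODE_URI_SAFE = ";/?:@&=+$,#-_.!~*'()"


def encode_uri_like(text: str) -> str:
    """Percent-encode text (as UTF-8) roughly the way encodeURI does."""
    return quote(text, safe=ENCODE_URI_SAFE)


content = (
    '{"id":"https://example.org/object1/canvas7#xywh=1000,2000,1000,2000",'
    '"type":"Canvas","partOf":[{"id":"https://example.org/object1/manifest",'
    '"type":"Manifest"}]}'
)

# Colons, slashes, commas, "=" and "#" stay as they are; braces, quotes
# and square brackets are percent-encoded.
print(encode_uri_like(content))
```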

CGI.escape:

"%7B%22id%22%3A%22https%3A%2F%2Fexample.org%2Fobject1%2Fcanvas7%23xywh%3D1000%2C2000%2C1000%2C2000%22%2C%22type%22%3A%22Canvas%22%2C%22partOf%22%3A%5B%7B%22id%22%3A%22https%3A%2F%2Fexample.org%2Fobject1%2Fmanifest%22%2C%22type%22%3A%22Manifest%22%7D%5D%7D"

URI.encode_www_form_component:

"%7B%22id%22%3A%22https%3A%2F%2Fexample.org%2Fobject1%2Fcanvas7%23xywh%3D1000%2C2000%2C1000%2C2000%22%2C%22type%22%3A%22Canvas%22%2C%22partOf%22%3A%5B%7B%22id%22%3A%22https%3A%2F%2Fexample.org%2Fobject1%2Fmanifest%22%2C%22type%22%3A%22Manifest%22%7D%5D%7D"

The above two are the same. The desired output is:

"%7B%22id%22:%22https://example.org/object1/canvas7#xywh=1000,2000,1000,2000%22,%22type%22:%22Canvas%22,%22partOf%22:%5B%7B%22id%22:%22https://example.org/object1/manifest%22,%22type%22:%22Manifest%22%7D%5D%7D"

Seems like the sticking point is the colons and the slashes. Seeing how annoyingly difficult this is, what's the purpose of percent-escaping and then Base64 encoding? If I just Base64 URL encoded, I'd get

irb(main):052:0> Base64.urlsafe_encode64(content, padding: false)
=> "eyJpZCI6Imh0dHBzOi8vZXhhbXBsZS5vcmcvb2JqZWN0MS9jYW52YXM3I3h5d2g9MTAwMCwyMDAwLDEwMDAsMjAwMCIsInR5cGUiOiJDYW52YXMiLCJwYXJ0T2YiOlt7ImlkIjoiaHR0cHM6Ly9leGFtcGxlLm9yZy9vYmplY3QxL21hbmlmZXN0IiwidHlwZSI6Ik1hbmlmZXN0In1dfQ"

which easily converts back:

irb(main):053:0> Base64.urlsafe_decode64(Base64.urlsafe_encode64(content, padding: false))
=> "{\"id\":\"https://example.org/object1/canvas7#xywh=1000,2000,1000,2000\",\"type\":\"Canvas\",\"partOf\":[{\"id\":\"https://example.org/object1/manifest\",\"type\":\"Manifest\"}]}"

I see IIIF/discovery#90, but in ruby for me this is as simple as:

irb(main):007:0> Base64.urlsafe_decode64(Base64.urlsafe_encode64("https://en.wiktionary.org/wiki/Ῥόδος".encode("UTF-8"), padding: false)).force_encoding("UTF-8")
=> "https://en.wiktionary.org/wiki/Ῥόδος"
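The same server-side round trip can be sketched in Python, as a point of comparison rather than a proposal. The helper names b64url_encode and b64url_decode are hypothetical. One difference from the Ruby call: Python's decoder rejects unpadded input, so the sketch re-adds the "=" padding before decoding.

```python
import base64


def b64url_encode(text: str) -> str:
    """UTF-8 encode, then URL-safe Base64 without '=' padding."""
    raw = text.encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(token: str) -> str:
    """Restore the padding, decode URL-safe Base64, then decode UTF-8."""
    padded = token + "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


url = "https://en.wiktionary.org/wiki/Ῥόδος"
token = b64url_encode(url)
print(token)
print(b64url_decode(token) == url)
```

As in Ruby, this works because the string is explicitly turned into UTF-8 bytes before Base64 is applied; nothing here says anything about what a browser does with the same token.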

That can't be done in JS? Edit: I've done some research - yikes, atob and btoa are rough. I got nothin - except maybe changing the implementation to instead provide a URI to the annotation that you dereference, and have viewers able to generate URIs which include the annotation however they want, but that's just avoiding the decisions via RPC. It's all a mess, I still think it's a bunch of work to support the "export a workspace" use case.

Benchmarking

I was somewhat concerned about forcing an encoding step in, but in Ruby it looks like I can build a hash, encode it as JSON, and Base64 url encode about 181/millisecond on my PC. Given a standard result set of 100 records on a search result page this seems negligible. This goes down to about 125/millisecond with CGI escaping, but still - meh.
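A rough way to repeat this kind of measurement in Python is sketched below. The function names are hypothetical, and the resulting figure depends entirely on the machine and interpreter, so it is not comparable with the Ruby numbers above.

```python
import base64
import json
import timeit


def encode_state(canvas_id: str, manifest_id: str) -> str:
    """Build the dict, serialise it as JSON, then Base64url-encode it."""
    state = {
        "id": canvas_id,
        "type": "Canvas",
        "partOf": [{"id": manifest_id, "type": "Manifest"}],
    }
    payload = json.dumps(state, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(payload).rstrip(b"=").decode("ascii")


def encodes_per_millisecond(runs: int = 10_000) -> float:
    """Time `runs` encodings and return the average rate per millisecond."""
    seconds = timeit.timeit(
        lambda: encode_state(
            "https://example.org/object1/canvas7#xywh=1000,2000,1000,2000",
            "https://example.org/object1/manifest",
        ),
        number=runs,
    )
    return runs / (seconds * 1000)


print(f"{encodes_per_millisecond():.0f} encodes/ms")
```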

Extra Notes

The only thing stopping us from just having a query parameter which is "thing I focus on", with some way of saying what time to skip to maybe, is the use case for "I want a URL I can send to someone and it'll load 6 manifests" yes? I wonder if there's still space for an implementation where we have that simple target query parameter for the 90% case of "I click a link and it jumps to page 6" and leave the encoding bit for the more difficult use case. Otherwise the implementation above feels really heavy, personally.

If that's not possible, I wonder if we can encode the parameters in the URL like we would a form - rather than as JSON? Something like ?targets[0][manifest]=https://example.org/object1/manifest&targets[0][id]=https://example.org/object1/canvas7#xywh=1000,2000,1000,2000. We might hit a URL size limit, I suppose - maybe not for your standard use cases though, and if it got too big you could POST, theoretically.
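To see what the form-style idea looks like once it is actually put into a URL, here is a small sketch using Python's standard library. The bracketed key names come from the example above; how a receiver would turn them back into nested data is a framework convention and not something the standard library does, so this only shows the flat key/value round trip.

```python
from urllib.parse import parse_qsl, urlencode

# Flat key/value pairs using the bracketed names from the example above.
params = [
    ("targets[0][manifest]", "https://example.org/object1/manifest"),
    ("targets[0][id]",
     "https://example.org/object1/canvas7#xywh=1000,2000,1000,2000"),
]

# urlencode percent-encodes both keys and values. In particular "#" has
# to become %23, otherwise everything after it would be read as the
# fragment of the enclosing URL rather than as part of the parameter.
query = urlencode(params)
print(query)

# Decoding gives the original pairs back.
print(parse_qsl(query) == params)
```

So the values still need percent-encoding in transit; what this approach avoids is the JSON and Base64 layers, not escaping altogether.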

All that being said, this is great work! Thanks for all the time put in.

Huge thanks @tpendragon, this is exactly the kind of language/framework implementation attempt we were looking for.

That can't be done in JS? Edit: I've done some research - yikes, atob and btoa are rough.

Yeah, the primary concern was that the encoding/decoding should be simple in JavaScript in the browser, and make sure that it's still widely implementable in other frameworks and languages.

Additional:

From IIIF/discovery#90 (comment):

The mechanism in that mozilla page is not the only way of making the JavaScript string safe for btoa, but clearly a server-side implementation has to do the same thing. We chose %encoding as an easy thing for browsers to do.

It's also somewhat arbitrary in the Python, too:

quoted = urllib.parse.quote(plain_text, safe=',/?:@&=+$#')

...but this only needs one recipe to exist per language in the cookbook.
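As an illustration of what such a cookbook recipe might look like in Python, here is a sketch of the two-step approach discussed in this thread: percent-encode first, then Base64url. The function names are hypothetical, the safe set is copied from the quote call above, and stripping the "=" padding is an assumption; the exact steps should be checked against the specification text.

```python
import base64
from urllib.parse import quote, unquote

# Safe set taken from the urllib.parse.quote call quoted above.
SAFE = ",/?:@&=+$#"


def encode_content_state(plain_text: str) -> str:
    """Percent-encode (the result is pure ASCII), then Base64url-encode."""
    quoted = quote(plain_text, safe=SAFE)
    token = base64.urlsafe_b64encode(quoted.encode("ascii"))
    # Assumption: padding is stripped so the token is safe in a URL.
    return token.rstrip(b"=").decode("ascii")


def decode_content_state(token: str) -> str:
    """Reverse the steps: restore padding, Base64url-decode, unquote."""
    padded = token + "=" * (-len(token) % 4)
    quoted = base64.urlsafe_b64decode(padded).decode("ascii")
    return unquote(quoted)


state = '{"id":"https://en.wiktionary.org/wiki/Ῥόδος","type":"Manifest"}'
token = encode_content_state(state)
print(token)
print(decode_content_state(token) == state)
```

Because the percent-encoding step leaves only ASCII characters, the Base64 step never sees a non-ASCII character, which is the property the browser side relies on.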

You can see the b64-only breaking at https://base64url.herokuapp.com/ where the client's UTF-16 and the server's UTF-8 meet

I keep feeling that a robust and simple answer is here somewhere but I can't quite reach it! Another approach is to accept the fact that we need to do something else besides immediate atob on the client, but that would mean a fairly nasty looking algorithm as part of the spec, e.g., https://developer.mozilla.org/en-US/docs/Glossary/Base64#solution_2_%E2%80%93_rewriting_atob_and_btoa_using_typedarrays_and_utf-8

I got nothin - except maybe changing the implementation to instead provide a URI to the annotation that you dereference

The spec already supports this, and the 90 (99?) percent use case is just a resource URL. If the content state is just a resource URL, which might be a manifest, a collection, etc., or a content state annotation, then it does not need to be encoded:

?iiif-content=https://example.org/manifest1
?iiif-content=https://snippets.org/my-complex-workspace.json

These are allowed as-is in the spec because they are just resource URLs.
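A client therefore has to tell the two cases apart. One possible heuristic is sketched below; it is an assumption for illustration, not the rule from the specification, and the viewer URL and function name are made up. It relies on the fact that a Base64url token cannot contain ":" or "/", so it can never start with a URL scheme.

```python
from urllib.parse import parse_qs, urlsplit


def classify_iiif_content(viewer_url: str):
    """Return (kind, value) for the iiif-content parameter, or None."""
    values = parse_qs(urlsplit(viewer_url).query).get("iiif-content", [])
    if not values:
        return None
    value = values[0]
    # Heuristic: anything that looks like an http(s) URL is treated as a
    # plain resource URL; everything else as an encoded content state.
    if value.startswith(("http://", "https://")):
        return ("resource-url", value)
    return ("encoded-state", value)


print(classify_iiif_content(
    "https://viewer.example.org/?iiif-content=https://example.org/manifest1"
))
```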

The jump beyond this is where you want to convey something more than a single resource, but don't want to have to host the JSON for a content state describing that something. This is important - e.g., search results. They only need to exist as part of hrefs rendered on a web page. As long as the content state anno is safe in transit, you can convey anything that way, and clients that understand the Presentation API can navigate it and get you to the right point.

That said (and it's late in the day to think about this) there is a middle ground of common scenarios that are not single resource URIs, but are not arbitrary content states either. This is the large family of information comprising the trio:

  • The URI of the resource you want the client to focus on - typically a Canvas, but could be a Range
  • The dereferenceable URI of the resource that contains this resource - typically a Manifest
  • (optional) A t= or xywh= selector within the focused resource (when it's a Canvas)

...but is there a simple way of squashing these into a single parameter that wouldn't have exactly the same encoding problems as an anno? The anno just separates these out with {..}, ", partOf etc.
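For reference, this is how the trio maps onto the annotation form used in the examples earlier in this thread. It is a minimal sketch: build_content_state is a hypothetical helper, and it only covers the Canvas-in-Manifest shape with an optional fragment selector.

```python
import json
from typing import Optional


def build_content_state(
    target_id: str,
    container_id: str,
    selector: Optional[str] = None,
    target_type: str = "Canvas",
    container_type: str = "Manifest",
) -> dict:
    """Combine target, containing resource and optional selector."""
    # The selector (e.g. "xywh=..." or "t=...") is appended as a fragment.
    focus = f"{target_id}#{selector}" if selector else target_id
    return {
        "id": focus,
        "type": target_type,
        "partOf": [{"id": container_id, "type": container_type}],
    }


state = build_content_state(
    "https://example.org/object1/canvas7",
    "https://example.org/object1/manifest",
    selector="xywh=1000,2000,1000,2000",
)
print(json.dumps(state, separators=(",", ":")))
```

The three inputs survive intact in the output; the JSON only adds the structure that keeps them apart, which is exactly the part that then needs escaping.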

The single parameter bit is important too, for client simplicity. If there are multiple parts (e.g., the above might be iiif-content, iiif-content-source, iiif-content-selector) then the client has to manage multiple params in addition to its own params.

Anyway - many thanks again and good discussion points for TRC. I would dearly love to see a simpler encoding proposal.

@tpendragon Disclaimer - I'm not a Ruby developer!

However, I was able to encode the URI:

content = '{"id":"https://example.org/object1/canvas7#xywh=1000,2000,1000,2000","type":"Canvas","partOf":[{"id":"https://example.org/object1/manifest","type":"Manifest"}]}';

p = URI::Parser.new();

print encoded =  p.escape(content)

which returned:

"%7B%22id%22:%22https://example.org/object1/canvas7%23xywh=1000,2000,1000,2000%22,%22type%22:%22Canvas%22,%22partOf%22:[%7B%22id%22:%22https://example.org/object1/manifest%22,%22type%22:%22Manifest%22%7D]%7D"

This appears to be much closer. I think you can pass in a regex with the characters to escape, so could be a route. From a functional point of view, the above can be decoded and parsed in JS (decodeURI then JSON.parse).

@stephenwf Yup, that works closer. It's funny, I've never seen that one - I'm surprised it survived when URI.escape is gone.

It does strike me that I guess it's not terribly important if all the languages agree on how to percent encode as long as they agree on how to percent-decode, since the only purpose is escaping out of UTF-8.
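That observation is easy to check. The sketch below percent-encodes the same JSON with three different safe sets, chosen to resemble the three outputs seen in this thread (the labels are approximations, not claims about what the Ruby or JavaScript functions do in general), and decodes them all with one generic decoder.

```python
from urllib.parse import quote, unquote

content = (
    '{"id":"https://example.org/object1/canvas7#xywh=1000,2000,1000,2000",'
    '"type":"Canvas","partOf":[{"id":"https://example.org/object1/manifest",'
    '"type":"Manifest"}]}'
)

# Three encoders that disagree about which characters to escape.
variants = {
    "escape everything (CGI.escape-like)": quote(content, safe=""),
    "keep URI punctuation (encodeURI-like)":
        quote(content, safe=";/?:@&=+$,#-_.!~*'()"),
    "keep brackets too (URI::Parser-like)": quote(content, safe=":/=,[]"),
}

# A decoder that turns every %XX sequence back into a character gives
# the same result for all three.
for label, encoded in variants.items():
    print(label, unquote(encoded) == content)
```

One caveat: CGI.escape writes a space as "+", which neither unquote here nor decodeURIComponent in JavaScript turns back into a space, so agreement on decoding still needs agreement on that one case.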

If there can be a recipe for encoding and decoding for popular languages/frameworks, and people can easily find these recipes and just use them, then I think that's OK if the approach in the spec is the best we can do.

What we'd really like to avoid is the scenario where, one week after 1.0 is made official, someone says "why didn't you just encode/decode using <...> which is much easier and simpler and 100% reliable?"

Then we all end up looking like the polar bear in tpendragon's avatar.

What is that mystery <...> that we are missing? If there's a chance it exists, now's the time to find it!

Why decodeURI vs decodeURIComponent? I ask because I can use decodeURIComponent on the output of Ruby's CGI.escape and that makes me happy.

Add straight brackets to the motivation value in 5.3

As mentioned on the call, would it be simpler if we said you only need to encode the annotation when using the GET parameter option?

https://iiif.io/api/content-state/0.9/#initialization-mechanisms-link

I think all of the other use cases (drag & drop, copy & paste and files) could use straight UTF-8 JSON.

The POST example could use form-urlencoded if you needed to support an HTML form, but I think it's more likely that JavaScript would be involved, in which case you could send a content-type of application/json and post the JSON.

@tpendragon

We might hit a URL size limit, I suppose - maybe not for your standard use cases though, and if it got too big you could POST, theoretically.

That's one concern I have about encoding a complex state in a URI, in any serialization. How does one know when the URL size limit is hit, for example with a complex search result (in which case, as @tomcrane notes, one can't even switch to a link to a separate document with the state data)?

I am not sure how POST would work—the URI should carry all the info in order for this to work, correct?

Never mind—after a deeper read of the spec, I realize that POST is allowed (although I understand that that implies the lack of portability of the URI, if a client is willing to lose that) and that the URI length issue is limited since an arbitrary number of search results would present one content state URI per result.

Again - could we use encodeURIComponent as the spec here? From https://tc39.es/ecma262/#sec-encodeuricomponent-uricomponent ?

I believe this is the intention, as in the spec:

The encodeURI and decodeURI functions are intended to work with complete URIs; they assume that any reserved code units in the URI are intended to have special meaning and so are not encoded. The encodeURIComponent and decodeURIComponent functions are intended to work with the individual component parts of a URI; they assume that any reserved code units represent text and so must be encoded so that they are not interpreted as reserved code units when the component is part of a complete URI.

We're definitely not encoding the full URI here, just one query parameter.
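The distinction in the quoted passage can be illustrated for a value placed directly into a query string. This is a sketch only: the viewer and search URLs are made up, and the two safe sets are approximations of what encodeURI and encodeURIComponent leave unescaped.

```python
from urllib.parse import parse_qs, quote, urlsplit

URI_SAFE = ";/?:@&=+$,#-_.!~*'()"   # roughly what encodeURI leaves alone
COMPONENT_SAFE = "-_.!~*'()"        # roughly what encodeURIComponent leaves alone

value = "https://example.org/search?q=yasna&page=2"
base = "https://viewer.example.org/?iiif-content="

as_uri = base + quote(value, safe=URI_SAFE)
as_component = base + quote(value, safe=COMPONENT_SAFE)


def read_param(url: str) -> str:
    """Read iiif-content back the way a receiving server would."""
    return parse_qs(urlsplit(url).query)["iiif-content"][0]


# With the URI-style encoding the "&" inside the value is taken as a
# parameter separator, so the value comes back truncated.
print(read_param(as_uri))
# With the component-style encoding the whole value survives.
print(read_param(as_component))
```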

My vote is 😕 until that's the case.

Hi all - I haven't had time to address these suggestions yet - I'll update https://base64url.herokuapp.com/ to use the xxxURIComponent functions.

Hi,

Thank you all for your comments both here and in the TRC meeting. We went through the comments today in the Discovery TSG call and felt that the issues raised by the TRC are significant enough that the group would like to investigate them further and make changes before releasing 1.0 of content state.

With this in mind we are going to withdraw this issue from the TRC vote and once the issues raised here have been addressed we would like to bring content state 1.0 back to the TRC for approval. If you would like to be involved in the discussions please feel free to join us on a Discovery TSG call.

Thanks again for all of your comments and thoughts, which will hopefully lead to a content state 1.0 standard that is easier to implement.

Glen Robson
IIIF Technical Coordinator