nodeSolidServer/node-solid-server

test RDF in a data island in the HTML

Opened this issue · 25 comments

@melvincarvalho
How can I help you?

  • I can create a branch
  • I can review your code. The problem is that I don't know how a data island is defined in HTML. Which tag? I suppose we need to parse the HTML content.
    • Just as a reminder, a container/ with an index.html automatically serves index.html
    • How do you expect to render the RDF? With an Accept header?

An HTML file with a data island example would help me understand.

@bourgeoa thanks for looking at this

A structured data island is simply:

<script type="application/ld+json" id="data">
{
  "json-ld": "goes here"
}
</script>

Then the content of the script would be whatever the JSON-LD is for that resource

So in the case where we have mashlib in a script, an extra script tag is inserted with the data too

This would make all of the different mime types give back consistent RDF

Does that make sense?
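To make the definition above concrete, here is a minimal sketch of how a client could pull data islands out of an HTML page. It uses only Python's standard `html.parser`. The `RDF_TYPES` set, the `DataIslandParser` class and the `find_data_islands` helper are hypothetical names chosen for illustration; they are not part of NSS or mashlib.

```python
from html.parser import HTMLParser

# Assumed set of RDF media types to look for; not an official or exhaustive list.
RDF_TYPES = {"application/ld+json", "text/turtle", "text/n3", "application/rdf+xml"}


class DataIslandParser(HTMLParser):
    """Collects the type, id and text of every <script> block with an RDF type."""

    def __init__(self):
        super().__init__()
        self.islands = []
        self._current = None  # the island currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") in RDF_TYPES:
            self._current = {"type": attrs["type"], "id": attrs.get("id"), "content": ""}

    def handle_data(self, data):
        # Script content can arrive in several chunks, so accumulate it.
        if self._current is not None:
            self._current["content"] += data

    def handle_endtag(self, tag):
        if tag == "script" and self._current is not None:
            self._current["content"] = self._current["content"].strip()
            self.islands.append(self._current)
            self._current = None


def find_data_islands(html):
    """Return a list of dicts, one per data island found in the HTML text."""
    parser = DataIslandParser()
    parser.feed(html)
    parser.close()
    return parser.islands


example = """<html><head>
<script type="application/ld+json" id="data">
{"@id": "#me"}
</script>
</head><body></body></html>"""
print(find_data_islands(example))
```

Scripts without an RDF `type` (for example the mashlib script) are ignored by this sketch.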

It's probably worth noting that such data islands may be in additional formats (media types), either in parallel with or instead of JSON-LD. At OpenLink Software, we commonly inject data islands in both JSON-LD and Turtle. Other media types have also been used in experiments but are not commonly parsed, so are not commonly injected.

@TallTed, do you have an example we could use as a template for learning?

This article should be a help.

So my understanding is the following.
For a file example.html:

  • a data island is defined by these 2 elements:
    • a script tag. (Can there be multiple occurrences? Consider only one for now.)
    • a type, which can be any RDF content type: Turtle, JSON-LD, or RDF/XML
      Note: if multiple occurrences of a data island are allowed, then a third parameter, an id, is needed.
      The id must be unique within the HTML document. It is not specific to the script tag. It may be 'data-*'.
  • text/html shall be considered an RDF content type by the server.
    What should be the result of:
    • 'GET' on an HTML document with an RDF content type in the Accept header
      • return the content of a script tag whose type is any RDF content type, converted to the requested content type
    • 'POST', 'PUT', 'DELETE' act on the HTML document, including the data island
    • 'PATCH' acts on the data island.

The PR #1715 implements the following:

  • a script block <script type="RDF contentType" id="data">RDF content</script>.
  • the id is not required and is not used by NSS
  • a data island can be discovered anywhere in the HTML resource.
  • Both the opening <script> tag and the closing </script> tag are needed.
  • the created or modified data island script is always inserted just before the closing </head> tag
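The insertion rule in the last bullet can be sketched as follows. This is only an illustration of "insert just before the closing </head> tag", not the code from the PR: the function name `insert_data_island` is hypothetical, and the sketch covers creating a new island only, not modifying an existing one.

```python
import re


def insert_data_island(html, rdf_text, media_type="text/turtle"):
    """Return the HTML with a new data island placed just before </head>."""
    script = f'<script type="{media_type}">\n{rdf_text}\n</script>\n'
    # Look for the first closing head tag, ignoring case.
    match = re.search(r"</head\s*>", html, flags=re.IGNORECASE)
    if match is None:
        # Assumption: a document without </head> is rejected rather than repaired.
        raise ValueError("document has no closing </head> tag")
    return html[:match.start()] + script + html[match.start():]


page = "<html><head><title>Test</title></head><body>test data island</body></html>"
print(insert_data_island(page, '<> a "test".'))
```

Using a regular expression keeps the rest of the document byte-for-byte unchanged, which matters if nothing except the island is supposed to be touched.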

The data island is accessed with:

  • GET, which returns an RDF resource depending on the RDF content type in the Accept header:
    • text/turtle, text/n3, application/ld+json, application/rdf+xml
    • no Accept header, or text/html, returns the usual HTML resource
  • PATCH, which creates or modifies the data island in the HTML resource:
    • by default a new data island is created with a text/turtle content type.
    • an existing data island is modified using the type of the existing data island.
Question:
Should PATCH allow choosing the content type in which the data island is stored, using the Accept header?
Is using the Accept header this way Solid compliant?

This is fantastic!

Could the default be application/ld+json or configurable, say, in the NSS config? Reason being that parsing JSON is native to the browser and easy

Unsure about the PATCH operation, isn't that server-wide?

I tried running the dataIsland branch locally and managed to log in. But I was unable to see a data island in the webid profile that was created. Will have a look to see if there's anything obvious that can be fixed

@melvincarvalho

Could the default be application/ld+json or configurable

  • The default is only used in PATCH; you can always use PUT to create an HTML resource with a JSON-LD data island
  • Yes, it is possible to default to JSON-LD, but I was under the impression that Solid usually defaults to Turtle
  • Making it configurable implies passing a parameter, in a header I suppose. Nothing is available for that in the current N3 patch (Solid v0.9)

But I was unable to see a data island in the webid profile that was created

Well, the WebID is not an HTML resource.
Creation of the data island is done client-side in HTML documents.

@bourgeoa ah, i see, thank you

re: patch, yes turtle I think is best default in that case

Would it be possible to generate data islands on the server side?

I'm not sure if there are many benefits to making changes on the client side

Would it be possible to generate data islands on the server side?
I'm not sure if there are many benefits to making changes on the client side

I'm not sure I understand what you are looking for.
Create an HTML document server-side? At pod creation? In other situations? When? Why?

A data island is just a way to store RDF data in an HTML document. Dokieli is another way.
If you want to produce a data island with the HTML body, you need to create a specification. I haven't seen any.

Create an html document server side

No, just the same way it's done today

When we get an HTML file it contains mashlib, and that file is given to the browser by node solid server

What I'm saying is that, as well as adding mashlib, NSS should give back a data island containing the RDF, so that it's consistent with the other mime types

The way to test this would be to run curl against the file and see if the data island is there. I'm trying to write a test for this in the test suite, to explain it better

So NSS when it has a GET request, and gives back HTML, also pulls in the JSON-LD and puts it in a script tag

So NSS when it has a GET request, and gives back HTML, also pulls in the JSON-LD and puts it in a script tag

Where is the JSON-LD located? Can you give an example of HTML content with JSON-LD content?
Is this what you mean: https://www.w3.org/2012/sde/ ?

Can you give an example of HTML content with JSON-LD content?

Yes you put the RDF in a SCRIPT tag inside the HTML

This is how most of the semantic web works today, outside of Solid. Having RDF in HTML would bring Solid up to par with the majority of the existing semantic web

Example

Alice has a <webid>

curl <webid>

Gives back:

  • html page
  • mashlib script tag
  • script tag with RDF in JSON-LD

The RDF for the webID is stored on the server, but returned by node solid server

So exactly as we have today, but now HTML files also have RDF, just like the other mime types

Is this what you mean: https://www.w3.org/2012/sde/

Yes, this would be an excellent tool for testing

Side thought: It might be possible that the JSON-LD returned from NSS and the HTML returned from NSS could be almost identical

JSON-LD returned by NSS

{
  JSON-LD-HERE
}

HTML returned by NSS

<html>
...
...
<script type="application/ld+json">
{
  JSON-LD-HERE
}
</script>

... mashlib here

<body> here
</html>

This might be relatively easy to code if the same view is copied from JSON-LD to HTML, and some scaffolding added. If I get some cycles free, I might give this a try in a local branch
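The "same view plus scaffolding" idea can be sketched like this. It is an illustration only, not how NSS renders pages: `wrap_jsonld_in_html` and its `head_extra` parameter (standing in for the mashlib script tags) are hypothetical. The one detail worth noting is that `<` inside the JSON is escaped, so a string value containing `</script>` cannot end the script element early.

```python
import json


def wrap_jsonld_in_html(doc, head_extra=""):
    """Embed a JSON-LD document in a minimal HTML page as a data island."""
    # Escape "<" as a JSON unicode escape; json.loads turns it back into "<".
    payload = json.dumps(doc, indent=2).replace("<", "\\u003c")
    return (
        "<html><head>\n"
        f'<script type="application/ld+json" id="data">\n{payload}\n</script>\n'
        f"{head_extra}"
        "</head><body></body></html>"
    )


profile = {"@id": "#me", "http://xmlns.com/foaf/0.1/name": "Alice"}
print(wrap_jsonld_in_html(profile, head_extra="<!-- mashlib script tags here -->\n"))
```

With this shape, the JSON-LD response and the HTML response are produced from the same data, and only the wrapper differs.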

I think I have isolated the code that does this:

https://github.com/nodeSolidServer/node-solid-server/blob/main/lib/handlers/get.js#L84

I might be able to change the resource mapper a bit so that it brings back JSON-LD then put that into the HTML with the databrowser config setting

Mashlib is an app running in the browser that allows browsing pod documents, giving different representations depending on the RDF data, content negotiation, or actions (create, edit, ...)

An HTML document doc.html (text/html) was always returned as doc.html (text/html), containing all the original HTML content, including head/script/body with all scripts, be they JavaScript or data islands.

My PR only adds content negotiation.
If doc.html contains a data island script, then you can request it with GET and an Accept header of application/ld+json, and receive the document doc.html as application/ld+json. When there is no data island, GET returns 404.

https://github.com/nodeSolidServer/node-solid-server/blob/main/lib/handlers/get.js#L84

This line just tells whether that URL can be displayed using the mashlib app,
that is, whether that URL is an entry point for the mashlib app.

A pod URL pointing to an HTML document is not displayed with mashlib but directly by the browser, and contains all the HTML, including the data island if any.

@bourgeoa the content type text/html should return RDF. Right now it doesn't

The way to fix this is to put the JSON-LD inside a script tag in the HTML as shown above

Doing it client side does not fix the issue; it can be tested here

https://www.w3.org/2012/sde/

I believe it can be fixed here:

https://github.com/nodeSolidServer/node-solid-server/blob/main/lib/handlers/get.js#L84

By changing the content pulled in by the resource mapper. If I get time I'll have a go locally at a proof of concept

@melvincarvalho As you can see the data island is there for this URL https://bourgeoa.solidcommunity.net/public/alain.html

image

Exactly what mashlib gives in the source pane

image

@bourgeoa that looks beautiful!

I can confirm it works with curl:

curl https://bourgeoa.solidcommunity.net/public/alain.html

<html>
<script type="text/turtle" id="data">
<> a "test".
</script>
<body>test data island</body>

Fantastic!

A few things:

  • the closing </html> tag seems to be missing
  • body should be empty
  • I can't see the mashlib script (maybe I'm wrong)
  • Is there some way that the type can be set (say in a config) to JSON-LD?

In short, everything should be exactly how it was before, with html / head / body / mashlib. The only difference is one extra script tag containing RDF. So the change to the page should be quite minor.

Another point, while RDF is being returned in this one file, RDF needs to be returned by NSS for every file

Example Resource

https://bourgeoa.solidcommunity.net/public/approxlocation.ttl

HTTP GET with Curl

curl -H "Accept: text/html" https://bourgeoa.solidcommunity.net/public/approxlocation.ttl

What is returned (no RDF)

<html><head><meta charset="utf-8"/><title>SolidOS Web App</title><script>document.addEventListener('DOMContentLoaded', function() {
        panes.runDataBrowser()
      })</script><script defer="defer" src="/mashlib.min.js"></script><link href="/mash.css" rel="stylesheet"></head><body id="PageBody"><header id="PageHeader"></header><div class="TabulatorOutline" id="DummyUUID" role="main"><table id="outline"></table><div id="GlobalDashboard"></div></div><footer id="PageFooter"></footer></body></html>

What SHOULD be returned (includes RDF)

<html><head><meta charset="utf-8"/><title>SolidOS Web App</title>

<!--- DATA ISLAND SHOULD GO IN HERE -->

<script>document.addEventListener('DOMContentLoaded', function() {
        panes.runDataBrowser()
      })</script><script defer="defer" src="/mashlib.min.js"></script><link href="/mash.css" rel="stylesheet"></head><body id="PageBody"><header id="PageHeader"></header><div class="TabulatorOutline" id="DummyUUID" role="main"><table id="outline"></table><div id="GlobalDashboard"></div></div><footer id="PageFooter"></footer></body></html>

It is bad HTML, but it is only an example. I mistyped the closing html tag.
There is no mashlib script because, as I explained, a URL pointing to an HTML document does not use mashlib.

But https://bourgeoa.solidcommunity.net/public/, which is a container URL, does use the mashlib app. (Containers have a Turtle representation.)

There is no mashlib script

Mashlib is needed. It is there today. It should not be removed.

Nothing should be removed, only a script tag added, which contains RDF

A good example to test would be: https://bourgeoa.solidcommunity.net/public/approxlocation.ttl

Another point, while RDF is being returned in this one file, RDF needs to be returned by NSS for every file

@timbl do you agree with that? This seems a very interesting point.

@bourgeoa wrote:

a script tag. (Can there be multiple occurrences ? consider only one for now)

Yes, there may be multiple script tag occurrences. It's often best to have one occurrence per media type, but as long as each occurrence has a unique id (or no id), this limit need not be observed.
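A quick way to check the "unique id (or no id)" condition is sketched below, using Python's standard `html.parser`. The names `ScriptAttrs`, `island_types_and_ids` and `duplicate_island_ids`, as well as the `RDF_TYPES` set, are hypothetical and for illustration only. The sketch looks at RDF script tags only, although id uniqueness applies to the whole HTML document.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumed set of RDF media types; not an official or exhaustive list.
RDF_TYPES = {"application/ld+json", "text/turtle", "text/n3", "application/rdf+xml"}


class ScriptAttrs(HTMLParser):
    """Records the attributes of every <script> tag that has an RDF type."""

    def __init__(self):
        super().__init__()
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") in RDF_TYPES:
            self.scripts.append(attrs)


def island_types_and_ids(html):
    """Return a list of (media_type, id) pairs; id is None when absent."""
    parser = ScriptAttrs()
    parser.feed(html)
    parser.close()
    return [(s["type"], s.get("id")) for s in parser.scripts]


def duplicate_island_ids(html):
    """Return the sorted ids that are used by more than one data island."""
    ids = [island_id for _, island_id in island_types_and_ids(html) if island_id]
    return sorted(i for i, count in Counter(ids).items() if count > 1)


page = """<html><head>
<script type="text/turtle" id="data-ttl"><> a <#Thing> .</script>
<script type="application/ld+json" id="data-jsonld">{"@id": ""}</script>
</head><body></body></html>"""
print(island_types_and_ids(page))
print(duplicate_island_ids(page))
```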