json-ld/yaml-ld

Multiple documents in YAML

VladimirAlexiev opened this issue · 7 comments

Should YAML-LD allow or prohibit multiple documents in YAML?

  • Which YAML parsers support multiple documents?
  • What are useful examples of using multiple documents?
  • If we decide to use them in YAML-LD, how should they be represented? As RDF graphs?
  • Below I formulate a positive use case, but I'm not quite certain we want this because of its complexity.

PLEASE VOTE with 👍 or 👎, thanks!


Eg1: duplicate keys within a single mapping are rejected by YAML linters,
but they are fine when they appear in different documents.
Example by @ioggstream from #42 (comment):

---
a: 1
...
---
a: 2
...
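
A quick check of Eg1, as a minimal sketch assuming PyYAML is installed (the variable name `stream` is only illustrative): each document is parsed on its own, so the repeated key `a` never collides.

```python
import yaml

stream = """\
---
a: 1
...
---
a: 2
...
"""

# safe_load_all yields one Python object per YAML document in the stream.
docs = list(yaml.safe_load_all(stream))
print(docs)  # [{'a': 1}, {'a': 2}]
```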

Eg2: YAML metadata followed by a markdown textual body is widely used in some blog/content management systems:

---
created: 2022-07-03
published: 2022-07-04
title: Frobnification
author: A. U. Thor
...
Frobnification was invented in prehistoric times.
It's a useful meta-process wherein...

As an information architect,
I want to be able to use multiple documents in YAML-LD,
so that I can transmit several closely related documents (graphs) together.

Some notes:

a. Theoretically speaking

  1. a YAML stream includes one or more documents
  2. a stream can be transmitted on the net or archived in a file

In Python, when you parse a stream containing multiple documents
you need to use yaml.safe_load_all instead of yaml.safe_load.

b. not sure the eg2 provided above is valid yaml.

Which YAML parsers support multiple documents?

In Python, when you parse a stream containing multiple documents
you need to use yaml.safe_load_all instead of yaml.safe_load.
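
To illustrate the difference, here is a small sketch assuming PyYAML (the names `stream` and `single_ok` are made up for the example): `yaml.safe_load` refuses a stream with more than one document, while `yaml.safe_load_all` iterates over all of them.

```python
import yaml

stream = "---\na: 1\n---\na: 2\n"

try:
    # safe_load expects exactly one document and raises otherwise.
    yaml.safe_load(stream)
    single_ok = True
except yaml.YAMLError as exc:
    single_ok = False
    print("safe_load failed:", type(exc).__name__)

# safe_load_all returns a generator with one item per document.
docs = list(yaml.safe_load_all(stream))
print(len(docs), "documents:", docs)
```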

What are useful examples of using multiple documents?

In Kubernetes, multiple YAML documents are bundled together
to describe related deployment units.

Another example could be bundling different related datasets
that should be imported together in a single file,
e.g. (metadata, data) or (ontology, dataset).
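
A bundle of this kind could be produced and read back roughly as follows. This is only a sketch assuming PyYAML; the `metadata` and `data` contents are invented for illustration.

```python
import yaml

# Hypothetical bundle: a metadata document followed by a data document.
metadata = {"title": "Frobnification dataset", "created": "2022-07-03"}
data = {"items": [{"id": 1}, {"id": 2}]}

# explicit_start=True writes a "---" marker in front of every document.
bundle = yaml.safe_dump_all([metadata, data], explicit_start=True)
print(bundle)

# Reading the stream back yields the two documents in their original order.
restored = list(yaml.safe_load_all(bundle))
```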

If we decide to use them in YAML-LD, how should they be represented?

As different JSON-LD documents related between them

As RDF graphs?

Aren't they always RDF graphs?

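import json
import yaml
from rdflib import Graph

g = Graph()
with open("docs.yamlld") as stream:
    for document in yaml.safe_load_all(stream):
        # Assumption: no YAML-LD parser is registered in rdflib, so each document is re-serialized as JSON-LD.
        g.parse(data=json.dumps(document), format="json-ld")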

Below I formulate a positive use case, but I'm not quite certain we want this because of its complexity

I see it more as a bundling method. The complexity lies inside each document.

WDYT?

@ioggstream

not sure the eg2 provided above is valid yaml

Does it look better now?

As different JSON-LD documents related between them

But how can we relate documents?

  • JSON and YAML have no idea of "URL" or "document at URL" and setting "base"
  • JSON-LD has @base but it sets the base for terms inside the doc, not "the semantic URL" of the doc itself

Aren't they always RDF graphs?

I agree they should be graphs. Then we need:

  • some way to denote or auto-generate graph IDs for each of the multiple docs (eg #1, #2...)
  • to figure out how it relates to @graph: by default they all go to the default graph (triples not quads)
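
One possible reading of the auto-generated IDs, as a sketch only: wrap every YAML document in a JSON-LD named graph whose `@id` is the stream's URL plus `#1`, `#2`, and so on. The function name `to_named_graphs`, the `base` URL, and the wrapping strategy are assumptions made for illustration; nothing here is defined by YAML-LD.

```python
import yaml

def to_named_graphs(stream, base):
    """Wrap each YAML document as a JSON-LD named graph with a generated @id."""
    graphs = []
    for index, doc in enumerate(yaml.safe_load_all(stream), start=1):
        doc = dict(doc)  # shallow copy so the parsed document is not mutated
        wrapper = {"@id": f"{base}#{index}", "@graph": [doc]}
        # Hoist the document's @context so it still applies inside @graph.
        if "@context" in doc:
            wrapper = {"@context": doc.pop("@context"), **wrapper}
        graphs.append(wrapper)
    return graphs

stream = """\
---
"@id": "#bart"
spouse: "#marge"
---
"@id": "#marge"
spouse: "#bart"
"""

graphs = to_named_graphs(stream, "http://example.org/docs")
for g in graphs:
    print(g)
```

Without the wrapper, both documents would land in the default graph, which is the triples-not-quads behaviour mentioned above.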

Eg this

{"@context": {"@base": "http://example.org", "@vocab":"http://example.org/",
              "spouse":{"@type":"@id"},"statedIn":{"@type":"@id"}},
 "@id": "#bart", "spouse": "#marge", "statedIn": ""}

results in these triples (not quads)

<http://example.org#bart> <http://example.org/spouse> <http://example.org#marge> .
<http://example.org#bart> <http://example.org/statedIn> <http://example.org> .
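
The IRIs in these triples can be reproduced by hand: values typed as `@id` are resolved against `@base`, and property names are appended to `@vocab`. The sketch below uses Python's `urljoin` as a stand-in for JSON-LD IRI resolution, on the assumption that it behaves the same way for these particular inputs.

```python
from urllib.parse import urljoin

base = "http://example.org"    # "@base" from the context
vocab = "http://example.org/"  # "@vocab" from the context

# "@id" values and "@type": "@id" values are resolved against @base.
subject = urljoin(base, "#bart")
spouse = urljoin(base, "#marge")
stated_in = urljoin(base, "")  # the empty reference resolves to the base itself

# Property names are concatenated with @vocab.
predicate = vocab + "spouse"

print(f"<{subject}> <{predicate}> <{spouse}> .")
print(f"<{subject}> <{vocab}statedIn> <{stated_in}> .")
```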

My two cents about eg2. This form of writing is often known as front matter, originally proposed by Jekyll. Syntax:

---
title: My Cat
tags:
    - article
    - pets
---

My cat is the most handsome cat in the whole world.

A few examples of software that supports YAML front matter for Markdown documents:

I am using this format to source YAML-LD from the front matter.

However, this is not valid YAML and thus I do not believe it applies to the question at hand. Does it?

JSON-LD-API has some options and descriptions for processing multiple script elements within an HTML document using extractAllScripts, that would seem relevant.

@anatoly-scherbakov
This is also used by pandoc.

I thought the second doc consists of one long string? But that would require some quoting or escaping, else colons and dashes at BOL will throw it off.
Agreed, strike eg2
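
The quoting concern about the body can be seen with PyYAML. In this minimal sketch (the sample text is invented), a body line that contains a colon is no longer read as a string but as a mapping:

```python
import yaml

stream = """\
---
title: My Cat
---
Warning: my cat bites.
"""

# The stream has two documents: the front matter and the unquoted body.
meta, body = yaml.safe_load_all(stream)

# The colon turns the body into a mapping instead of plain text.
print(type(body).__name__, body)  # dict {'Warning': 'my cat bites.'}
```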

@VladimirAlexiev @gkellogg this will be mainly addressed in ietf-wg-httpapi/mediatypes#55

Thanks for this issue: without this the YAML media type would have missed this piece.

@anatoly-scherbakov the document in your example is valid YAML, as @VladimirAlexiev said:

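import yaml

s = """---
title: My Cat
tags:
    - article
    - pets
---

My cat is the most handsome cat in the whole world.
"""
# The stream holds two documents: the front-matter mapping and the body as a plain string.
for d in yaml.safe_load_all(s):
    print(d)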