Multiple documents in YAML
VladimirAlexiev opened this issue · 7 comments
Should YAML-LD allow or prohibit multiple documents in YAML?
- Which YAML parsers support multiple documents?
- What are useful examples of using multiple documents?
- If we decide to use them in YAML-LD, how should they be represented? As RDF graphs?
- Below I formulate a positive use case, but I'm not quite certain we want this because of its complexity
PLEASE VOTE with 👍 or 👎, thanks!
Eg1: multiple identical keys are forbidden by YAML linters.
But they are ok if they are in different documents.
Example by @ioggstream from #42 (comment):
---
a: 1
...
---
a: 2
...
Eg2: YAML metadata followed by a markdown textual body is widely used in some blog/content management systems:
---
created: 2022-07-03
published: 2022-07-04
title: Frobnification
author: A. U. Thor
...
Frobnification was invented in prehistoric times.
It's a useful meta-process wherein...
As an information architect,
I want to be able to use multiple documents in YAML-LD,
so that I can transmit several closely related documents (graphs) together.
Some notes:
a. Theoretically speaking
- a YAML stream includes one or more documents
- a stream can be transmitted on the net or archived in a file
In Python, when you parse a stream containing multiple documents,
you need to use yaml.safe_load_all
instead of yaml.safe_load.
b. not sure the eg2 provided above is valid yaml.
Which YAML parsers support multiple documents?
In Python, when you parse a stream containing multiple documents,
you need to use yaml.safe_load_all instead of yaml.safe_load.
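A minimal sketch of that difference, assuming PyYAML is installed and reusing the two-document stream from Eg1. The variable names are illustrative only.

```python
import yaml

# Two documents in one stream, each with the same key "a" (the Eg1 stream).
stream = "---\na: 1\n...\n---\na: 2\n...\n"

# safe_load expects exactly one document and rejects a multi-document stream.
try:
    yaml.safe_load(stream)
    single_load_failed = False
except yaml.YAMLError:
    single_load_failed = True

# safe_load_all returns a generator that yields one object per document.
documents = list(yaml.safe_load_all(stream))
print(single_load_failed, documents)
```

Because each document is parsed on its own, the repeated key "a" never collides with itself.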
What are useful examples of using multiple documents?
In Kubernetes, multiple YAML documents are bundled together
to describe related deployment units.
Another example could be bundling, in a single file, different related
datasets that should be imported together, e.g. (metadata, data)
or (ontology, dataset).
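As a rough illustration of the (metadata, data) bundling idea, the sketch below writes two objects into one stream and reads them back. It assumes PyYAML; the record contents are hypothetical and not taken from any real dataset.

```python
import yaml

# Hypothetical bundle: a metadata record and the dataset it describes.
metadata = {"title": "Example dataset", "license": "CC0"}
data = {"rows": [{"id": 1}, {"id": 2}]}

# explicit_start=True writes a "---" marker before every document.
bundle = yaml.safe_dump_all([metadata, data], explicit_start=True)

# Reading the stream back yields the documents in their original order.
restored = list(yaml.safe_load_all(bundle))
print(bundle)
```

Document order is the only relation the stream itself records, so anything richer (which document describes which) would have to be stated inside the documents.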
If we decide to use them in YAML-LD, how should they be represented?
As different JSON-LD documents related between them
As RDF graphs?
Aren't they always RDF graphs?
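import json
import yaml
from rdflib import Graph
g = Graph()
with open("docs.yamlld") as stream:
    for document in yaml.safe_load_all(stream):
        # Assumes rdflib 6+ (built-in JSON-LD parser); each document is re-serialized as JSON-LD.
        g.parse(data=json.dumps(document), format="json-ld")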
Below I formulate a positive use case, but I'm not quite certain we want this because of its complexity
I see it more as a bundling method. The complexity lies inside each document.
WDYT?
not sure the eg2 provided above is valid yaml
Does it look better now?
As different JSON-LD documents related between them
But how can we relate documents?
- JSON and YAML have no idea of "URL" or "document at URL" and setting "base"
- JSON-LD has @base, but it sets the base for relative IRIs inside the doc, not "the semantic URL" of the doc itself
Aren't they always RDF graphs?
I agree they should be graphs. Then we need:
- some way to denote or auto-generate graph IDs for each of the multiple docs (eg #1, #2...)
- to figure out how it relates to @graph: by default they all go to the default graph (triples not quads)
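One possible way to auto-generate graph IDs, sketched under the assumption that each document has already been parsed into a dict. The helper name_graphs and the "#1", "#2" naming scheme are hypothetical. Nothing in YAML-LD defines them.

```python
import json

def name_graphs(documents, prefix="#"):
    """Wrap each parsed document in a named graph with the ID "#1", "#2", ..."""
    named = []
    for index, document in enumerate(documents, start=1):
        body = dict(document)  # shallow copy so the input is not modified
        wrapper = {}
        # Hoist the document's own @context so it still governs its terms.
        if "@context" in body:
            wrapper["@context"] = body.pop("@context")
        wrapper["@id"] = f"{prefix}{index}"
        wrapper["@graph"] = [body]
        named.append(wrapper)
    return named

docs = [{"@id": "#bart", "spouse": "#marge"}, {"@id": "#marge"}]
named_docs = name_graphs(docs)
print(json.dumps(named_docs, indent=2))
```

With this wrapping, each document would produce quads in its own named graph instead of triples in the default graph. The sketch does not handle documents that already carry a top-level @graph.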
Eg this
{"@context": {"@base": "http://example.org", "@vocab":"http://example.org/",
"spouse":{"@type":"@id"},"statedIn":{"@type":"@id"}},
"@id": "#bart", "spouse": "#marge", "statedIn": ""}
results in these triples (not quads)
<http://example.org#bart> <http://example.org/spouse> <http://example.org#marge> .
<http://example.org#bart> <http://example.org/statedIn> <http://example.org> .
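To see where those IRIs come from, here is a small sketch that mimics the two relevant expansion steps with the standard library. It uses urljoin as a stand-in for JSON-LD's IRI resolution, which is an assumption of this example and not the full expansion algorithm.

```python
from urllib.parse import urljoin

BASE = "http://example.org"    # @base from the context above
VOCAB = "http://example.org/"  # @vocab from the context above

def expand_id(value):
    # Values in @id position are relative IRI references resolved against @base.
    return urljoin(BASE, value)

def expand_term(term):
    # Properties with no explicit IRI mapping are expanded by prepending @vocab.
    return VOCAB + term

triples = [
    (expand_id("#bart"), expand_term("spouse"), expand_id("#marge")),
    (expand_id("#bart"), expand_term("statedIn"), expand_id("")),
]
for s, p, o in triples:
    print(f"<{s}> <{p}> <{o}> .")
```

The empty string in "statedIn" resolves to the base itself, which is why the second triple points at the document's base IRI.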
My two cents about eg2. This form of writing is often known as front matter, originally proposed by Jekyll. Syntax:
---
title: My Cat
tags:
- article
- pets
---
My cat is the most handsome cat in the whole world.
A few examples of software that supports YAML front matter for Markdown documents:
- MkDocs static site builder
- Typora editor
- python-frontmatter library
I am using this format to source YAML-LD from the front matter.
However, this is not valid YAML and thus I do not believe it applies to the question at hand. Does it?
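For reference, a naive sketch of how front matter can be separated from the body before any YAML parsing happens. The function split_front_matter is hypothetical and is not how Jekyll or python-frontmatter implement it.

```python
def split_front_matter(text):
    """Return (front_matter, body); front_matter is None when absent."""
    lines = text.splitlines(keepends=True)
    # Front matter must open with "---" on the very first line.
    if not lines or lines[0].rstrip() != "---":
        return None, text
    # The next "---" (or "...") line closes the metadata block.
    for index in range(1, len(lines)):
        if lines[index].rstrip() in ("---", "..."):
            return "".join(lines[1:index]), "".join(lines[index + 1:])
    return None, text  # no closing delimiter: treat everything as body

page = (
    "---\ntitle: My Cat\ntags:\n- article\n- pets\n---\n"
    "My cat is the most handsome cat in the whole world.\n"
)
front_matter, body = split_front_matter(page)
print(front_matter)
print(body)
```

Only the front_matter string would then be handed to a YAML parser, so the Markdown body never has to be valid YAML.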
JSON-LD-API has some options and descriptions for processing multiple script elements within an HTML document using extractAllScripts, that would seem relevant.
@anatoly-scherbakov
This is also used by pandoc.
I thought the second doc consists of one long string? But that would require some quoting or escaping, else colons and dashes at BOL will throw it off.
Agreed, strike eg2
@VladimirAlexiev @gkellogg this will be mainly addressed in ietf-wg-httpapi/mediatypes#55
Thanks for this issue: without this the YAML media type would have missed this piece.
@anatoly-scherbakov the document in the example is valid YAML, like @VladimirAlexiev said.
import yaml

s = """---
title: My Cat
tags:
- article
- pets
---
My cat is the most handsome cat in the whole world.
"""
# The stream parses as two documents: a mapping, then a plain string.
for d in yaml.safe_load_all(s):
    print(d)