Parses RDF from any serialization

RDF Parse

This library parses RDF streams based on content type (or file name) and outputs RDF/JS-compliant quads as a stream.

This is useful in situations where you have RDF in some serialization, and you just need the parsed triples/quads, without having to concern yourself with picking the correct parser.

The following RDF serializations are supported:

| Name | Content type | Extensions |
| --- | --- | --- |
| TriG | application/trig | .trig |
| N-Quads | application/n-quads | .nq, .nquads |
| Turtle | text/turtle | .ttl, .turtle |
| N-Triples | application/n-triples | .nt, .ntriples |
| Notation3 | text/n3 | .n3 |
| JSON-LD | application/ld+json, application/json | .json, .jsonld |
| RDF/XML | application/rdf+xml | .rdf, .rdfxml, .owl |
| RDFa and script RDF data tags (HTML/XHTML) | text/html, application/xhtml+xml | .html, .htm, .xhtml, .xht |
| Microdata | text/html, application/xhtml+xml | .html, .htm, .xhtml, .xht |
| RDFa in SVG/XML | image/svg+xml, application/xml | .xml, .svg, .svgz |

Internally, this library makes use of fully spec-compliant RDF parsers from the Comunica framework, which enable streaming processing of RDF.

Installation

$ npm install rdf-parse

or

$ yarn add rdf-parse

This package also works out-of-the-box in browsers via tools such as webpack and browserify.

Require

import rdfParser from "rdf-parse";

or

const rdfParser = require("rdf-parse").default;

Usage

Parsing by content type

The rdfParser.parse method takes a text stream containing RDF in any supported serialization and an options object, and returns an RDF/JS stream that emits RDF quads.

const textStream = require('streamify-string')(`
<http://ex.org/s> <http://ex.org/p> <http://ex.org/o1>, <http://ex.org/o2>.
`);

rdfParser.parse(textStream, { contentType: 'text/turtle', baseIRI: 'http://example.org' })
    .on('data', (quad) => console.log(quad))
    .on('error', (error) => console.error(error))
    .on('end', () => console.log('All done!'));
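
When the whole document is needed at once rather than quad by quad, the event-based stream can be wrapped in a promise. The following is a minimal sketch, not part of rdf-parse: collectQuads is a hypothetical helper that relies only on the data, error, and end events used above, and the stream it is run on here is a stand-in built with Node's Readable.from so that the snippet has no dependencies.

```javascript
const { Readable } = require('stream');

// Hypothetical helper (not part of rdf-parse): gathers everything a quad
// stream emits into an array, and rejects if the stream reports an error.
function collectQuads(quadStream) {
  return new Promise((resolve, reject) => {
    const quads = [];
    quadStream
      .on('data', (quad) => quads.push(quad))
      .on('error', reject)
      .on('end', () => resolve(quads));
  });
}

// Stand-in for the stream returned by rdfParser.parse(...), so that this
// sketch runs on its own. The objects are placeholders, not real RDF/JS quads.
const fakeQuadStream = Readable.from([
  { subject: 'http://ex.org/s', predicate: 'http://ex.org/p', object: 'http://ex.org/o1' },
  { subject: 'http://ex.org/s', predicate: 'http://ex.org/p', object: 'http://ex.org/o2' },
]);

collectQuads(fakeQuadStream)
  .then((quads) => console.log(`Collected ${quads.length} quads`));
```

With the real parser, the same helper would be called as collectQuads(rdfParser.parse(textStream, { contentType: 'text/turtle' })). Note that this buffers every quad in memory, so it trades away the streaming behavior and is best kept for small documents.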

Parsing by file name

Sometimes the content type of an RDF document is unknown. For those cases, this library allows you to provide the path or URL of the RDF document, and the serialization is then determined from its file extension.

For example, Turtle documents can be detected using the .ttl extension.

const textStream = require('streamify-string')(`
<http://ex.org/s> <http://ex.org/p> <http://ex.org/o1>, <http://ex.org/o2>.
`);

rdfParser.parse(textStream, { path: 'http://example.org/myfile.ttl', baseIRI: 'http://example.org' })
    .on('data', (quad) => console.log(quad))
    .on('error', (error) => console.error(error))
    .on('end', () => console.log('All done!'));
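
To make the idea of extension-based detection concrete, here is a rough sketch of how a path could be mapped to a content type. It is an illustration only and not the library's actual implementation: guessContentType and EXTENSION_TO_CONTENT_TYPE are hypothetical names, and where the table of supported serializations lists several content types for one row, the pairing of each extension with one of them is an assumption.

```javascript
// Extension-to-content-type pairs taken from the table of supported
// serializations. Where a row lists several content types, the pairing
// chosen here is an assumption made for illustration.
const EXTENSION_TO_CONTENT_TYPE = new Map([
  ['trig', 'application/trig'],
  ['nq', 'application/n-quads'],
  ['nquads', 'application/n-quads'],
  ['ttl', 'text/turtle'],
  ['turtle', 'text/turtle'],
  ['nt', 'application/n-triples'],
  ['ntriples', 'application/n-triples'],
  ['n3', 'text/n3'],
  ['json', 'application/json'],
  ['jsonld', 'application/ld+json'],
  ['rdf', 'application/rdf+xml'],
  ['rdfxml', 'application/rdf+xml'],
  ['owl', 'application/rdf+xml'],
  ['html', 'text/html'],
  ['htm', 'text/html'],
  ['xhtml', 'application/xhtml+xml'],
  ['xht', 'application/xhtml+xml'],
  ['xml', 'application/xml'],
  ['svg', 'image/svg+xml'],
  ['svgz', 'image/svg+xml'],
]);

// Hypothetical helper: returns the content type for a path or URL,
// or undefined when the extension is missing or unknown.
function guessContentType(path) {
  // Drop any query string or fragment, then keep the last path segment.
  const fileName = path.split(/[?#]/)[0].split('/').pop();
  const dot = fileName.lastIndexOf('.');
  if (dot === -1) return undefined;
  return EXTENSION_TO_CONTENT_TYPE.get(fileName.slice(dot + 1).toLowerCase());
}

console.log(guessContentType('http://example.org/myfile.ttl')); // text/turtle
```

A Map is used instead of a plain object so that extensions such as "constructor" cannot accidentally match inherited properties.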

Getting all known content types

With rdfParser.getContentTypes(), you can retrieve a list of all content types for which a parser is available. Note that this method returns a promise that can be await-ed.

rdfParser.getContentTypesPrioritized() returns an object instead, with content types as keys, and numerical priorities as values.

// await is only valid inside an async function (or an ES module)
(async () => {
  // An array of content types
  console.log(await rdfParser.getContentTypes());

  // An object of prioritized content types
  console.log(await rdfParser.getContentTypesPrioritized());
})();
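
One possible use of the prioritized object is building an HTTP Accept header when fetching RDF documents. The sketch below is not part of rdf-parse: toAcceptHeader is a hypothetical helper, it assumes the priorities are numbers between 0 and 1 (values outside that range are clamped), and the sample object contains made-up values rather than the library's real output.

```javascript
// Hypothetical helper (not part of rdf-parse): turns an object of
// { contentType: priority } pairs into an Accept header value,
// listing the highest priority first.
function toAcceptHeader(prioritized) {
  return Object.entries(prioritized)
    .sort(([, a], [, b]) => b - a)
    .map(([contentType, priority]) => {
      // Assumption: priorities behave like HTTP quality values in [0, 1].
      const q = Math.min(1, Math.max(0, priority));
      // q=1 is the default in HTTP, so it can be left out.
      return q === 1 ? contentType : `${contentType};q=${Number(q.toFixed(3))}`;
    })
    .join(', ');
}

// Made-up priorities for illustration; in practice the object would come
// from `await rdfParser.getContentTypesPrioritized()`.
const samplePriorities = {
  'text/turtle': 0.6,
  'application/trig': 1,
  'application/ld+json': 0.9,
};

console.log(toAcceptHeader(samplePriorities));
// application/trig, application/ld+json;q=0.9, text/turtle;q=0.6
```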

License

This software is written by Ruben Taelman.

This code is released under the MIT license.