rdfjs/types

RDF/JS and WHATWG stream interfaces alignment

matthieubosquet opened this issue · 11 comments

Are there plans to make the RDF/JS: Stream interface closer to the WHATWG Readable Stream interface?

I'd be quite keen for example to be able to use asynchronous iteration.

Hey Matthieu

[...] closer to the WHATWG Readable Stream

Not an authoritative answer, but "not likely" ™️ with the stream spec based in the node streams

I'd be quite keen for example to be able to use asynchronous iteration.

That said, async iteration is totally possible with the readable-stream already. So while you will need webpack to bundle as that will not work natively in a browser you can consume a stream in an async for loop

const fetch = require('@rdfjs/fetch')
const { quadToNTriples } = require('@rdfjs/to-ntriples')

async function main() {
  const res = await fetch('http://zazuko.github.io/tbbt-ld/data/person/sheldon-cooper.ttl')
  const stream = await res.quadStream()

  for await (const quad of stream) {
    console.log(quadToNTriples(quad))
  }
}

Not perfect but it allows you to build your APIs ignoring actual streams and only coding against AsyncIterable<Quad>

Hi Tomasz,

Thanks for your swift answer!

My problem is that I'm writing a TypeScript library and programming to the interface, therefore, the for await (const quad of stream) syntax not being reflected in the types is problematic.

If the readable quad stream is iterable, could we ammend the interface to reflect it?

types/stream.d.ts

Lines 22 to 30 in e021fe6

export interface Stream<Q extends BaseQuad = Quad> extends EventEmitter {
/**
* This method pulls a quad out of the internal buffer and returns it.
* If there is no quad available, then it will return null.
*
* @return A quad from the internal buffer, or null if none is available.
*/
read(): Q | null;
}

Maybe to something like:

export interface Stream<Q extends BaseQuad = Quad> extends EventEmitter {
    /**
     * This method pulls a quad out of the internal buffer and returns it.
     * If there is no quad available, then it will return null.
     *
     * @return A quad from the internal buffer, or null if none is available.
     */
    read(): Q | null;

    [Symbol.asyncIterator](): AsyncIterableIterator<Q>;
}

I appreciate that this should probably be reflected in the RDF/JS stream spec too, isn't it?

You do have a point I could agree with introducing such a change.

However, I think the RDF/JS spec should be by definition compatible with v2 of the stream implementation which does not support async iteration.

Until, and if, we make this change you can do type augmentation as I did in one of my projects

import type Quad from 'rdf-js'

declare module 'rdf-js' {
  interface Stream extends AsyncIterable<Quad> {}
}

Note that RDF/JS are low-level interfaces, from which you can build higher-level ones.

Back in the old days when we started RDF/JS, anything to do with promises was slower than callbacks.
So for me, the first step would be to come up with performance tests.

How is performance relevant exactly?

RDF/JS is also a spec and not an implementation, right? The v3 of streams can be consumed as async iterator. So we can reflect that in the RDF/JS.

IMO the decision would be based on whether or not we want to keep support for v2 of node stream.

How is performance relevant exactly?

The entire API design of RDF/JS is based on what interfaces can be implemented at high performance.

RDF/JS is also a spec and not an implementation, right?

Interfaces put an upper bound on achievable performance.
5 years ago, anything with promises was a magnitude slower.

I still don't see your point. Node stream already implements async iteration since v11 (and experimental in v10). All RDF/JS Stream does is constrain the chunk to be Quad instances.

A parser, being a producer of quads will use the underlying stream specification (thus already supporting Symbol.asyncIterator). The difference is only to the consumer, which can choose of the two ways for consuming.

What am I missing?

All RDF/JS Stream does is constrain the chunk to be Quad instances.

That is not correct. RDF/JS Stream is a subset of Node.js stream; the subset that was the fastest at the point of writing.

Using object streams at all is a performance hit. If you profile in node you will see just how much overhead they really cost. Better performance would be to go purely with callbacks or async iterable, but then the ability to attach multiple listeners would become user's responsibility.

Other than that I don't think there are any features provided by object streams that can't be reasonably replaced with callbacks or async iterable.

Using object streams at all is a performance hit.

Note that the minimum interface we demand is not Node.js streams (which have flow control), but rather event emitters, which are callback-driven.

We have considered it in rdfjs/stream-spec#12 That thread includes comments from nodejs core contributor and readable-stream maintainer.

Could we have BaseStream (low level) and Stream with AsyncIterator? Maybe Stream and HiStream not to break current definitions. Implementations provide Steram with AsyncIterator and it would be helpful to have common type for that.