ipfs-inactive/interface-js-ipfs-core

Proposal for Standardising API that can return buffered results vs Node.js Streams vs Pull Streams

daviddias opened this issue ยท 17 comments

tl;dr; this is a proposal to standardize the API definitions of function calls that may return buffered values, Node.js Streams or pull-streams.

I'll make this proposal and succinct as possible, with the hopes we can get around it asap.

Our current needs and requirements:

  • Users want to be able to consume data coming from calls (i.e: files.{cat, get, ls}) as a single result value. It is extremely convinient.
  • We know that buffering everything doesn't work for a big part of the cases.
  • We like to use pull-streams everywhere because it saves us from some Node.js hiccups and makes it extremely convenient between modules.
  • Users still to have access to Node.js Streams, since they are the most widely use.
  • We have a strong commitment to not breaking user-space, however, if this standardization proves to be popular and more user-friendly, I believe we can open an exception as long as we don't loose functionality (only some name changing), under the condition to have a smooth transition.

Essentially we have two options on the table

a) Use option arguments to specify what we are looking for

ipfs.files.cat(cid, {
  buffer: true         // buffer all the results and return a single value
  // PStream: true // return a Pull-Stream
  // NStream: true // return a Node.js Stream
}, callback)

b) Use different method names

ipfs.files.cat(cid, callback)
// or
ipfs.files.catPStream(cid)
// or
ipfs.files.catNStream(cid)

This option has a couple of advantages:

  • interpreter friendly - by having a method for each type, it will enable V8 to optimize it better
  • comparing to what we have today, it will enable the functions that return streams to return streams, instead of the 'callback with a stream'. Enabling things like ipfs.files.catNStream(cid).pipe(process.stdout) without going into callback. (The reason why we had to return a stream in a callback, was because of Promises, but that stays on ipfs.files.cat only.

Once this proposal goes forward we need to:

  • Update the API definitions
  • Update the tests
  • Update the implementations
  • Update the internals that exposes pull-streams with something like getStream, to be getPStream.

@ipfs/javascript-team I believe you all might want to chime on this.

๐Ÿ‘ for proposal b from my side, not sure about the names and naming is hard so go ahead with these :)

I am for option b,

what about

ipfs.files.cat(cid, callback)
// or
ipfs.files.catPullStream(cid)
// or
ipfs.files.catStream(cid)

?

@dignifiedquire @nicola thank you for the feedback!

With regards to the naming. I don't have strong feelings about 'catPStream' and 'catNStream', I kind of like @nicola suggestion. The only that feels weird to me from all the options shared was: 'catPull' without the 'Stream' keyword.

Yeah, @nicola's suggestion makes sense to me too. Divide the methods would be the most beneficial and least confusing (more arguments/options for a function, the less intuitive). Also, regarding the naming, using full names like catPullStream and catStream makes it easier to assume or remember what is returned, rather than using shortnames that can be confusing.

We need to provide this change the sooner the better. Streams are still confusion for devs.

One thing to consider is to s/catStream/catReadableStream since we will use in fact https://www.npmjs.com/package/readable-stream and not Node.js core streams to ensure compatibility and stability across Node.js versions.

We also should consider aliases

  • catReadableStream -> catRS
  • catPullStream -> catPS

The short names are reeeeaally hard to distinguish with P and R looking almost the same, so I think we should go for the long versions.

Not sure ReadableStream is good though, given that they sometimes might be duplex or other kind of streams, so could be more confusing.

Maybe we could do

// callback
files.cat(cid, callback)
// node stream
files.cat(cid)
// pull streams
files.catPull(cid)

It's easy to detect argument count and this might make it a bit less awkward to find correct names

Ok, let's ship this for js-ipfs 0.27.0. The resolution is:

Expose 3 higher level calls for streaming methods (cat used as example):

  • cat(cid, function (err, bufferedFile) {}) -> Promise - Support Callback or Promise API with promisify-es6
  • catReadableStream(cid) -> [ReadableStream](https://www.npmjs.com/package/readable-stream)
  • catPullStream(cid) -> [PullStream](http://npmjs.org/pull-stream)

This will unlock use cases with perf improvements as shown in ipfs/js-ipfs#988

Oh I missed promises in my proposal..

Pushed at the same time. Reviewing your proposal, @dignifiedquire There is a reason why it doesn't work. files.cat(cid) wouldn't be able to return a Node.js Stream because it would have to return a promise.

The other reason to call it ReadableStream and not Node.js Stream is that somehow ReadableStream and Node.js Stream have diverged (and we have caught bugs because of that, i.e isStream vs is-stream). So let's talk of ReadableStream as the Type/Class https://www.npmjs.com/package/readable-stream

I wonder if it wouldn't be better to have sth like

  • require('ipfs/promise')
  • require('ipfs/pull')

that way we could keep smaller bundles and easier names

re: #126 (comment)

Eventually, we will have ES6 Imports and I prefer waiting for that then yet having to explain to users that they have to require ipfs in different ways to get a different API.

I would suggest then going with cat, catStream and catPullStream

Work happening here #162

๐Ÿšข