Borewit/music-metadata

Migrate from `stream` to asyncIterators

ThaUnknown opened this issue · 19 comments

Feature Request
Migrate from node.js's stream library to asyncIterators. This would roughly look like this:

before

function parse (input) {
  input.on('data', handleData) // requires nodeJS-like streams
}

after

async function parse (input){
    // can be lazy functions returning streams
    if (typeof input === 'function') input = input()
    for await (const data of input){
      handleData(data)
    }
}

Why this is nice:

  • drops the node only dependencies
  • reduces package size
  • allows for input to be ANYTHING which has an async iterator [node.js streams have async iterators, but the user can also implement their own async data iterators]
  • allows for the library to be ran natively on any platform [node, deno, bun, browser] without transpiling
  • doesn't require any browser specific library

example usage of this kind of implementation in node.js:

import { parse } from 'music-metadata'
const filestream = fs.createReadStream('./song.mp3')

const result = await parse(filestream)

example usage in browser:

import { parse } from 'music-metadata'
import 'fast-readable-async-iterator'

const filestream = blob.stream()

const result = await parse(filestream)

Revolution, I like that!

drops the node only dependencies

Great if that would be possible, switching to web streams is what proposed sindresorhus.

allows for input to be ANYTHING which has an async iterator [node.js streams have async iterators, but the user can also implement their own async data iterators]

I am a bit concerned here. As parsing audio file essentially requires to parse through a stream of bytes, not just anything. I actually do not use necessary a Node.js stream, but I invented my own kind of stream, the tokenizer. Inventing your own things is almost an anti-pattern isn't it? It is usually means that you just was not aware of better solution already available. Anyway, I have done that. The big difference is that the tokenizer](https://github.com/Borewit/strtok3/#tokenizer) is able to ignore portions. In a conventional streams you have to read everything.

Back to the bytes, the reason I want something like bytes, is that the chunks (a chunk is logical block in an audio stream, also called frame or cell). I can imagine a async iterator would iterate through the chunks, yet that would give chick and egg problem, as it requires knowledge of the audio stream to determine the chunks.

allows for the library to be ran natively on any platform [node, deno, bun, browser] without transpiling
doesn't require any browser specific library

Fantastic if that would be possible.

Great if that would be possible, Borewit/strtok3#542 is what proposed sindresorhus.

web streams are a no-go in my opinion, async iterators have better compatibility and performance, web streams are poorly implemented on node and are very slow

I am a bit concerned here.

async iterators would be no different from streams, they would yield the chunks in the same way, just with different call methods. with async iterators you don't need to read a full stream, but you do need to do it in sequence. I looked at your tokenizer thing and it seemed a bit overkill, maybe this will help you simplify it, but I didn't have time to dive deep into the code as it seemed a bit all over the place

some examples of me using async iterators instead of streams:
https://github.com/ThaUnknown/join-async-iterator
joins multiple streams into one

https://github.com/ThaUnknown/fast-blob-stream
converts blobs and web streams into node streams

https://github.com/ThaUnknown/fast-readable-async-iterator
creates async iterators from web streams

I had a look to the examples and code, and I honestly fail to see the added value.

I had a look to the examples and code, and I honestly fail to see the added value.

the benefit is compatibility with node and browser in one repo, without polyfills, smaller bundle size and higher speeds

Wouldn't a simpler way be to just migrate from Buffer to UInt8Array? The Buffer class is already an extension of UInt8Array, so any function that takes a UInt8Array as input can also take a Buffer. It just requires changing from buffer.readInt16LE(offset) into new DataView(data.buffer).getInt16(offset, true).

yes and in order to deprecate buffer u need to deprecate stream, as it forces buffer as a dependency for output values

I already abstracted Buffer to UInt8Array where I could / where was reasonable.

I would love not to rely on Node specific API's, yet there are functions offered by Buffer which can not be found in DataView or UInt8Array like character encoding / decoding.

I already abstracted Buffer to UInt8Array where I could / where was reasonable.

I would love not to rely on Node specific API's, yet there are functions offered by Buffer which can not be found in DataView or UInt8Array like character encoding / decoding.

I've created a package for this: uint8-util which allows uint8 do to everything a buffer can, often times a lot faster, it doesn't have type definitions so to use it you'd need to look at the exports of util.js and node/browser.js

everything a buffer can

Interesting... everything? Like Buffer.from() with support for ascii, latin1, utf8 and utf16le?

yes, UTF-8 only but i easily could add support for latin ASCII etc, also for hex base64 text and binary to and from, tho it doesn't have a Buffer.from function because i believed that it's an anti pattern as it doesn't offer any information about the input data instead it has function like text2arr arr2hex hex2bin etc

again all this is useless as long as this library uses streams, that's why I suggested async iterators, they are faster, smaller, cross platform and dependency free

I'm not so sure. I've started working on a port here: https://github.com/Symbitic/music-metadata

There's still 47 failing tests, but 345 pass. I didn't have to change anything related to streams (yet).

@Borewit encoding/decoding is taken care of by TextEncoder/TextDecoder. That's what Deno uses for its (partial) compatibility layer. See the port where I added some helper functions.

@Symbitic i'd really use uint8-util for this, its way faster than whatever you wrote it hurts me looking at this, where you reconstruct a encoder/decoder each function call, this is atrociously slow

uint8-util is also tried and tested on webtorrent which @Borewit should be somewhat familiar with

I fixed that to only allocate one of each type of decoder.

I fixed that to only allocate one of each type of decoder.

do... you actually want to do this? do you actually want me to point out every single performance issue you've created one by one till you fix them? lol

uint8-util has several issues of its own. Besides not being written in TypeScript, it isn't portable. It uses a different file for browsers and Node. That can cause problems for bundlers and Deno.

If you want to point out any suggestions or work on your own port, you're more than welcome to do either, but I think you're coming close to making this personal.

not being written in typescript is a pure upside tbh, its also fully portable, the exports cause no issues for bundlers unless you decide to use esbuild, it's also quite literally written with cross-env compatibility in mind, which i care deeply about as you might have noticed by the fact that the author of the post, asking this very library for cross env compat is me....

the conditional export for node, is ONLY resolved by node, and default and import is used by everything else, it exists only to utilize node's native code implementations for extra speed on node, as some node native functions are up to 20% faster than ones I've written myself

uint8util runs fine in chrome, deno, bun, node, chromeapp and some smart tv's so i also don't know where you got that information from

I'm not making it personal, I'm making it performance-first which is what the other half of the original post is about, speed

@Borewit as per request I've added decoding types like latin etc and type definitions to uint8-util