mscdex/busboy

Changes in v1.0.0

mscdex opened this issue · 11 comments

Breaking and Potentially Breaking Changes

  • Minimum supported node version is now v10.16.0
  • No more constructor
  • Use 'close' event instead of 'finish' event when you need to execute
    something whether the busboy stream encountered an error or not
  • Some 'field' and 'file' event handler parameters have changed
    • Truncated flags, encoding, and mime type information have been consolidated
      into a single object passed to the event handlers
  • Some error messages have changed
  • Switched text decoding from using an older (and slower) reference TextDecoder
    implementation to using encodings/charsets supported internally by node
    (via either core encodings or built-in ICU). If you need to support less
    common encodings/charsets, then make sure you are using a build of node that
    contains the full ICU database for maximum compatibility.
  • (multipart/form-data) Stricter header parser
  • (multipart/form-data) Smaller max allowable (total) header size (per part) to align with
    node's http parser (80KB -> 16KB)

Misc. Changes

  • Faster non-file fields handling due to switching method of handling
    transcoding of strings
  • Implementation simplified overall due to advancements made in node streams
    since the original implementation
  • (multipart/form-data) Faster parsing (partly from streamsearch improvements)
  • (multipart/form-data) No longer uses dicer, uses streamsearch directly

Benchmarks

Setup

Node version used: v16.13.1

Package versions used:

  • busboy - v0.3.1
    • Already takes into account streamsearch v1.x perf improvements
      via dicer
  • busboy - v1.0.0
  • formidable - v2.0.1
    • formidable has a streaming urlencoded parser, but it is not used
      by formidable currently when parsing urlencoded forms, however I have
      included benchmarks for this parser in the urlencoded benchmarks for
      completeness
  • multiparty - v4.2.2

Note: In these benchmarks, formidable technically has a bit of an edge
compared to the other modules since the code to benchmark it is using the
individual parsers (multipart and urlencoded) directly, skipping a lot of extra
code that would ordinarily execute while parsing a request. This was done this
way as it was the easiest way to benchmark formidable without saving files to
disk, which would otherwise put formidable at a disadvantage because the other
modules are not saving files to disk.

Note 2: bench-urlencoded-fields-100pairs-small.js takes into account extra
non-parsing-related logic (e.g. creating new instances and other setup code) so
it's not strictly a measure of parsing performance. The reason for this is that
the benchmark parses the same request multiple times in an async loop, since the
modules parsing an urlencoded request is very quick and may not give V8 time to
optimize functions, etc. and the modules' parser instances generally cannot be
reused.

Results

  • bench-multipart-fields-100mb-small.js

    Package Average time (ms) Average max RSS (MB)
    busboy (pre-1.0) 15077 338
    busboy 426 145
    formidable 3600 143
    multiparty 1450 144
  • bench-multipart-fields-100mb-big.js

    Package Average time (ms) Average max RSS (MB)
    busboy (pre-1.0) 1160 270
    busboy 398 186
    formidable 3640 187
    multiparty 1510 186
  • bench-multipart-files-100mb-small.js

    Package Average time (ms) Average max RSS (MB)
    busboy (pre-1.0) 433 149
    busboy 377 145
    formidable 3700 143
    multiparty 1400 145
  • bench-multipart-files-100mb-big.js

    Package Average time (ms) Average max RSS (MB)
    busboy (pre-1.0) 451 186
    busboy 398 186
    formidable 3800 187
    multiparty 1500 186
  • bench-urlencoded-fields-100pairs-small.js

    Package Average time (ms) Average max RSS (MB)
    busboy (pre-1.0) 2700 99
    busboy 227 41
    formidable 230 41
    formidable (streaming) 260 42
  • bench-urlencoded-fields-900pairs-small-alt.js

    Package Average time (ms) Average max RSS (MB)
    busboy (pre-1.0) 31.0 45
    busboy 5.7 34
    formidable 6.2 34
    formidable (streaming) 6.7 34

node-fetch@3.1.0 just recently implemented a multipart/form-data decoder of an older fork of formidable that was initially built for github/fetch (a fetch polyfill for the browser) that don't use any NodeJS core stuff like Buffer, EventEmitter or Streams... Here is the relevant PR

I'm interested to see how it stands up against busboy 1.0 in terms of speed. And if it would be worth implementing busboy into node-fetch.

I'm also interested how this stands up against using URLSearchParams to do the hole parsing instead of using any streamable method.

node-fetch@3.1.0 just recently implemented a multipart/form-data decoder

I'm not sure I understand why an http client is implementing form decoding.

Anyway, if the multipart/form-data parser was based on one from an older fork of formidable, then I would expect performance to be about the same as formidable since I don't think the underlying algorithms of their multipart parser implementation have changed much.

I'm also interested how this stands up against using URLSearchParams to do the hole parsing instead of using any streamable method.

That's what formidable uses in their non-streaming parser, which is currently used when parsing urlencoded forms in formidable. Their streaming parser is a more recent addition and is not currently used at all. You can see the results for both in the tables above.

I'm not sure I understand why an http client is implementing form decoding.

response.formData() is part of the fetch spec (think it was mainly built for service worker to intercept requests to modify payloads before it's sent of to the server)

Another thing to keep in mind is that busboy has always supported non-UTF8 encodings, something the other modules do not seem to support (e.g. when parsing urlencoded forms with URLSearchParams, you are limited in supported encodings/charsets).

Another distinction is that busboy supports extended header value parameters, which is important when parsing things like filenames in different languages. Not all of the form parsing modules support that. busboy has supported this for the longest time, but v1.0.0 gives an extended filename parameter priority over the non-extended version.

response.formData() is part of the fetch spec (think it was mainly built for service worker to intercept requests to modify payloads before it's sent of to the server)

Ah ok, I was unaware that was a thing. That's pretty strange.

fyi, if you are interested in generating 100% spec compliant multipart/form-data payloads (with regards to how filename is suppose to be generated) for testing and what not, then i have built formdata-polyfill
it pass all wpt tests...

const { FormData, formDataToBlob, File } = await import('formdata-polyfill/esm.min.js')

const fd = new FormData()
fd.append('name', new File([data], 'filename.txt', {type: 'text/plain'}))

const body = await formDataToBlob(fd).arrayBuffer()

all doe it requires node v12.20 to load it with cjs...

Hey @mscdex on what version and how you run these benchmarks? Seems like when run busboy ones, the close isn't fire so getting no output.

@tunnckoCore Version of what? The package versions being benchmarked are already listed.

Also I'm not sure what you're referring to with the 'close' event not firing. Are you asking specifically about one of the benchmarks or are you lodging an issue you're having with using busboy ?

Version of what?

meant to say Nodejs version.

Are you asking specifically about one of the benchmarks

Yep, node bench-multipart-files-100mb-big.js busboy does not output anything, while the others are working. Not yet sure for the other benchmarks, just playing with that one now.

@tunnckoCore Node version is listed before the package versions.

Yep, node bench-multipart-files-100mb-big.js busboy does not output anything, while the others are working. Not yet sure for the other benchmarks, just playing with that one now.

Probably missing a parser.end().

Probably missing a parser.end()

Ah, yea. That was it.