rustwasm/wasm-bindgen

JsValue serde support shouldn't go through serde_json

fitzgen opened this issue ยท 16 comments

Going through serde_json ends up including a bunch of f64 formatting code, which is quite code size heavy.

I'm not super familiar with implementing serde serializers and deserializers, but if we can express "we only support things that can AsRef<JsValue>" as a trait bound for serialization, then I think we can go directly into and from JsValue without stringifying. This could be a perf win as well.

I'm currently working on a generic solution for this to improve performance of one of my projects. Is it possible to assign this issue to me for tracking?

@RReverser go for it! :)

So far it looks like more often than not perf is actually getting worse due to overhead of frequent calls between JS and WASM (at least on Node.js / V8).

But I just finished the prototype, and only starting to actually profile and see what could be improved, so still hoping for something obvious.

P.S. This is where I found the try_iter performance issue btw: #1386

Okay, after a bunch of custom improvements and, finally, #1391, I've finally managed to surpass serde_json in terms of benchmark performance (this turned out way harder than it seemed at first).

Now time to focus on cleaning up and trying to reduce the size :)

Hey @RReverser is there somewhere I can see your work on writing the serde (de)serializers? They sound awesome!

It's effectively ready (feature-complete), but I'm trying to use it in our real internal app (Cloudflare Worker) before open-sourcing. Should be out soon!

Okay, file-size savings of WASM+Brotli look promising now too (used against benchmarks which parse/serialize a bunch of structures with either one or the other library):

  json wasm+br wasm-bindgen wasm+br %
Os 118,891 92,339 -22%
Os + strip 100,886 74,141 -27%
O3 129,316 92,541 -28%
O3 + strip 120,387 83,615 -31%

Getting there!

Okay, I've finally published a wasm-bindgen + Serde integration as a separate library - https://github.com/cloudflare/serde-wasm-bindgen.

There are few reasons for that at the moment. First, it makes it's a bit easier to maintain that way as it has own tests, benchmarks and stability guarantees which we might want to play with over time. Another, more important, issue is that, in order to support more JavaScript types and idiomatic conversions, it's not strictly compatible with current serde-json-based APIs (JsValue::from_serde, JsValue::into_serde) and might convert values differently.

If that's okay, I'd love to eventually replace Serde support in wasm-bindgen to use serde-wasm-bindgen instead of serde-json under the hood, but I don't know what stability guarantees wasm-bindgen currently provides for Serde integration.

I suppose this will require waiting for a breaking change release? If so, it might be worth just pointing users to the external crate in docs for now, and eventually either switching or removing current wasm-bindgen from_serde/into_serde APIs.

Either way, I hope that's okay and solves the issue, and let me know what you think about further integration!

Okay, I've finally published a wasm-bindgen + Serde integration as a separate library - https://github.com/cloudflare/serde-wasm-bindgen.

Nice!!

Another, more important, issue is that, in order to support more JavaScript types and idiomatic conversions, it's not strictly compatible with current serde-json-based APIs (JsValue::from_serde, JsValue::into_serde) and might convert values differently.

If that's okay, I'd love to eventually replace Serde support in wasm-bindgen to use serde-wasm-bindgen instead of serde-json under the hood, but I don't know what stability guarantees wasm-bindgen currently provides for Serde integration.

I suppose this will require waiting for a breaking change release?

It isn't clear to me if this is a theoretical incompatibility of encoding into JS values potentially in the future or if it is already incompatible in practice right now. If the latter, do you have examples of Rust things that serialize into different JS values right now?

I agree that incompatibly changing the JS value encoding would require a breaking bump, since people could have JS code they interact with that very reasonably depends on the shape of the JS values returned from something that uses the current serde_json-based code.

If so, it might be worth just pointing users to the external crate in docs for now, and eventually either switching or removing current wasm-bindgen from_serde/into_serde APIs.

Agreed. We can also put a notice on the current methods and docs linking to this new external crate right now.

I think this kind of functionality is fundamental enough that we want to include it "built-in" to wasm-bindgen, so I would prefer to swap out our current serde_json-based code with this when we do the next breaking bump. Will label this issue appropriately.

On compatibility matters: I have in mind situations where I want to share code between server and client, e.g. for an isomorphic web app, or for an API library that could be running in a straight Rust or a JS-interop environment. The same code might therefore expect to be emitting JSON on the server, or interacting with JavaScript values on the client; incompatibility between the two would therefore be very undesirable. The current use of serde_json has the incidental benefit of guaranteeing compatibility.

I think this kind of functionality is fundamental enough that we want to include it "built-in" to wasm-bindgen

I agree. As said above, though, my concern is coupling between versions - we might want to experiment with supported types in serde-wasm-bindgen in the future. When these are separate independent crates, it's pretty easy for users to bump one without the other, but when it's built-in, now we need to be much more careful about breaking changes since bumping wasm-bindgen to major upgrades seems more rare... What do you think?

If the latter, do you have examples of Rust things that serialize into different JS values right now?

Few things off the top of my mind:

  • Custom toJSON on 3rd-party objects won't be taken into the account anymore.
  • Only safe integers as described in README are convertible to/from u64 and above - previously with JSON these would silently convert with a precision loss.
  • Byte buffers being supported only from real byte blobs (ArrayBuffer and Uint8Array) where previously it would instead accept raw strings and arrays due to JSON limitations.

Agreed. We can also put a notice on the current methods and docs linking to this new external crate right now.

@fitzgen Would you like to make a PR for this? If not, I might, but you might be better with wording :)

I agree. As said above, though, my concern is coupling between versions - we might want to experiment with supported types in serde-wasm-bindgen in the future. When these are separate independent crates, it's pretty easy for users to bump one without the other, but when it's built-in, now we need to be much more careful about breaking changes since bumping wasm-bindgen to major upgrades seems more rare... What do you think?

I suspect that by the time we are ready to do a breaking change in wasm-bindgen itself โ€” which is pretty far out there โ€” serde-wasm-bindgen will probably have settled down and this won't be an issue anymore.

@fitzgen Would you like to make a PR for this? If not, I might, but you might be better with wording :)

Sure, I can whip a draft up.

which is pretty far out there โ€” serde-wasm-bindgen will probably have settled down and this won't be an issue anymore

Sounds reasonable to me. Just been thinking about an overall design and coupling, but if it's okay with you, fine by me too :)

Sure, I can whip a draft up.

Thanks!

PR to add a note to the guide: #1553

This serde-serialize feature is causing circular dependency issues, see tkaitchuck/aHash#95 (comment). Would it make sense to deprecate this feature and simply tell users to use the serde-wasm-bindgen library? Given that wasm-bindgen is essentially mandatory on wasm32-unknown-unknown, it depending on serde (even optionally) is likely to keep causing issues.