swiftwasm/JavaScriptKit

Avoid copying and re-encoding for String every time

kateinoigakukun opened this issue · 6 comments

enum JSValue {
-    case string(String)
+    case string(JSString)
}

+ class JSString: JSBridgeClass, StringProtocol {
+    internal let id: JavaScriptObjectRef
+    ...
+ }
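To illustrate the kind of saving the proposal is after, here is a minimal sketch of a wrapper that decodes a JavaScript string at most once and caches the result. It is not JavaScriptKit's implementation: `CachedJSString`, the `UInt32` stand-in for `JavaScriptObjectRef`, and the injected `decode` closure are all hypothetical and exist only to make the example self-contained.

```swift
// Hypothetical stand-in for a handle to a JavaScript-side object.
typealias JavaScriptObjectRef = UInt32

// Hypothetical wrapper: holds only the reference until the Swift value is needed.
final class CachedJSString {
    let id: JavaScriptObjectRef
    // Assumed bridge call that copies and re-encodes the JS string into Swift.
    private let decode: (JavaScriptObjectRef) -> String
    private var cached: String?
    private(set) var decodeCount = 0

    init(id: JavaScriptObjectRef, decode: @escaping (JavaScriptObjectRef) -> String) {
        self.id = id
        self.decode = decode
    }

    // Decodes on first access only; later accesses reuse the cached String.
    var value: String {
        if let cached = cached { return cached }
        let decoded = decode(id)
        decodeCount += 1
        cached = decoded
        return decoded
    }
}

// Usage: the closure stands in for the real bridge and is called once.
let key = CachedJSString(id: 1) { _ in "innerHTML" }
_ = key.value
_ = key.value
```

A value that is only passed back to JavaScript would never need to be decoded at all under this scheme, since the reference alone is enough.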
j-f1 commented

Have you tried using something like String(utf16CodeUnits: UnsafePointer<unichar>, count: Int) to create a string from binary data (and string.utf16/string.withCString to send it the other way) instead? Or does that still do the extra work of re-encoding?
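For reference, a minimal sketch of that round trip using only the standard library (`String(decoding:as:)` and the `utf16` view) rather than the Foundation initializer mentioned above. The `incoming` buffer is a made-up example of what the JavaScript side might hand over.

```swift
// Hypothetical UTF-16 code units as they might arrive from JavaScript: "Hi 😀".
let incoming: [UInt16] = [0x0048, 0x0069, 0x0020, 0xD83D, 0xDE00]

// Decoding validates the code units and produces a native Swift String.
let decoded = String(decoding: incoming, as: UTF16.self)

// The other direction: `utf16` is a view over the string, and materializing
// it as an array copies the code units out.
let outgoing = Array(decoded.utf16)

// The same text occupies 7 bytes as UTF-8 and 5 code units as UTF-16.
let utf8Length = decoded.utf8.count
let utf16Length = outgoing.count
```

Because native Swift strings are stored as UTF-8 from Swift 5 onward, both directions in this sketch still involve transcoding; the API avoids going through JavaScript's decoder but not the re-encoding itself.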

Another idea: since many strings (especially object keys) are ASCII, would it be possible to have a second string type that only supports ASCII and is faster to decode?
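A rough sketch of what such an ASCII fast path could look like on the Swift side. The function names `asciiBytes(of:)` and `utf16Units(fromASCII:)` are hypothetical; the only facts relied on are that ASCII bytes are already valid UTF-8 and that each one widens to exactly one UTF-16 code unit.

```swift
// Returns the bytes of `s` if every UTF-8 code unit is ASCII, otherwise nil.
func asciiBytes(of s: String) -> [UInt8]? {
    var bytes: [UInt8] = []
    bytes.reserveCapacity(s.utf8.count)
    for unit in s.utf8 {
        // Any byte >= 0x80 belongs to a multi-byte UTF-8 sequence.
        guard unit < 0x80 else { return nil }
        bytes.append(unit)
    }
    return bytes
}

// Widens ASCII bytes to UTF-16 code units; each byte maps to one unit.
func utf16Units(fromASCII bytes: [UInt8]) -> [UInt16] {
    return bytes.map { UInt16($0) }
}

// Usage: object keys such as "name" take the fast path, "café" does not.
let fast = asciiBytes(of: "name")
let slow = asciiBytes(of: "café")
```

Whether this is actually faster than a general decoder would need measuring; the sketch only shows that the check and the conversion are both a single pass with no table lookups.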

I'm not sure how to get a UTF-16 byte sequence from a JavaScript String, or how to create a JavaScript String from a UTF-16 byte sequence, without re-encoding. 🤷

Starting with Swift 5 it's UTF-8 under the hood anyway, if I understand correctly. I think we'd need to patch stdlib to either allow both encodings, or to entirely force it to use UTF-16 when targeting WebAssembly.

IMO, we shouldn't change the default encoding. I think it's too big a change, and it would improve performance only when running in a JavaScript environment.

My idea is to keep the Swift-side encoding as it is and reduce the number of times strings get re-encoded.

My reasoning is that I don't see any other way to get rid of the ICU dependency, is it what's actually being used for re-encoding? I'd be surprised if it can become smaller than 100 KB even after optimizations. Maybe we could add a compiler flag or something similar that sets the default encoding on a per-build basis rather than for the whole Wasm/WASI platform? Otherwise, how can we ever become competitive with AssemblyScript, which has a mere 2 KB of overhead in its full runtime? I know that AssemblyScript is very minimalistic. I only wish one could strip the Swift runtime down in a similar way, not by default, but as an option for those who want the same minimalism in their SwiftWasm apps.

Another idea is to keep String UTF-8, but allow StaticString to use UTF-16, or maybe introduce some other way to specify a UTF-16 literal? The reasoning is that Text and other types that rely on strings in Tokamak could avoid using UTF-8 String altogether, take that UTF-16 literal and pass it directly to JSString.

@MaxDesiatov

My reasoning is that I don't see any other way to get rid of the ICU dependency, is it what's actually being used for re-encoding?

Re-encoding doesn't require ICU because it can be done with TextEncoder and TextDecoder in JavaScript.
And as far as I know, ICU is only used to get extra character information (e.g. isEmoji, or equality checking with normalization).

I think keeping UTF-8 as the default encoding is compatible with a "no-ICU mode".