Avoid copying and re-encoding for String every time #54

Open
kateinoigakukun opened this issue Sep 16, 2020 · 6 comments

Comments

@kateinoigakukun
Member

enum JSValue {
-    case string(String)
+    case string(JSString)
}

+ class JSString: JSBridgeClass, StringProtocol {
+    internal let id: JavaScriptObjectRef
+    ...
+ }
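
To make the intent of the proposal concrete, here is a minimal sketch of a string wrapper that keeps only a reference to the JavaScript value and decodes it into a Swift `String` at most once. Everything beyond the `id` property is an assumption for illustration: the `decode` closure stands in for whatever bridge call copies and decodes the JavaScript string, the `JavaScriptObjectRef` alias is a placeholder, and the `JSBridgeClass` and `StringProtocol` conformances from the diff are left out.

```swift
// Hypothetical sketch, not JavaScriptKit's actual implementation.
typealias JavaScriptObjectRef = UInt32  // placeholder for the real reference type

final class JSString {
    let id: JavaScriptObjectRef
    // Stand-in for the bridge call that copies and decodes the JS string.
    private let decode: (JavaScriptObjectRef) -> String
    private var cached: String?

    init(id: JavaScriptObjectRef, decode: @escaping (JavaScriptObjectRef) -> String) {
        self.id = id
        self.decode = decode
    }

    /// Decodes on first access only; later reads reuse the cached value.
    var value: String {
        if let cached = cached { return cached }
        let decoded = decode(id)
        cached = decoded
        return decoded
    }
}

/// Reads the same JSString twice and reports how often the bridge decoded it.
func decodeCountAfterTwoReads() -> Int {
    var decodeCount = 0
    let string = JSString(id: 1) { _ in
        decodeCount += 1
        return "hello"
    }
    _ = string.value
    _ = string.value
    return decodeCount
}
```

Passing such a value back to JavaScript would only need the `id`, so a string that merely travels through Swift would never be copied or re-encoded at all.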
@j-f1
Member

j-f1 commented Sep 16, 2020

Have you tried using something like String(utf16CodeUnits: UnsafePointer<unichar>, count: Int) to create a string from binary data (and string.utf16/string.withCString to send it the other way) instead? Or does that still do the extra work of re-encoding?

Another idea: since many strings (especially object keys) are ASCII, would it be possible to have a second string type that only supports ASCII and is faster to decode?
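
A rough sketch of both ideas on the Swift side, using only standard library calls. The `isASCII` helper is a hypothetical name introduced here, not an existing API. As far as I understand, native Swift 5 strings are stored as UTF-8, so the UTF-16 conversions below still transcode; they only show the shape of the API, not a way around the cost.

```swift
let text = "héllo 🌍"

// Swift -> UTF-16 code units (the unit JavaScript strings are indexed by).
let units: [UInt16] = Array(text.utf16)

// UTF-16 code units -> Swift String. Invalid sequences are repaired
// with U+FFFD rather than causing a failure.
let roundTripped = String(decoding: units, as: UTF16.self)

// Hypothetical helper: true when every UTF-8 code unit is below 0x80.
// For such strings each byte is also the matching UTF-16 code unit,
// which is what would make an ASCII-only fast path cheap.
func isASCII(_ s: String) -> Bool {
    s.utf8.allSatisfy { $0 < 0x80 }
}
```

Object keys such as `"length"` would take the ASCII path, while `text` above would not.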

@kateinoigakukun
Member Author

I'm not sure how to get the UTF-16 code unit sequence out of a JavaScript String, or how to create a JavaScript String from a UTF-16 sequence, without re-encoding. 🤷

@MaxDesiatov
Contributor

MaxDesiatov commented Sep 16, 2020

Starting with Swift 5 it's UTF-8 under the hood anyway, if I understand correctly. I think we'd need to patch stdlib to either allow both encodings, or to entirely force it to use UTF-16 when targeting WebAssembly.

@kateinoigakukun
Member Author

IMO, we shouldn't change the default encoding. It's too big a change, and it would improve performance only when running in a JavaScript environment.

My idea is to keep the Swift-side encoding as it is and reduce the number of places where re-encoding happens.

@MaxDesiatov
Contributor

My reasoning is that I don't see any other way to get rid of the ICU dependency, is it what's actually being used for re-encoding? I'd be surprised if it can become smaller than 100 KB even after optimizations. Maybe we could add a compiler flag or something similar that sets the default encoding on a per-build basis rather than for the whole Wasm/WASI platform? Otherwise, how can we ever become competitive with AssemblyScript, which has a mere 2 KB overhead in its full runtime? I know that AssemblyScript is very minimalistic. I only wish the Swift runtime could be stripped down just as far, not by default, but for those who want the same minimalism in their SwiftWasm apps.

Another idea is to keep String UTF-8, but allow StaticString to use UTF-16, or maybe introduce some other way to specify a UTF-16 literal? The reasoning is that Text and other types that rely on strings in Tokamak could avoid using UTF-8 String altogether, take that UTF-16 literal and pass it directly to JSString.

@kateinoigakukun
Member Author

@MaxDesiatov

My reasoning is that I don't see any other way to get rid of the ICU dependency, is it what's actually being used for re-encoding?

Re-encoding doesn't require ICU because it can be done with TextEncoder and TextDecoder in JavaScript.
As far as I know, ICU is only used to get extra character information (e.g. isEmoji, or equality checking with normalization).

I think keeping UTF-8 as the default encoding is compatible with a "no-ICU mode".
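
For readers unfamiliar with the two ICU-backed features named above, here is a small illustration using standard library calls only. It shows what the features do, not how they are implemented; whether a given toolchain routes them through ICU is an assumption taken from this discussion.

```swift
let precomposed = "\u{E9}"   // "é" as a single scalar
let decomposed = "e\u{301}"  // "e" followed by a combining acute accent

// The encoded bytes differ, so a plain byte comparison says "not equal".
let sameBytes = Array(precomposed.utf8) == Array(decomposed.utf8)

// String equality uses canonical equivalence, which needs normalization data.
let sameString = precomposed == decomposed

// Unicode property lookup, the other kind of "extra character info".
let globe: Unicode.Scalar = "🌍"
let globeIsEmoji = globe.properties.isEmoji
```

Copying or transcoding the bytes of either string needs none of this data, which is why re-encoding and the ICU dependency can be treated as separate problems.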
