LibWeb: Implement and use "isomorphic decoding" #1893
Marked as a draft while I figure out what's going on with https://wpt.live/mimesniff/mime-types/parsing.any.html
Testing for each change:
Un-drafting this because I'm happy with my testing.
Updates:
    StringBuilder builder(input.size());
    for (u8 code_point : input) {
        builder.append_code_point(code_point);
    }
    return builder.to_string_without_validation();
I don't think this is the right thing to do - `String` is by definition UTF-8, but here we can be throwing any encoding into the string. The `to_string_without_validation` isn't a path to avoid that invariant, rather it's a performance optimization for when we already know the data going into `String` is UTF-8 (as validation isn't free).
We use `ByteString` to represent strings of arbitrary encodings.
Unless I'm misunderstanding what `append_code_point` does, that's exactly what I'm doing. This function takes ISO-8859-1 encoded bytes and converts them to the corresponding UTF-8 encoding.
See also: `Latin1Decoder::process` and `Decoder::to_utf8`
Ah I see, I think I was misunderstanding what isomorphic decoding means. But yes, this looks right. Thanks :)
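To make the point of the thread above concrete, here is a minimal sketch in standard C++ rather than Ladybird's AK types. The free function `isomorphic_decode` below is a hypothetical stand-in, not the code from this PR; it assumes that appending a code point to a UTF-8 builder emits that code point's UTF-8 encoding, which is what the discussion concludes. Each input byte is treated as the code point of the same value (U+0000 to U+00FF), so the result is always valid UTF-8.

```cpp
#include <string>
#include <string_view>

// Hypothetical stand-in for the helper under review (not Ladybird's API):
// maps every byte to the code point with the same value and emits UTF-8.
std::string isomorphic_decode(std::string_view bytes)
{
    std::string utf8;
    utf8.reserve(bytes.size());
    for (unsigned char byte : bytes) {
        if (byte < 0x80) {
            // ASCII: one byte in both ISO-8859-1 and UTF-8.
            utf8.push_back(static_cast<char>(byte));
        } else {
            // U+0080..U+00FF: always a two-byte UTF-8 sequence.
            utf8.push_back(static_cast<char>(0xC0 | (byte >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (byte & 0x3F)));
        }
    }
    return utf8;
}
```

For example, the single ISO-8859-1 byte 0xE9 ("é") becomes the two UTF-8 bytes 0xC3 0xA9, while ASCII input passes through unchanged.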
This couldn't be added in LadybirdBrowser#1893
@Gingeh There are still several places creating Also, when using
The safest option would be to disallow aggregate initialization and exclusively use Maybe the aggregate initialization could be forbidden in favour of a
I'm not surprised, a lot of code still assumes headers are ASCII. These cases will need to be found and fixed over time. EDIT:
I agree that we should disallow aggregate initialization in favor of
Yes! It is causing https://wpt.live/fetch/api/basic/request-headers-nonascii.any.html to fail.
This patch ensures that a Headers object's associated header list is ISO-8859-1 encoded when set using `Infra::isomorphic_encode`, and correctly decoded using `Infra::isomorphic_decode`. Follow-up of #1893
Isomorphic encoding a value that has already been encoded will result in garbage data. `response_headers` is already encoded in ISO-8859-1/latin1, so we cannot use `from_string_pair`, as it triggers ISO-8859-1/latin1 encoding. Follow-up of #1893
This patch ensures that a Headers object's associated header list is ISO-8859-1 encoded when set using `Infra::isomorphic_encode`, and correctly decoded using `Infra::isomorphic_decode`. Follow-up of LadybirdBrowser/ladybird#1893 (cherry picked from commit 824e91ffdb568e28ec1640d28cf4f45d2fe4a0ae)
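The follow-up commits above rely on the conversion being applied exactly once in each direction. A rough sketch of why, in standard C++: the functions `isomorphic_decode` and `isomorphic_encode` here are hypothetical stand-ins for the `Infra` helpers, not their actual implementations. The sketch shows that one decode followed by one encode round-trips, and that applying the same conversion a second time changes the data. The double-application case is illustrated in the decode direction because it is well-defined for any byte input; it is assumed, not confirmed by the source, that the double-encode problem described in the commits is of the same kind.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in: ISO-8859-1 bytes -> UTF-8 text.
std::string isomorphic_decode(std::string_view bytes)
{
    std::string utf8;
    for (unsigned char byte : bytes) {
        if (byte < 0x80) {
            utf8.push_back(static_cast<char>(byte));
        } else {
            utf8.push_back(static_cast<char>(0xC0 | (byte >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (byte & 0x3F)));
        }
    }
    return utf8;
}

// Hypothetical stand-in: UTF-8 text with code points <= U+00FF -> bytes.
// Anything else violates the precondition and trips an assertion.
std::string isomorphic_encode(std::string_view utf8)
{
    std::string bytes;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        if (lead < 0x80) {
            bytes.push_back(static_cast<char>(lead));
            continue;
        }
        // Only the two-byte leads 0xC2 and 0xC3 encode U+0080..U+00FF.
        assert((lead == 0xC2 || lead == 0xC3) && i + 1 < utf8.size());
        unsigned char continuation = static_cast<unsigned char>(utf8[++i]);
        assert((continuation & 0xC0) == 0x80);
        bytes.push_back(static_cast<char>(((lead & 0x03) << 6) | (continuation & 0x3F)));
    }
    return bytes;
}
```

With the header bytes `"caf\xE9"`, one decode yields the UTF-8 text `"caf\xC3\xA9"` and encoding that text returns the original bytes. Decoding the already-decoded text again yields `"caf\xC3\x83\xC2\xA9"`, which renders as "cafÃ©".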
This is essentially spec-speak for "Decode as ISO-8859-1 / Latin-1", but existing code interpreted it as "Transparently interpret as UTF-8" which is only correct for ASCII characters (< 0x80).
This change fixes a crash in http://wpt.live/mimesniff/mime-types/charset-parameter.window.html when parsing a non-ASCII header value. The test now fully passes when combining these changes with #1879.