Write UTF-8 data to the clipboard. #217

Open
snianu opened this issue Jun 13, 2024 · 3 comments
Comments

@snianu
Contributor

snianu commented Jun 13, 2024

Popular native apps on Windows read formats like image/svg+xml in UTF-8 form [1].
In the spec, the payload for a format is converted from UTF-8 into scalar values before it is written to the clipboard. The spec text reads: "Let payload be the result of UTF-8 decoding item’s underlying byte sequence." (https://w3c.github.io/clipboard-apis/#write-blobs-and-option-to-the-clipboard)
Should this text be changed to write UTF-8 encoded data directly to the clipboard?

[1] https://docs.google.com/document/d/1ULlihA0FOJOqcyD9MgzLZrAbk0uTQPJqDPuPJ2aiuS4/edit?usp=sharing
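
To make the mismatch concrete, here is a minimal Python sketch of the byte-level difference. It is not browser code: the SVG string, the choice of UTF-16LE as the platform re-encoding, and the helper `reads_as_utf8` are all assumptions made for illustration.

```python
# Hypothetical SVG payload; the non-ASCII "é" makes the difference easy to see.
svg_text = '<svg xmlns="http://www.w3.org/2000/svg"><title>café</title></svg>'

# What the blob handed to the clipboard API would contain.
utf8_bytes = svg_text.encode("utf-8")

# The spec step quoted above: UTF-8 decode the bytes into scalar values.
decoded = utf8_bytes.decode("utf-8")

# Assumption: the platform layer then stores the scalar values as UTF-16LE.
utf16_bytes = decoded.encode("utf-16-le")


def reads_as_utf8(data: bytes) -> str:
    """Return what a consumer that assumes UTF-8 would see (hypothetical helper)."""
    return data.decode("utf-8", errors="replace")


print(reads_as_utf8(utf8_bytes) == svg_text)   # the UTF-8 bytes survive intact
print(reads_as_utf8(utf16_bytes) == svg_text)  # the re-encoded bytes do not
```

A consumer that assumes UTF-8 recovers the markup only from the first byte sequence; the second one arrives with interleaved NUL characters and a replacement character where "é" was.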

@snianu
Contributor Author

snianu commented Jun 13, 2024

@sanketj @whsieh @EdgarChen

@snianu snianu added the Agenda+ label Jun 13, 2024
@annevk
Member

annevk commented Jun 13, 2024

Why do we decode at all if the payload is a blob? Like how does this make sense for image/png?
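
A small Python sketch of why a UTF-8 decode step is lossy for binary formats such as image/png. It assumes the decoder substitutes U+FFFD for invalid bytes (Python's `errors="replace"`); the helper name `utf8_round_trip` is hypothetical and this is not how any browser is implemented.

```python
# The fixed 8-byte signature that starts every PNG file.
PNG_SIGNATURE = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])


def utf8_round_trip(data: bytes) -> bytes:
    """Decode as UTF-8 with replacement, then encode back to UTF-8."""
    return data.decode("utf-8", errors="replace").encode("utf-8")


round_tripped = utf8_round_trip(PNG_SIGNATURE)

# 0x89 is not valid as the first byte of a UTF-8 sequence, so it becomes
# U+FFFD, which encodes back to the three bytes EF BF BD.
print(PNG_SIGNATURE.hex())
print(round_tripped.hex())
```

The very first byte of the signature is already altered, so the result is no longer a PNG. Pure-ASCII text such as `b"<svg/>"` passes through the same round trip unchanged, which is why the problem only shows up for byte-sequence types.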

@css-meeting-bot
Member

The Web Editing Working Group just discussed "Write UTF-8 data to the clipboard." and agreed to the following:

  • RESOLVED: Remove the bullet about UTF-8 encoding. Anupam to file follow up issue to investigate what happens when you try to send invalid UTF chars though.
The full IRC log of that discussion
<dandclark> topic: Write UTF-8 data to the clipboard.
<dandclark> github: https://github.com//issues/217
<dandclark> snianu: Recently we found in Chromium that when we copy svg (chromium supports img/svg), we switch encoding from utf-8 to utf-16
<dandclark> ...: When we paste in native apps like Word, the image doesn't render
<dandclark> ...: It's because the native apps expect utf-8
<dandclark> ...: We investigated, found in the spec that when we write blobs to system clipboard, spec says use utf-8 decoder, write scalar values to system clipboard
<dandclark> ...: Trying to get feedback on whether to change the spec
<dandclark> ...: Or are there corner cases we're missing like for PNG
<dandclark> smaug: I think what Anne noticed is a clear bug
<dandclark> snianu: Is there a specific encoding rule that FF or Safari follow when writing formats? Or is it whatever encoding is in the blob type?
<dandclark> smaug: I can't recall
<dandclark> ...: E.g. if your OS has image-specific backing store you do some additional transformation
<dandclark> snianu: Agree. I read in Apple documentation it's default UTF-16 but can use others
<dandclark> ...: Agree for images it doesn't make sense. For other MIME types like svg and HTML, does it make sense to write UTF-8?
<dandclark> ...: Windows has separate APIs for UTF and ASCII characters
<dandclark> ...: I think there's lots of different cases and encoding schemes
<dandclark> ...: Don't know if makes sense to standardize it
<dandclark> ...: Because it's also platform specific
<dandclark> anne: The one thing you could maybe do is abstract between text and byte sequence types
<dandclark> ...: For text sequence types, always do UTF pass so you always get scalar values
<dandclark> ...: Is interesting question what platforms currently do. If you put zero-bytes in text stream, do you get zero-bytes or replacement chars?
<dandclark> snianu: For the existing spec text, do we all agree it's not valid and we should remove it?
<dandclark> ...: And maybe do investigation to see what can be added to the spec, maybe as a note?
<dandclark> anne: Reasonable to remove UTF-8 step and then investigate
<dandclark> smaug: Might be useful to see why we have the UTF-8 thing in the spec
<dandclark> anne: Good to do blame analysis, I didn't yet
<dandclark> smaug: It's very specific, might be something interesting mentioned in spec issue somewhere
<dandclark> johanneswilm: Is there agreement?
<dandclark> johanneswilm: It's always either bytes or UTF-8? Any risk of other older encodings?
<dandclark> anne: It's another interesting question. It's why I think bytes are the answer and we need to investigate further.
<dandclark> johanneswilm: Who will file follow up issue?
<dandclark> snianu: I can
<dandclark> RESOLVED: Remove the bullet about UTF-8 encoding. Anupam to file follow up issue to investigate what happens when you try to send invalid UTF chars though.
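
On the question raised in the log about zero bytes and invalid sequences, the following Python sketch shows what a UTF-8 decoder with replacement does on its own. It says nothing about what any platform clipboard does with the result, which is what the follow-up issue is meant to investigate. The helper name `decode_like_text` is hypothetical, and Python's `errors="replace"` handler stands in for a replacement-style decoder.

```python
def decode_like_text(data: bytes) -> str:
    """UTF-8 decode, substituting U+FFFD for bytes that are not valid UTF-8."""
    return data.decode("utf-8", errors="replace")


# A zero byte is valid UTF-8 (it encodes U+0000), so the decoder keeps it.
with_nul = decode_like_text(b"a\x00b")

# 0xFF can never appear in UTF-8, so the decoder emits U+FFFD in its place.
with_invalid = decode_like_text(b"a\xffb")

print(repr(with_nul))
print(repr(with_invalid))
```

So at the decoder level a zero byte survives as U+0000, while an invalid byte is replaced and the original value cannot be recovered afterwards.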

@snianu snianu removed the Agenda+ label Jun 17, 2024