Encourage UTF-8 for new formats and APIs #322

Closed
domenic opened this issue Jun 4, 2021 · 14 comments
Comments

@domenic
Member

domenic commented Jun 4, 2021

A lot of the new formats and APIs we've been designing (and some not-so-new) assume UTF-8 unconditionally. These include:

  • JavaScript modules (including upcoming JSON and CSS modules)
  • Workers, and anything included in them via importScripts()
  • WebSockets
  • EventSource
  • fetch()'s text() convenience method; XMLHttpRequest's responseText convenience getter; and Blob's text() convenience method
  • Various not-yet-shipped or still-early JSON-based formats like import maps, origin policy, or speculation rules

We also made it non-conforming for HTML documents to use any other encoding. And, Encoding tries to be clear that everything else is legacy.

It'd be good if this was captured in the design principles doc. https://w3ctag.github.io/design-principles/#new-data-formats is one place, that captures several of the above examples. There might be room for some separate guidance on APIs (not just formats), to capture the text() and responseText examples: basically, any time an API is interpreting some unknown bytes as a string, it should just assume it's always UTF-8.
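
A minimal sketch of the "always assume UTF-8" rule for APIs, in the spirit of the text() examples above. The helper name `bytesToText` and the sample bytes are hypothetical, chosen only for illustration; `TextDecoder` is the standard Encoding API and defaults to UTF-8.

```javascript
// Hypothetical helper: interpret unknown bytes as UTF-8, with no sniffing
// and no caller-supplied charset.
function bytesToText(bytes) {
  // TextDecoder defaults to UTF-8. In its default (non-fatal) mode,
  // malformed sequences are replaced with U+FFFD instead of throwing.
  return new TextDecoder().decode(bytes);
}

const utf8Bytes = new Uint8Array([0x68, 0xc3, 0xa9]); // "hé" encoded as UTF-8
const latin1Bytes = new Uint8Array([0x68, 0xe9]);     // "hé" encoded as ISO-8859-1

const decodedUtf8 = bytesToText(utf8Bytes);     // "hé"
const decodedLatin1 = bytesToText(latin1Bytes); // "h\uFFFD": legacy bytes are not guessed at
```

The point of the sketch is that the API never tries to detect a legacy encoding: input that is not valid UTF-8 simply decodes with replacement characters.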

@annevk
Member

annevk commented Jun 5, 2021

XMLHttpRequest & fetch()'s JSON utilities and WebVTT come to mind as well.

@dcodeIO

dcodeIO commented Jul 23, 2021

That this is used as a justification for what Wasm does in between function calls is just surprising. I mean, networking, file formats, fine, but breaking with JS is worrisome at best.

@annevk
Member

annevk commented Jul 23, 2021

Can you explain how any of this breaks with JavaScript?

@dcodeIO

dcodeIO commented Jul 23, 2021

I am specifically referring to the "new APIs" part. This design principle has been biting Wasm since December 2017, as it is regularly used as an argument to disallow surrogates on boundaries. Trapping, or replacing, on code that intentionally does not produce errors for backwards-compatibility reasons, in languages that make it so very easy to substring pairs with constants, goes way too far and is a huge mistake.

@annevk
Member

annevk commented Jul 23, 2021

Why is it a huge mistake? If it was a huge mistake wouldn't we have noticed it being a problem in the past decade or so?

@dcodeIO

dcodeIO commented Jul 23, 2021

Are you saying that since you didn't notice, exactly because it is currently intentionally allowed so it does not lead to errors, that the opposite of a legitimate argument must be true?

@annevk
Member

annevk commented Jul 23, 2021

I don't think so. It's still not entirely clear to me what exactly you find problematic and for what reason. Examples would help.

@dcodeIO

dcodeIO commented Jul 23, 2021

let myString = inputString.substring(0, 10); // user finds it funny to place an emoji at 9
map.set(myString, 42);

let alsoMyString = roundtripStringOverInterfaceTypesBoundaryButWhoKnows(myString);

map.get(alsoMyString) // undefined
if (myString == alsoMyString) {
  // false
}

let myString = getStringFromDatabaseOverInterfaceTypesBoundaryButWhoKnows();
queryWhere("stringInDatabase = ", myString); // no results
updateWhere("stringInDatabase = ", myString); // no update, update wrong row or error
deleteWhere("stringInDatabase = ", myString); // no delete, delete wrong row or error
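
A runnable sketch of the first scenario above, under a loud assumption: the boundary is modeled here as a plain UTF-8 encode/decode round trip via `TextEncoder`/`TextDecoder`. The helper `roundTripThroughUtf8` and the sample input are hypothetical stand-ins, not the actual Interface Types behavior.

```javascript
// Assumption: the boundary behaves like a UTF-8 encode/decode round trip.
function roundTripThroughUtf8(str) {
  // TextEncoder replaces any lone surrogate with U+FFFD before encoding.
  const bytes = new TextEncoder().encode(str);
  return new TextDecoder().decode(bytes);
}

// The emoji U+1F600 occupies UTF-16 code units 9 and 10 of this string.
const input = "012345678\u{1F600}tail";
// substring(0, 10) cuts the surrogate pair in half, leaving a lone
// high surrogate (0xD83D) as the last code unit.
const key = input.substring(0, 10);

const map = new Map([[key, 42]]);
const roundTripped = roundTripThroughUtf8(key);

const sameString = key === roundTripped; // false: 0xD83D became U+FFFD
const lookup = map.get(roundTripped);    // undefined: the key no longer matches
```

Under that assumption the string comes back with U+FFFD in place of the lone surrogate, so both the equality check and the map lookup fail.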

@annevk
Member

annevk commented Jul 23, 2021

That seems rather contrived and will already fail the moment you involve the network or URLs.
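
A small sketch of what "involving URLs" can look like for such a string, using only standard JavaScript and URL APIs. The sample value is hypothetical.

```javascript
// A string ending in a high surrogate with no low surrogate after it.
const loneSurrogate = "emoji-\uD83D";

// encodeURIComponent throws a URIError on lone surrogates.
let uriError = null;
try {
  encodeURIComponent(loneSurrogate);
} catch (err) {
  uriError = err;
}

// URLSearchParams does not throw; it converts the value to a scalar value
// string first, so the lone surrogate is serialized as U+FFFD.
const query = new URLSearchParams({ q: loneSurrogate }).toString();
```

Either way the original code units do not survive: one API rejects the string outright and the other silently substitutes the replacement character.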

@dcodeIO

This comment has been minimized.

@dcodeIO

dcodeIO commented Jul 24, 2021

Applying this design principle to Wasm is literally breaking the Web Platform, wrecking WebAssembly, damaging the reputation of the W3C and everyone involved, and I do not think that a bunch of thumbs down and marking my comments as spam is very helpful in this regard. This is not only highly security relevant as it may eventually affect any arbitrary amount of code ever written in JavaScript, JavaScript-likes, C# and Java, but in the worst case puts people's lives in serious danger when they rely on correctly functioning software that doesn't kill them just because someone put an emoji in the wrong place.

@plinss
Copy link
Member

plinss commented Jul 24, 2021

@dcodeIO your comment wasn't marked as spam because people disagree with you, it was marked as spam because it was not constructive and you needlessly tagged a large number of people.

Engaging in hyperbole isn't constructive either.

Please constrain your comments to technical discussion of the matter at hand. @annevk has been engaging with you in good faith and you are not doing him, or anyone else, the courtesy of the same. At this point I also suggest you read the W3C Code of Ethics and Professional Conduct.

@dcodeIO

dcodeIO commented Jul 24, 2021

I do not see how I am violating the CEPC. I disagree technically of course, strongly even, but that doesn't imply hostility on my end. I just think this is extremely important, and I'd rather question the reactions I have received on such an important matter.

@torgo torgo added this to the 2021-10-04-week milestone Oct 3, 2021
@torgo torgo modified the milestones: 2023-05-01-week, 2024-06-03-week Jun 2, 2024
rhiaro added a commit that referenced this issue Dec 4, 2024
torgo added a commit that referenced this issue Dec 4, 2024
* Only use UTF-8 for #322
* Clearer words
Co-authored-by: Daniel Appelquist
@jyasskin
Contributor

jyasskin commented Dec 4, 2024

I think this is fixed by #524.

@jyasskin jyasskin closed this as completed Dec 4, 2024