Encourage UTF-8 for new formats and APIs #322
That this is used as a justification for what Wasm does in between function calls is just surprising. I mean, networking, file formats, fine, but breaking with JS is worrisome at best. |
Can you explain how any of this breaks with JavaScript? |
I am specifically referring to the "new APIs" part. This design principle has been biting Wasm since December 2017, as it is regularly used as an argument to disallow surrogates on boundaries. Trapping, or replacing, on code that intentionally does not produce errors for backwards-compatibility reasons, in languages that make it so very easy to |
Why is it a huge mistake? If it was a huge mistake wouldn't we have noticed it being a problem in the past decade or so? |
Are you saying that since you didn't notice, exactly because it is currently intentionally allowed so it does not lead to errors, that the opposite of a legitimate argument must be true? |
I don't think so. It's still not entirely clear to me what exactly you find problematic and for what reason. Examples would help. |
let myString = inputString.substring(0, 10); // user finds it funny to place an emoji at 9
map.set(myString, 42);
let alsoMyString = roundtripStringOverInterfaceTypesBoundaryButWhoKnows(myString);
map.get(alsoMyString) // undefined
if (myString == alsoMyString) {
  // false
}

// Second example (separate snippet):
let myString = getStringFromDatabaseOverInterfaceTypesBoundaryButWhoKnows();
queryWhere("stringInDatabase = ", myString); // no results
updateWhere("stringInDatabase = ", myString); // no update, update wrong row or error
deleteWhere("stringInDatabase = ", myString); // no delete, delete wrong row or error |
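The first snippet above can be made runnable as a minimal sketch, assuming the boundary transcodes through UTF-8. The helper `roundtripThroughUtf8` and the sample input are hypothetical stand-ins, not anything Interface Types specifies; `TextEncoder` and `TextDecoder` are the standard Encoding API.

```javascript
// Hypothetical stand-in for a boundary that transcodes strings through UTF-8.
function roundtripThroughUtf8(str) {
  // encode() replaces a lone surrogate with U+FFFD before producing bytes.
  const bytes = new TextEncoder().encode(str);
  return new TextDecoder().decode(bytes);
}

// Assumed input: the emoji occupies UTF-16 code units 9 and 10.
const inputString = "123456789\u{1F600} tail";
const myString = inputString.substring(0, 10); // ends in a lone high surrogate
const alsoMyString = roundtripThroughUtf8(myString);

const map = new Map([[myString, 42]]);
console.log(myString === alsoMyString);       // false
console.log(map.get(alsoMyString));           // undefined
console.log(alsoMyString.endsWith("\uFFFD")); // true
```

The mismatch only appears when the string contains an unpaired surrogate; a well-formed string survives the same round trip unchanged.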
That seems rather contrived and will already fail the moment you involve the network or URLs. |
This comment has been minimized.
This comment has been minimized.
Applying this design principle to Wasm is literally breaking the Web Platform, wrecking WebAssembly, damaging the reputation of the W3C and everyone involved, and I do not think that a bunch of thumbs down and marking my comments as spam is very helpful in this regard. This is not only highly security relevant as it may eventually affect any arbitrary amount of code ever written in JavaScript, JavaScript-likes, C# and Java, but in the worst case puts people's lives in serious danger when they rely on correctly functioning software that doesn't kill them just because someone put an emoji in the wrong place. |
@dcodeIO your comment wasn't marked as spam because people disagree with you, it was marked as spam because it was not constructive and you needlessly tagged a large number of people. Engaging in hyperbole isn't constructive either. Please constrain your comments to technical discussion of the matter at hand. @annevk has been engaging with you in good faith and you are not doing him, or anyone else, the courtesy of the same. At this point I also suggest you read the W3C Code of Ethics and Professional Conduct. |
I do not see how I am violating the CEPC. I disagree technically of course, strongly even, but that doesn't imply hostility on my end. I just think this is extremely important, and I'd rather question the reactions I have received on such an important matter. |
* Only use UTF-8 for #322
* Clearer words
Co-authored-by: Daniel Appelquist <[email protected]>
I think this is fixed by #524. |
A lot of the new formats and APIs we've been designing (and some not-so-new) assume UTF-8 unconditionally. These include:
importScripts();
fetch()'s text() convenience method;
XMLHttpRequest's responseText convenience getter; and
Blob's text() convenience method.
We also made it non-conforming for HTML documents to use any other encoding. And, Encoding tries to be clear that everything else is legacy.
It'd be good if this was captured in the design principles doc. https://w3ctag.github.io/design-principles/#new-data-formats is one place that captures several of the above examples. There might be room for some separate guidance on APIs (not just formats), to capture the text() and responseText examples: basically, any time an API is interpreting some unknown bytes as a string, it should just assume it's always UTF-8.
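As a rough illustration of "assume it's always UTF-8", here is a minimal sketch using the standard `TextDecoder`. The helper name `bytesToText` and the sample byte arrays are assumptions made for the example, not part of any API mentioned above.

```javascript
// Hypothetical helper: interpret unknown bytes as text, always assuming UTF-8.
// No encoding sniffing and no caller-supplied label.
function bytesToText(bytes) {
  return new TextDecoder("utf-8").decode(bytes);
}

// "café" encoded as UTF-8, and the same word encoded as ISO-8859-1.
const utf8Bytes = new Uint8Array([0x63, 0x61, 0x66, 0xc3, 0xa9]);
const latin1Bytes = new Uint8Array([0x63, 0x61, 0x66, 0xe9]);

console.log(bytesToText(utf8Bytes));   // "café"
console.log(bytesToText(latin1Bytes)); // "caf\uFFFD" (invalid byte replaced)
```

Bytes in a legacy encoding are not rejected here; the invalid sequence is replaced with U+FFFD. A caller that would rather fail can construct the decoder with `{ fatal: true }`, which makes `decode()` throw a `TypeError` on malformed input.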