[idna] Update data to Unicode 10.0 and fix logic #351
Filed unicode-rs/unicode-normalization#16 for |
Force-pushed from a3ea472 to a48ab35.
☔ The latest upstream changes (presumably #360) made this pull request unmergeable. Please resolve the merge conflicts.
Now that |
Force-pushed from 9a9ee52 to d0eb5bf.
☔ The latest upstream changes (presumably #364) made this pull request unmergeable. Please resolve the merge conflicts.
Sorry for the delays and repeated conflicts. If you’d prefer, I can take over and make the minor changes I requested below.

Reviewed 1 of 1 files at r1, 2 of 2 files at r2, 1 of 1 files at r3, 2 of 2 files at r4, 2 of 2 files at r5.

Cargo.toml, line 5 at r5 (raw file):
This change is not necessary, please remove it. Since the new version of idna is semver-compatible, end users can update to it independently of url.

Cargo.toml, line 45 at r5 (raw file):
This is also unnecessary. "0.1.0" means ">=0.1.0,<0.2.0", same as "0.1".

idna/src/uts46.rs, line 416 at r4 (raw file):

Comments from Reviewable
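The remark about `Cargo.toml` line 45 relies on Cargo's default (caret) version requirements. The following is a minimal sketch of that rule, not Cargo's actual implementation: the names `Version`, `parse`, `caret_range`, and `matches` are hypothetical, and pre-release and build metadata are ignored.

```rust
/// A `major.minor.patch` version (pre-release tags are ignored in this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

/// Parse "1", "1.2" or "1.2.3". Missing components default to 0.
/// Also returns how many components were actually written.
fn parse(text: &str) -> Option<(Version, usize)> {
    let parts = text
        .split('.')
        .map(|p| p.parse::<u64>().ok())
        .collect::<Option<Vec<u64>>>()?;
    if parts.is_empty() || parts.len() > 3 {
        return None;
    }
    let get = |i: usize| parts.get(i).copied().unwrap_or(0);
    let version = Version { major: get(0), minor: get(1), patch: get(2) };
    Some((version, parts.len()))
}

/// Return the half-open range `[lower, upper)` that a bare requirement
/// such as "0.1" or "0.1.0" stands for under caret semantics.
fn caret_range(req: &str) -> Option<(Version, Version)> {
    let (lower, written) = parse(req)?;
    let upper = if lower.major > 0 {
        // The leftmost non-zero component is the major version.
        Version { major: lower.major + 1, minor: 0, patch: 0 }
    } else if lower.minor > 0 {
        // 0.x releases: a minor bump is treated as incompatible.
        Version { major: 0, minor: lower.minor + 1, patch: 0 }
    } else {
        // All written leading components are zero.
        match written {
            1 => Version { major: 1, minor: 0, patch: 0 },
            2 => Version { major: 0, minor: 1, patch: 0 },
            _ => Version { major: 0, minor: 0, patch: lower.patch + 1 },
        }
    };
    Some((lower, upper))
}

/// Check whether `version` satisfies the caret requirement `req`.
fn matches(req: &str, version: &str) -> Option<bool> {
    let (lower, upper) = caret_range(req)?;
    let (v, _) = parse(version)?;
    Some(lower <= v && v < upper)
}
```

Under this sketch, `caret_range("0.1.0")` and `caret_range("0.1")` produce the same range, which is why the reviewer calls the `Cargo.toml` edit unnecessary.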
A retry of servo#171.

This diff changes the behavior of the ToASCII step to match the spec and to prevent failures in some cases when a domain name starts with leading dots (FULL STOPs), as requested in servo#166.

The code change causes a few failures in the conformance test data provided with UTS #46. But, as the header of the test data file (IdnaTest.txt) says: "If the file does not indicate an error, then the implementation must either have an error, or must have a matching result." Therefore, failing on those test cases does not break conformance with UTS #46 and is, to some extent, anticipated.

As mentioned in servo#166, feedback has been submitted about this inconsistency, and the test logic can be improved later if the data file addresses the comments. Until then, with this diff we can throw fewer errors and keep the conformance tests passing.

To keep the side effects of ignoring errors during test runs to a minimum, I have separated the `TooShortForDns` error from `TooLongForDns`. The `Error` struct has been kept private, so the change won't affect any library users.

Fix servo#166
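To illustrate why splitting the length error in two is useful, here is a minimal sketch of the DNS length check from the ToASCII step of UTS #46 (domain length 1 to 253 excluding the root label, each label 1 to 63). The enum `DnsLengthError` and the function `verify_dns_length` are hypothetical names for this example and are not taken from the crate; only the two variant names come from the commit message.

```rust
/// Hypothetical error type mirroring the two variants named in the commit message.
#[derive(Debug, PartialEq, Eq)]
enum DnsLengthError {
    /// The domain or one of its labels is empty (e.g. a leading dot).
    TooShortForDns,
    /// The domain exceeds 253 bytes or a label exceeds 63 bytes.
    TooLongForDns,
}

/// Check DNS length limits on an already ASCII-encoded domain name.
/// Assumes ASCII input, so byte length equals character count.
fn verify_dns_length(domain: &str) -> Result<(), DnsLengthError> {
    // A single trailing dot denotes the root label and is not counted.
    let domain = domain.strip_suffix('.').unwrap_or(domain);

    if domain.is_empty() {
        return Err(DnsLengthError::TooShortForDns);
    }
    if domain.len() > 253 {
        return Err(DnsLengthError::TooLongForDns);
    }
    for label in domain.split('.') {
        if label.is_empty() {
            // A leading dot or two consecutive dots produce an empty label.
            return Err(DnsLengthError::TooShortForDns);
        }
        if label.len() > 63 {
            return Err(DnsLengthError::TooLongForDns);
        }
    }
    Ok(())
}
```

With two distinct variants, a test harness can choose to tolerate only `TooShortForDns` (the leading-dot case) while still treating `TooLongForDns` as a failure.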
* The code was disabled to allow tests to pass. Now that `IdnaTest.txt` is fixed for this failure, we can re-enable the code.
* As the first paragraph of the Bidi Rules section explains, the rules need to be ignored if there are no Bidi labels present in the domain name. So, add `is_bidi_domain` evaluation to `processing()`, and pass it down to `passes_bidi()` to act on.
* Add unit tests for the bidi rules, making it faster and easier to maintain the feature.
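The idea in the first bullet can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: the `BidiClass` enum and the `toy_class` classifier are hypothetical stand-ins for real Unicode Bidi_Class data, the function signatures are invented for the example, and only the first of the Bidi rules (a label must start with a character of class L, R, or AL) is checked. The definition used here follows RFC 5893: a domain is a Bidi domain if it contains at least one character of class R, AL, or AN.

```rust
/// Hypothetical, heavily reduced set of Bidi classes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BidiClass {
    L,
    R,
    AL,
    AN,
    EN,
    Other,
}

/// Toy classifier covering a few ranges only; real code would use full Unicode data.
fn toy_class(c: char) -> BidiClass {
    match c {
        'a'..='z' | 'A'..='Z' => BidiClass::L,
        '0'..='9' => BidiClass::EN,
        '\u{05D0}'..='\u{05EA}' => BidiClass::R,  // Hebrew letters
        '\u{0627}'..='\u{064A}' => BidiClass::AL, // Arabic letters
        '\u{0660}'..='\u{0669}' => BidiClass::AN, // Arabic-Indic digits
        _ => BidiClass::Other,
    }
}

/// A domain is a Bidi domain if any character is of class R, AL, or AN.
fn is_bidi_domain<F>(domain: &str, class_of: F) -> bool
where
    F: Fn(char) -> BidiClass,
{
    domain
        .chars()
        .any(|c| matches!(class_of(c), BidiClass::R | BidiClass::AL | BidiClass::AN))
}

/// Apply the Bidi check to one label, skipping it entirely for non-Bidi domains.
/// Only rule 1 is implemented in this sketch.
fn passes_bidi<F>(label: &str, is_bidi_domain: bool, class_of: F) -> bool
where
    F: Fn(char) -> BidiClass,
{
    if !is_bidi_domain {
        // No right-to-left label anywhere in the domain: the rules do not apply.
        return true;
    }
    match label.chars().next() {
        Some(first) => matches!(
            class_of(first),
            BidiClass::L | BidiClass::R | BidiClass::AL
        ),
        None => true,
    }
}
```

In this sketch a label such as `0a` is accepted in an all-LTR domain but rejected once another label makes the domain a Bidi domain, which is the distinction the flag passed down from `processing()` is meant to capture.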
Thanks for the review, @SimonSapin. I've addressed all the comments and rebased, so it should be good to land.

Review status: 1 of 5 files reviewed at latest revision, 3 unresolved discussions.

idna/src/uts46.rs, line 416 at r4 (raw file):
Previously, SimonSapin (Simon Sapin) wrote…
Done.

Cargo.toml, line 5 at r5 (raw file):
Previously, SimonSapin (Simon Sapin) wrote…
Done.

Cargo.toml, line 45 at r5 (raw file):
Previously, SimonSapin (Simon Sapin) wrote…
Done.

Comments from Reviewable
Reviewed 1 of 2 files at r7, 2 of 3 files at r10, 1 of 1 files at r11.

idna/src/uts46.rs, line 346 at r10 (raw file):
Can we not make this public? That would mean that adding a new variant (like you did in another commit of this PR) would be a breaking change. It looks like unit tests don’t need it.

Comments from Reviewable
Review status: 3 of 5 files reviewed at latest revision, 1 unresolved discussion.

idna/src/uts46.rs, line 346 at r10 (raw file):
Previously, SimonSapin (Simon Sapin) wrote…
Right, there was no need for it anymore. Reverted it.

Comments from Reviewable
Also updated the |
Btw, updating |
Looks great, thanks! @bors-servo r+
Please do. (I’m a bit surprised the new tests pass without the new data. Maybe the tests don’t cover the normative differences.)

Reviewed 1 of 4 files at r6, 1 of 2 files at r12, 1 of 1 files at r13, 1 of 1 files at r14.

Comments from Reviewable
📌 Commit 3c7da07 has been approved by |
[idna] Update data to Unicode 10.0 and fix logic

* Change the behavior of the ToASCII step to match the spec and prevent failures in some cases when a domain name starts with leading dots (FULL STOPs), as requested in #166. (Another attempt at #337 and #171.)
* Update the `IdnaTest.txt` file to UCD 10.0 and fix Validation Rules, especially Bidi Rules, for the tests to pass.
* Add TODO marks for new flags introduced in the Unicode 10.0 version of UTS#46. (http://www.unicode.org/reports/tr46/proposed.html)
* Add integration test for the `rust-url` crate for the new behavior.

Fix #166

This change is [Reviewable](https://reviewable.io/reviews/servo/rust-url/351)
☀️ Test successful - status-travis
* Change the behavior of the ToASCII step to match the spec and prevent failures in some cases when a domain name starts with leading dots (FULL STOPs), as requested in "Panic when parsing a `.` in file URLs" (#166). (Another attempt at "[idna] Preserve leading dots in host" (#337) and "Don’t remove leading dots in domain names" (#171).)
* Update the `IdnaTest.txt` file to UCD 10.0 and fix Validation Rules, especially Bidi Rules, for the tests to pass.
* Add TODO marks for new flags introduced in the Unicode 10.0 version of UTS#46. (http://www.unicode.org/reports/tr46/proposed.html)
* Add integration test for the `rust-url` crate for the new behavior.

Fix #166