Skip to content

[idna] Update data to Unicode 10.0 and fix logic #351

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jun 21, 2017

Conversation

behnam
Copy link
Contributor

@behnam behnam commented May 29, 2017

Fix #166


This change is Reviewable

@behnam
Copy link
Contributor Author

behnam commented May 29, 2017

Filed unicode-rs/unicode-normalization#16 for is_combining_mark() failure for U+11C3A.

@behnam behnam force-pushed the ucd10 branch 5 times, most recently from a3ea472 to a48ab35 Compare May 29, 2017 22:29
@behnam behnam requested a review from SimonSapin May 29, 2017 22:47
@behnam behnam self-assigned this May 29, 2017
@bors-servo
Copy link
Contributor

☔ The latest upstream changes (presumably #360) made this pull request unmergeable. Please resolve the merge conflicts.

@behnam
Copy link
Contributor Author

behnam commented Jun 16, 2017

Now that unicode-normalization is fixed, I dropped the workaround the workaround in latest update: https://github.com/behnam/rust-url/tree/ucd10

@behnam behnam force-pushed the ucd10 branch 3 times, most recently from 9a9ee52 to d0eb5bf Compare June 16, 2017 20:48
@bors-servo
Copy link
Contributor

☔ The latest upstream changes (presumably #364) made this pull request unmergeable. Please resolve the merge conflicts.

@SimonSapin
Copy link
Member

Sorry for the delays and repeated conflicts. If you’d prefer I can take over and do the minor changes I requested below.


Reviewed 1 of 1 files at r1, 2 of 2 files at r2, 1 of 1 files at r3, 2 of 2 files at r4, 2 of 2 files at r5.
Review status: all files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


Cargo.toml, line 5 at r5 (raw file):

name = "url"
# When updating version, also modify html_root_url in the lib.rs
version = "1.5.1"

This change is not necessary, please remove it. Since the new version of idna is semver-compatible, end-users can update to it independently of url.


Cargo.toml, line 45 at r5 (raw file):

encoding = {version = "0.2", optional = true}
heapsize = {version = ">=0.1.1, <0.5", optional = true}
idna = { version = "0.1.0", path = "./idna" }

This is also unnecessary. "0.1.0" means ">=0.1.0,<0.2.0", same as "0.1".


idna/src/uts46.rs, line 416 at r4 (raw file):

#[cfg(test)]
mod tests {

Cargo.toml specifies [lib] test = false to avoid compiling the library code twice, so these tests are never run. Please move them to a new idna/tests/unit.rs file and add a [[test]] name = "unit" section in Cargo.toml.


Comments from Reviewable

behnam added 4 commits June 20, 2017 11:14
A retry of servo#171

This diff changes the behavior of ToASCII step to match the spec and
prevent failures on some cases when a domain name starts with leading
dots (FULL STOPs), as requested in
servo#166.

The change in the code results in a few failures for test cases of the
Conformance Testing data provided with UTS servo#46. But, as the header of
the test data file (IdnaTest.txt) says: "If the file does not indicate
an error, then the implementation must either have an error, or must
have a matching result."

Therefore, failing on those test cases does not break conformance with
UTS servo#46, and to some level, anticipated.

As mentioned in servo#166, a feedback
is submitted for this inconsistency and the test logic can be improved
later if the data file addresses the comments.

Until then, we can throw less errors and maintain passing conformance
tests with this diff.

To keep the side-effects of ignoring errors during test runs as minimum
as possible, I have separated `TooShortForDns` error from
`TooLongForDns`. The `Error` struct has been kept private, so the change
won't affect any library users.

Fix servo#166
* The code was disabled to allow tests pass. Now that `IdnaTest.txt` is
fixed for this failure, we can re-enable the code.
* As the first paragraph of The Bidi Rules section explains, the rules
need to be ignored if there are no Bidi labels present in the domain
name. So, add `is_bidi_domain` evaluation to `processing()`, and pass it
down to `passes_bidi()` to act on.

* Add unit tests for the bidi rules, making it faster and easier to
maintain the feature.
@behnam
Copy link
Contributor Author

behnam commented Jun 20, 2017

Thanks for the review, @SimonSapin. I've addressed all the comments and rebased, so should be good to land.


Review status: 1 of 5 files reviewed at latest revision, 3 unresolved discussions.


idna/src/uts46.rs, line 416 at r4 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

Cargo.toml specifies [lib] test = false to avoid compiling the library code twice, so these tests are never run. Please move them to a new idna/tests/unit.rs file and add a [[test]] name = "unit" section in Cargo.toml.

Done.


Cargo.toml, line 5 at r5 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

This change is not necessary, please remove it. Since the new version of idna is semver-compatible, end-users can update to it independently of url.

Done.


Cargo.toml, line 45 at r5 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

This is also unnecessary. "0.1.0" means ">=0.1.0,<0.2.0", same as "0.1".

Done.


Comments from Reviewable

@SimonSapin
Copy link
Member

Reviewed 1 of 2 files at r7, 2 of 3 files at r10, 1 of 1 files at r11.
Review status: all files reviewed at latest revision, 1 unresolved discussion.


idna/src/uts46.rs, line 346 at r10 (raw file):

#[derive(PartialEq, Eq, Clone, Copy, Debug)]
pub enum Error {

Can we not make this public? That would mean that adding a new variant (like you did in another commit of this PR) would be a breaking change. It looks like unit tests don’t need it.


Comments from Reviewable

@behnam
Copy link
Contributor Author

behnam commented Jun 21, 2017

Review status: 3 of 5 files reviewed at latest revision, 1 unresolved discussion.


idna/src/uts46.rs, line 346 at r10 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

Can we not make this public? That would mean that adding a new variant (like you did in another commit of this PR) would be a breaking change. It looks like unit tests don’t need it.

Right, there was no need for it anymore. Reverted it.


Comments from Reviewable

@behnam
Copy link
Contributor Author

behnam commented Jun 21, 2017

Also updated the IdnaTest file to final release, which came out yesterday. So, there's nothing left from the draft versions and we're all up to date.

@behnam
Copy link
Contributor Author

behnam commented Jun 21, 2017

Btw, updating uts46_mapping_table.rs to Unicode 10.0.0 should be done in a separate diff. Let me know if you like me to submit a PR.

@SimonSapin
Copy link
Member

Looks great, thanks!

@bors-servo r+

Btw, updating uts46_mapping_table.rs to Unicode 10.0.0 should be done in a separate diff. Let me know if you like me to submit a PR.

Please do. (I’m a bit surprise the new tests pass without the new data. Maybe the tests don’t cover the normative differences.)


Reviewed 1 of 4 files at r6, 1 of 2 files at r12, 1 of 1 files at r13, 1 of 1 files at r14.
Review status: all files reviewed at latest revision, all discussions resolved.


Comments from Reviewable

@bors-servo
Copy link
Contributor

📌 Commit 3c7da07 has been approved by SimonSapin

@bors-servo
Copy link
Contributor

⌛ Testing commit 3c7da07 with merge 40aa05a...

bors-servo pushed a commit that referenced this pull request Jun 21, 2017
[idna] Update data to Unicode 10.0 and fix logic

* Change the behavior of ToASCII step to match the spec and prevent failures on some cases when a domain name starts with leading dots (FULL STOPs), as requested in #166. (Another attempt on #337 and #171)

* Update `IdnaTest.txt` file to UCD 10.0 and fix Validation Rules, specially Bidi Rules, for the tests to pass.

* Add TODO marks for new flags introduced in Unicode 10.0 version of UTS#46. (http://www.unicode.org/reports/tr46/proposed.html)

* Add integration test for `rust-url` crate for the new behavior.

Fix #166

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/rust-url/351)
<!-- Reviewable:end -->
@bors-servo
Copy link
Contributor

☀️ Test successful - status-travis
Approved by: SimonSapin
Pushing 40aa05a to master...

@bors-servo bors-servo merged commit 3c7da07 into servo:master Jun 21, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Panic when parsing a . in file URLs
3 participants