net/url: parsing URLs with port > 65536 should fail #69443
The existing implementation does not validate that the port number is in the allowed range. The WHATWG URL Living Standard mandates that parsing URLs with invalid ports fails: https://url.spec.whatwg.org/#port-out-of-range
Fixes golang#69443.
Change https://go.dev/cl/613035 mentions this issue:
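The check described in the CL message above might look roughly like the sketch below. The portInRange helper is invented for illustration and is not the code from CL 613035; it only shows the WHATWG rule that a numeric port must fit in 0-65535.

```go
package main

import (
	"fmt"
	"strconv"
)

// portInRange is a hypothetical helper illustrating the kind of check the CL
// description talks about: a numeric port must fit in 0-65535, as required by
// https://url.spec.whatwg.org/#port-out-of-range. It is not the actual CL code.
func portInRange(port string) bool {
	n, err := strconv.Atoi(port)
	return err == nil && n >= 0 && n <= 65535
}

func main() {
	fmt.Println(portInRange("443"))   // true
	fmt.Println(portInRange("70000")) // false
}
```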
The net/url package tries to follow RFC 3986, which does not impose any limit on the port number. In particular, net/url works for schemes other than http, and there is no requirement that it be based on TCP. Admittedly, in practice it is based on TCP, so this may be splitting hairs. But I'm concerned that adding this kind of check will break existing working code. Naturally, actually attempting to use an HTTP URL with a large port will fail with an error. I'm not really opposed to making this kind of change; I just want to raise the opposing viewpoint.
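To illustrate the point about failure timing, here is a small sketch of the current behaviour: the parse succeeds, and the error only appears when the URL is actually used. The exact error text is not quoted here because it depends on the platform and Go version.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

func main() {
	// Parsing currently succeeds even though the port is out of range.
	u, err := url.Parse("https://example.org:70000")
	fmt.Println(u.Port(), err) // 70000 <nil>

	// The failure only shows up when the URL is used, typically as a dial
	// error complaining about the port.
	_, err = http.Get(u.String())
	fmt.Println(err) // non-nil
}
```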
I recall a bit of the frustration that occurred when we introduced the requirement that ports be strictly decimal. There were a few libraries (particularly MySQL’s driver) that broke because of the format their DSNs were using.
Is there a practical benefit to returning an error when parsing a URL with a large port number? Or harm caused by not returning an error? I suspect there are tests out there that contain URLs with arbitrary junk ports. Arguably, those tests shouldn't expect a URL containing an invalid port to parse correctly, but if we're going to break them we should have a good motivation to do so. As @ianlancetaylor says, trying to use a URL with a large port will fail, so I'd expect failing at parse time to mostly just change when an error is encountered, not whether one is.
URLs are identifiers that don't have to be dereferenceable. In XML, JSON-LD, RDF, etc., it's very common to use URLs as namespaces and identifiers, and it's also common (and intended) that it's not possible to fetch these URLs. Even when the URL is dereferenceable, in these cases it's usually useless and bad for performance to open a network connection just to check that the identifier is well formed. On the other hand, it's quite common to have to manipulate these URLs: matching some parts with a URL pattern, appending a query parameter, changing the path... And here comes the interoperability issue: a URL that has been generated or validated using net/url can be rejected by a WHATWG-compliant parser such as the JavaScript URL constructor:
// This is a valid URL according to "net/url", but not according to the WHATWG spec
new URL('https://example.com:70000/ns/my-vocab')
// Uncaught TypeError: URL constructor: https://example.org:70000 is not a valid URL.
For my use case, I managed to work around the issue by using https://github.com/nlnwa/whatwg-url, which strictly respects the WHATWG spec. While it's possible to use third-party libraries like these to ensure that a URL is valid according to the WHATWG URL Living Standard, it would be more practical if the standard library also implemented it. I also fully understand your point of view and the risk posed by stricter rules. The bottom line is: do we want to respect the WHATWG spec in net/url or not?
For the record, there is a discussion about inconsistencies between RFCs and the WHATWG spec going on on the HTTP working group mailing list that may be interesting (opinions are divided on the relevance of strictly implementing the WHATWG spec): https://lists.w3.org/Archives/Public/ietf-http-wg/2024JulSep/0281.html
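For comparison, a minimal Go sketch of the other side of the same mismatch: the current net/url parser accepts the URL that the WHATWG-compliant JavaScript constructor rejects above.

```go
package main

import (
	"fmt"
	"net/url"
)

func main() {
	// Accepted by net/url today, rejected by a WHATWG-compliant parser.
	u, err := url.Parse("https://example.com:70000/ns/my-vocab")
	fmt.Println(err)      // <nil>
	fmt.Println(u.Port()) // 70000
}
```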
I think this message from that ietf-http-wg@ thread has a useful point: https://lists.w3.org/Archives/Public/ietf-http-wg/2024JulSep/0304.html
Should net/url attempt to follow "aspirational guidance" in a "living standard", changing the definition of what parses between releases? That seems like a problem, especially since, as @ianlancetaylor points out, net/url works for schemes other than HTTP. Following RFC 3986 seems like a principled approach. I'm not convinced that chasing a moving target is the right approach for a standard library parser.
Although I don't like the concept of “living standard” either, as the list discussion shows, WHATWG standards are (unfortunately) the de facto standards and they are here to stay. Also, WHATWG standards don't change that much, and they change less and less as they mature: https://whatwg.org/faq#living-standard
The URL Living Standard is quite mature and doesn't move a lot (most changes are editorial, and the real changes are mostly fixes for bugs and interoperability issues): https://github.com/whatwg/url/commits
The WHATWG standard isn't only for HTTP; it supports all schemes and documents the special rules that apply to some of them: https://url.spec.whatwg.org/#special-scheme
The WHATWG URL standard might not change a lot, but we're proposing breaking working code in response to it diverging from RFC 3986. I would be very surprised if there aren't tests out there with invalid port numbers in them which will be broken by this change. The argument in favor of making this change is parser alignment: the hope is presumably that net/url will never accept a URL that a JavaScript parser rejects, and vice versa. However, if the existing parser landscape is inconsistent and the standard changes over time, then we're never going to achieve parser alignment; you could make an argument for alignment being possible with a sufficiently nailed-down specification, but not with a "living standard". I'm not seeing the benefit here being worth the downside of probably breaking real tests. If someone can convincingly argue that this change won't break any existing code, I'd be supportive of it, but I think that's a hard case to make.
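To make the breakage concern concrete, here is a hypothetical example of the kind of existing test that would start failing if out-of-range ports became a parse error. The host, port, and test name are invented for illustration.

```go
package example_test

import (
	"net/url"
	"testing"
)

// Hypothetical test of the kind that exists in the wild: the port is junk on
// purpose because the test never dials; it only needs a URL value to pass
// around. Under the stricter rule, url.Parse would return an error here.
func TestBuildCallbackURL(t *testing.T) {
	u, err := url.Parse("http://placeholder.invalid:99999/callback")
	if err != nil {
		t.Fatalf("unexpected parse error: %v", err)
	}
	if u.Path != "/callback" {
		t.Fatalf("unexpected path: %q", u.Path)
	}
}
```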
That's indeed almost impossible :)
Go version
go version go1.22.0 darwin/arm64
Output of go env in your module/workspace:
What did you do?
url.Parse("https://example.org:70000")
What did you see happen?
Parsing succeeds, and no error is returned.
What did you expect to see?
Per the WHATWG Living Standard (and, if I'm not mistaken, internet RFCs), parsing should fail because the port is out of range: https://url.spec.whatwg.org/#port-out-of-range