
net/url: parsing URLs with port > 65536 should fail #69443


Closed
dunglas opened this issue Sep 13, 2024 · 11 comments

@dunglas
Contributor

dunglas commented Sep 13, 2024

Go version

go version go1.22.0 darwin/arm64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/dunglas/Library/Caches/go-build'
GOENV='/Users/dunglas/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/dunglas/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/dunglas/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/Users/dunglas/go/pkg/mod/golang.org/[email protected]'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='go1.22.0+auto'
GOTOOLDIR='/Users/dunglas/go/pkg/mod/golang.org/[email protected]/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.22.0'
GCCGO='gccgo'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/nf/_zwcfbtx7m9d5g8jgympqy5r0000gn/T/go-build2393813559=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

url.Parse("https://example.org:70000")

What did you see happen?

Parsing succeeds, and no error is returned.
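
A minimal runnable version of the reproduction (the package main wrapper is added here for illustration):

package main

import (
	"fmt"
	"net/url"
)

func main() {
	u, err := url.Parse("https://example.org:70000")
	fmt.Println(err)      // <nil>: no error, even though 70000 is outside 0-65535
	fmt.Println(u.Port()) // 70000
}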

What did you expect to see?

Per the WHATWG Living Standard (and, if I'm not mistaken, internet RFCs), parsing should fail because the port is out of range: https://url.spec.whatwg.org/#port-out-of-range

dunglas added a commit to dunglas/go that referenced this issue Sep 13, 2024
The existing implementation does not validate
that the port number is in the allowed range.

WHATWG URL Living Standard mandates that
parsing URLs with invalid ports fails:
https://url.spec.whatwg.org/#port-out-of-range

Fixes golang#69443.
@gopherbot
Contributor

Change https://go.dev/cl/613035 mentions this issue: net/url: return an error if port is out of range

@timothy-king added the NeedsInvestigation label Sep 13, 2024
@timothy-king added this to the Backlog milestone Sep 13, 2024
@timothy-king
Contributor

CC @neild, @rsc.

@ianlancetaylor
Contributor

The net/url package tries to follow RFC 3986, which does not impose any limit on the port number. In particular, net/url works for schemes other than http, and there is no requirement that it be based on TCP. Admittedly, in practice it usually is, so this may be splitting hairs. But I'm concerned that adding this kind of check will break existing working code.

Naturally actually attempting to use an HTTP URL with a large port will fail with an error like dial tcp: address 70000: invalid port.

I'm not really opposed to making this kind of change, I just want to raise the opposing viewpoint.
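
A sketch of that point (illustrative only; the exact error wrapping may vary by Go version): the out-of-range port is only rejected when the URL is actually used, e.g. when the HTTP client dials.

package main

import (
	"fmt"
	"net/http"
)

func main() {
	// url.Parse accepted this URL, but dialing rejects the port.
	_, err := http.Get("https://example.org:70000")
	fmt.Println(err)
	// Something like:
	// Get "https://example.org:70000": dial tcp: address 70000: invalid port
}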

@puellanivis

I recall some of the frustration that occurred when we started requiring ports to be strictly decimal. A few libraries (particularly MySQL’s driver) broke because their DSNs used tcp(...)-encapsulated authority sections so that they could support both TCP and unix(...) for Unix sockets.

@neild
Contributor

neild commented Sep 25, 2024

Is there a practical benefit to returning an error when parsing a URL with a large port number? Or harm caused by not returning an error?

I suspect there are tests out there that contain URLs with arbitrary junk ports. Arguably, those tests shouldn't expect a URL containing an invalid port to parse correctly, but if we're going to break them we should have a good motivation to do so.

As @ianlancetaylor says, trying to use a URL with a large port will fail, so I'd expect failing at parse time to mostly just change when an error is encountered, not whether one is.

@dunglas
Contributor Author

dunglas commented Sep 25, 2024

URLs are identifiers that don't have to be dereferenceable. In XML, JSON-LD, RDF, etc., it's very common to use URLs as namespaces and identifiers, and it's also common (and intentional) that these URLs cannot be fetched.

Even when the URL is dereferenceable, in these cases it's usually pointless and bad for performance to open a network connection just to check that the identifier is well formed.

On the other hand, it's quite common to have to manipulate these URLs: matching some parts with a URL pattern, appending a query parameter, changing the path...

And here comes the interoperability issue: a URL that has been generated or validated using net/url may fail when manipulated in another programming language that follows the WHATWG spec more strictly, as the ubiquitous JavaScript does:

// This is a valid URL according to "net/url", but not according to the WHATWG spec
new URL('https://example.org:70000/ns/my-vocab')
// Uncaught TypeError: URL constructor: https://example.org:70000/ns/my-vocab is not a valid URL.

For my use case, I managed to work around the issue by using https://github.com/nlnwa/whatwg-url, which strictly follows the WHATWG spec.
There is also https://github.com/ada-url/goada, but it requires cgo.

While it's possible to use third-party libraries like these to ensure that the URL is valid according to the WHATWG URL living standard, it would be more practical if the standard library also implemented it.
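
For illustration, the port-range part of the check can also be done with the standard library alone (a minimal sketch; the parseWithPortCheck helper is hypothetical, not an existing API):

package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// parseWithPortCheck parses rawURL with net/url, then rejects ports
// outside the 0-65535 range, which url.Parse itself accepts.
func parseWithPortCheck(rawURL string) (*url.URL, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return nil, err
	}
	if p := u.Port(); p != "" {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 || n > 65535 {
			return nil, fmt.Errorf("port %q out of range in %q", p, rawURL)
		}
	}
	return u, nil
}

func main() {
	_, err := parseWithPortCheck("https://example.org:70000")
	fmt.Println(err) // port "70000" out of range in "https://example.org:70000"
}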

I also fully understand your point of view and the risk posed by stricter rules. The bottom line is: do we want net/url to follow the WHATWG spec or not?

For the record, there is an ongoing discussion on the HTTP Working Group mailing list about inconsistencies between the RFCs and the WHATWG spec that may be of interest (opinions are divided on whether strictly implementing the WHATWG spec is worthwhile): https://lists.w3.org/Archives/Public/ietf-http-wg/2024JulSep/0281.html

@neild
Contributor

neild commented Oct 7, 2024

I think this message from that ietf-http-wg@ thread has a useful point: https://lists.w3.org/Archives/Public/ietf-http-wg/2024JulSep/0304.html

The problem with the status quo is that the WhatWG's standard is a living document that continually is updated. It is kind of an aspirational guideline of what the browsers are currently striving to implement. But adherence to WhatWG standards is often somewhat variable (see David's post above). Thus rather than implementing that specification, you end up having to do a survey of common browsers and making judgement calls about which to support, and then keep a live lookout for any changes in those implementations.

Should net/url attempt to follow "aspirational guidance" in a "living standard", changing the definition of what parses between releases? That seems like a problem, especially since, as @ianlancetaylor points out, net/url works for schemes other than HTTP.

Following RFC 3986 seems like a principled approach. I'm not convinced that chasing a moving target is the right approach for a standard library parser.

@dunglas
Contributor Author

dunglas commented Oct 7, 2024

Although I don't like the concept of “living standard” either, as the list discussion shows, WHATWG standards are (unfortunately) the de facto standards and they are here to stay.

Also, WHATWG standards don't change that much, and they change less and less as they become mature: https://whatwg.org/faq#living-standard

The URL Living Standard is quite mature and doesn't move a lot (most changes are editorial, and the real changes are mostly to fix bugs and interoperability issues): https://github.com/whatwg/url/commits

"net/url works for schemes other than HTTP."

The WHATWG standard isn't only for HTTP; it supports all schemes and documents the special rules that apply to some of them: https://url.spec.whatwg.org/#special-scheme

@neild
Contributor

neild commented Oct 7, 2024

The WHATWG URL standard might not change a lot, but we're proposing to break working code in response to it diverging from RFC 3986. I would be very surprised if there aren't tests out there containing invalid port numbers that will be broken by this change.

The argument in favor of making this change is parser alignment: the hope is presumably that net/url will never accept a URL that a JavaScript parser rejects, and vice versa. However, if the existing parser landscape is inconsistent and the standard changes over time, then we're never going to achieve parser alignment; you could argue that alignment is possible with a sufficiently nailed-down specification, but not with a "living standard".

I'm not seeing the benefit here being worth the downside of probably breaking real tests.

If someone can convincingly argue that this change won't break any existing code, I'd be supportive of it, but I think that's a hard case to make.

@dunglas
Contributor Author

dunglas commented Oct 7, 2024

That's indeed almost impossible :)
Let's close this, since there is another pure-Go library out there implementing the spec anyway.

@dunglas closed this as not planned on Oct 7, 2024