-
Notifications
You must be signed in to change notification settings - Fork 77
Discussion: Capitalize the header names of outgoing headers. #31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I strongly recommend making it possible to emit headers with any capitalisation, while making the current behaviour the default. The reality is that HTTP implementations that require title cased headers are wrong. Their inability to tolerate them, forcing other implementations to psychically tolerate their preferences, is a bad behaviour and should be punished. However, it should clearly be possible to overrule that behaviour if needed to make an application work. Note that the "capitalise as a matter of course" servers and clients are on the way out. HTTP/2 forcibly lowercases header names, which means a lot of these bad implementations get fixed as they get support for HTTP/2. So this may be a problem that reduces in scope over time. A final note: RFC 2616 is dead. Long live RFCs 723[0-3]! |
ITYM 723[0-5] |
I doubt there's any real performance difference either way between titlecasing and lowercasing, at least at the individual string level:
The difference here is far smaller than the overhead of any caching scheme we could come up with. In general I'd strongly prefer not to have a configuration knob here if at all possible, due to the costs to devs and users of maintaining, testing, documenting the thing... From some research, it looks like title-casing does not solve all these problems. Searching finds a number of examples of implementations that require headers cased like dppPersonUUID, Sec-WebSocket-Key (notice the S in WebSocket), WWW-Authenticate (with Www-Authenticate being rejected), PageSpeed, X-Parse-OS-Version, ... OTOH, it may solve enough of them that we can "get away with it". This appears to be golang's approach. (I'm not 100% sure that title-casing works better than lowercasing on average, but it seems likely. AFAICT nodejs switched from lowercase to title-casing.) Some options:
|
@Lukasa I totally agree with your point of view. I also don't want to be tolerant to these broken implementations. Maybe, as time goes by, these implementations will die out. However, for now, these implementations still have some market share. @njsmith Tornado has a cache for capitalized headers. I wrote a simple benchmark. import time
import tornado.httputil
start_time = time.time()
for _ in range(0, 100000):
"content-length".title()
print("{:.5f}".format(time.time() - start_time))
normalized_headers = tornado.httputil._normalized_headers
start_time = time.time()
for _ in range(0, 100000): # Thanks to @njsmith for stating out this error.
normalized_headers["content-length"]
print("{:.5f}".format(time.time() - start_time)) When running the code above, my computer outputs I also suggest to keep using lowercase internally in favour of HTTP/2, however, add a mechanism at the output to justify the header names. |
Also for the special cases of capitalization in http headers(e.g.: |
Your benchmark has a typo -- you're only running the tornado test 2 times, instead of 100,000 times. After fixing that I get 0.02480 and 0.03074. I'm a bit confused at how your system is apparently >10x slower than my 2 year old laptop, but the relative speeds make more sense this way :-) |
@njsmith sorry, that's my fault. After fixing this problem, the result comes to In addition, the slowness comes from the console output, cause I first ran it in the Python interactive command line without suppressing the output(HOW LAZY I AM!). The result above is run after I fix this two problems. However, my results contradicts yours. After I ran it for multiple times, I still cannot get your results. My platform: |
My numbers on your benchmark must have been a one-off glitch -- running it again now I get numbers more similar to yours. However, the reason I didn't try repeatedly is that I'm still right ;-). Which is to say, it made sense to me, because the numbers matched what I was getting with
It's hard to compare against |
@futursolo: also just to be clear... is this something you're actually running into now, or more of a theoretical concern? |
Actually, I am writing a toy server using h11 and asyncio. Just curious about the design of the implementation, no offence. |
No offense taken! Just trying to figure out how urgent this is :-) |
So, here's an alternative for you @njsmith: write a data structure that is basically a tuple, but allows you to access headers either case sensitively or case insensitively (where case insensitively is the default and is based on a call to The goal here is to allow all of h11's internals to continue to operate with confidence knowing that they'll see headers regardless of their case, but it also allows h11 to preserve any case that is provided to it for the sake of users who want that information. Requests does a similar thing with its |
Here's a really simple (maybe too simple) idea to throw into the mix: use tuples of |
We are looking at implementing @mitmproxy's proxy core using sans-io and h11 would be a natural fit for that. I did some prototyping to test how h11 would work with our planned architecture. I'm really happy with how that turned out, yet header normalization/capitalization would definitely be blocker for us: for forensic reasons, we need to replay every non-malformed request as closely as possible to the upstream server. It would be fantastic if h11 could end up with an implementation that follows @Lukasa's last proposal or @njsmith's |
For the mitmproxy case, see also #47 |
Update: @bradwood is our lucky winner in the non-compliant-header-processing lottery! Well... not so lucky, really. But he's trying to speak websocket to a server that's built-in to his Sky television box, and it's not working. He's using trio-websocket, which uses wsproto, which uses h11. And after some debugging, it turns out that the problem is that the television box's server only recognizes Interestingly, it does treat the While tracking this down, I found this interesting code/comment in the Go websocket library that Bradley was using for testing: It looks like that code+comment was added in gorilla/websocket@5ed2f45, which also switched from writing out the request manually to using the net/http library. The old code that wrote the headers was: So... it seems possible that they've only encountered server's like @bradwood's, that are picky about It looks like the WHATWG fetch spec has switched to only presenting lowercase to JS code, but internally preserves whatever case was passed in and uses it on the wire: whatwg/fetch#476 Cross-references: python-trio/trio-websocket#72 , https://gitter.im/python-trio/general?at=5bd767733844923661ba27d9 |
Interestingly the websocket RFC itself consistently uses |
@bradwood While I'm thinking, here's a hacky (and untested) monkeypatch that should hopefully tide you over: https://gist.github.com/njsmith/2bd3e54649379ecadf8a699c7bb8bcb3 So here's what I'm thinking so far. The simplest approach is to title-case outgoing headers, like Go does (when people don't do hacks like that code in gorilla/websocket). That's easy to do without touching h11's public API at all, and it handles every case that's been reported so far. But it might not handle some cases in the future. We could provide some way for people to manually request case-preservation for a specific header, e.g., pass in a custom So, how hard would it be to do case preservation? When we ingest headers, we normalize the case (in Another option would be to not normalize case in headers passed to us by the user. If we're doing this, then we'd have to make sure that every time we access the headers, we do the normalization on the fly. This makes me wince because it'd be easy to mess this up, but since h11 is a small library with a bounded scope it's probably possible. Specifically it looks like the places internally where we look at header names are: Some timings: Some timing measurements, not sure how meaningful they areJust running Current master:
Creating a dict mapping lowercase name → original name inside
With preserving case of user-specified headers, and doing lowercasing on headers read off the wire, and whenever looking at user-specified headers (but I think this was missing a call to
Unconditional title-casing when writing to the wire:
Timings aren't the only important thing, but they are important to keep an eye on, since people (wisely or not) often make decisions about HTTP servers based on requests/sec microbenchmarks. |
The latter approach seems like a no brainer then unless my uninitiated mind is missing something. I assume you are just calling It looks like the fastest, least intrusive, most isolated and easiest to maintain approach. A small code change that can easily re refactored away down the line if needed |
Although breaks @mhils use case... In which case, maybe a switch to disable header munging? |
@mhils's use case is already broken by the existing header munging (see #47 for details), and I'm pretty confident that we don't want to let users see the original case on incoming headers regardless of what we do here (because there's literally nothing you can use that information for except to violate the spec, probably accidentally), and that also breaks @mhils's use case. Also, a lot of what makes this tricky is that it's actually difficult for h11 to disable header munging, because h11 internally relies on the header munging for correct operation. So it's probably possible to add a switch, but it's not really any easier than any of the other solutions :-). I suspect that eventually we will get pressured to support arbitrary case on outgoing headers in some way or another. But given that I still haven't thought of a simple way to do that, and the title-casing hack is so easy, I guess we should go ahead and do the title-casing thing for now. Tomorrow can worry about tomorrow :-). The one concern that gives me pause is that if we do title-casing now, and then later switch to case-preserving, then in theory that could cause some compatibility issues, for code that passes in lowercase headers to h11 and is talking to a peer that requires title-casing – that works so long as h11 is fixing up the title-casing, but would break when h11 switched to case-preserving. However, code like this is presumably super rare, given that (a) 2.5 years into h11's lifespan we've only found 1 device that insists on title-casing, (b) the code that interacts with that device is actually passing in correctly-cased headers anyway. |
Oh and I'm currently a bit distracted by other things, but if someone wants to make a PR for title-casing outgoing headers that would be cool. |
Copied from my own comment at traefik/traefik#466 (comment) :
There is literally no reason for modification of this behavior to be resisted by this team. |
@cmullendore Hey, we appreciate contributions and feedback, but coming in and making your first post a bunch of shouting isn't cool. Please dial it down and read/listen before responding.
This is the entire point of h11 though. There are infinitely many ways you could potentially deviate from the HTTP spec, and it's impossible for an HTTP library to implement all of them, because it's, y'know, an HTTP library. And if you need some arbitrary non-HTTP functionality, there's nothing stopping you from opening a TCP socket and doing whatever you want. That said, we do try to strike a balance between RFC conformance and real-world interoperability, and there do seem to be a lot of folks out there who are stuck working with broken servers they can't change, so we would like to provide some solution for them.
Well, no. If you read the thread you're posted in, you'll see that the reason the decision is tricky is because preserving case is both riskier and slower than normalizing case. If we normalize, we deal with header case once up front; if we don't normalize, then we have to deal with header case repeatedly every time we touch the headers. |
I appreciate the reply. I might suggest that putting a list of concepts together in numbered bullet points isn't really yelling... but... 🤷♂️ My thing is, we're really not talking about arbitrary functionality... we're talking about something that's really as simple as manipulating a dictionary... which is really, fundamentally, all that the headers object is. I think most people would say that calling .Add(obj) on a collection and having obj be modified as a part of that add would in any other situation categorized as at best unexpected and at worst unacceptable behavior... which my impression after reading the thread is pretty much the external viewpoint.
Providing people with options is totally a great thing... presuming that 1) they are actively choosing to utilize those options and 2) the options meet their expectations. The problem with the above quote is that it presumes that everyone should want #1 so having the choice doesn't matter and the solution implemented (which no one can opt out of because of #1) is always going to be the right one. Neither of these two things are true... and the simple fact that this thread exists here and in many other places is a testament to that.
I apologize but I don't get this because what people are literally asking for is that you do nothing. Though the implementation may vary (string copy, multiple reference, whatever), one would assume that not manipulating a string is easier than manipulating it.
If we go back to the idea that everything should be case insensitive, which is the perspective being given... everyone else should be case insensitive and so manipulating the case shouldn't matter... then why is the requirement for case sensitivity... that is... the need to deal with it at all... such a requirement internally? If it should matter so little, why does it matter so much? I'll admit maybe I was a bit abrupt... dropping in and just throwing down bullet points... but the perspective being given doesn't seem to align with user expectations/needs/requests and for some reason those expectations/needs/requests are being dismissed. For kicks I glanced through the history of header.go (https://github.com/golang/go/commits/master/src/net/http/header.go) and I don't see anything that mentions why this was a good idea? |
Currently, h11 checks
Well, should be. The real world is way less than perfect(Working the way it should be). So to increase compatibility for those incorrect implementations, we kind of have no choice but to capitalise them, which is the reason why I opened this issue in the first place. |
BTW, here's another idea for how to approach this:
from typing import List, Tuple, cast
class UnnormalizedHeaderName:
def lower(self) -> bytes:
pass
HeadersType = List[Tuple[UnnormalizedHeaderName, bytes]]
def get_header(headers: HeadersType, needle: bytes):
for name, value in headers:
# mypy issues an error here:
if name == needle:
return value
# but mypy says this is fine:
if name.lower() == needle:
return value
headers = [
(cast(UnnormalizedHeaderName, b"My-Header"), b"my_header_value")
]
print(get_header(headers, b"my-header")) Of course, to do this, we'd first have to get mypy running on h11. I know @pgjones wants to do this anyway... from a quick look, it wouldn't be too hard to get minimal stuff working. Currently the errors we get are:
Almost all of these are due to py2/py3 version differences, which can be solved by replacing our current |
@njsmith perhaps you've already thought of this and I'm just providing an option that you've ruled out, but urllib3/Requests use a Case Insensitive Dictionary implementation that allows you to access a key with whatever casing you please. It's probably no more performant than what you propose, but it might make reasoning about all of this a tiny bit simpler. |
@sigmavirus24 yeah, I am aware of that option, but there are a few things that make it an awkward fit for h11. First, h11 is pretty low-level, so it currently represents headers as a simple flat-list-of-tuples. This is part of the public API (!), and also turns out to be pretty natural for the kinds of operations that h11 needs to do. Plus there isn't really a compelling widely-agreed-on case-preserving-case-insensitive-multidict implementation that's an obvious choice to make part of our public API. And, because h11 is so low-level, pretty much any cleverness like "using custom data structures" ends up adding a ton of overhead, in percentage terms. Which wouldn't be an issue for urllib3/requests, but people like to choose servers based on stupid reasons like 10% differences in requests/second in microbenchmarks. And h11 is competing with libraries that cut corners on correctness or are written in memory-unsafe languages like C. So I think it is worth jumping through some hoops to try to avoid giving people excuses not to switch. So that's why I don't think that approach is a good fit for h11. |
It looks like someone posted here and then their comment was somehow deleted? Not sure what happened exactly. Anyway, they said that they needed to preserve case in headers because there are Chinese companies that do signature verification on headers. Of course the proper way to do signature verification on headers is to normalize the case before computing the signature. That's what AWS, for example, does when making signed requests. But I can believe that there might be folks out there who get this wrong. Though it's also possible that they do normalize case correctly, but this poster missed that detail. (It's an easy detail to miss; I bet the vast majority of AWS users don't realize that the signatures use normalized headers.) If anyone knows any more about this, I would love to know:
|
Another data point of a real-world service not treating headers case-insensitively: encode/httpx#538 (comment) |
Another data point: encode/httpx#728
My feeling here would be that even if it's only "needed" on outgoing messages, there's probably? a UX argument in favour of case-preserving in both directions. (Or at least, I think that'd be helpful for HTTP clients, where I'd rather any debugging or technical output showed the literal bytewise headers, as they were sent by the server.) Presumably an implementation would need to preserve the existing behavior by default. Would an implementation that added |
Preserving the raw headers on incoming messages is tricky. I hear you on case being nice to have in debug output, but on the other hand... preserving non-meaningful case is also the root cause of all these bugs where people accidentally write case-sensitive programs. One thing I've wondered is whether we should explicitly save the raw headers separately as a dedicated debugging feature (e.g. as a raw byte string, so you can see all the whitespace, header case, etc.). There's also some discussion in #92 of doing more normalization of header values, which would be another case where separate debug output might be useful. |
I now got the same problem as @bradwood - trying to connect from
UPDATE:
so RFC 6455 violates RFC 2616 :( |
Although, as per RFC 2616, All the HTTP/1.1 headers should be case-insensitive. However, as a general practice, most of the major web servers have their headers capitalized. Also, some broken HTTP/1.1 implementations do not recognize some specific headers(e.g.: Server header for Apache Benchmark) if they are not properly capitalized. It would be better if the header can be capitalized.
Performance Concerns: Compared to
str.lower()
, str.title() is slower. However, this can be migrated by adding an LRU Cache into the header capitalization process to solve the problem.The text was updated successfully, but these errors were encountered: