String hashes vulnerable to algorithmic complexity attacks #1616
Comments
Secure hashing and hash-table hashing are different problems. This seems a bit of a non-issue.
This has nothing to do with cryptographic (= secure) hashing. The issue is that simply filling a hash table with user-controlled strings can take hours if the user crafted malicious data (full of hash collisions). If Rust is a suitable choice to e.g. write a web server that accepts POST requests, it would be undesirable behaviour for a reasonably sized POST request to eat hours of CPU time.
I've read this story, and I agree it's a real issue, and I'll be very happy if we ever reach a point where this turns into a vulnerability in production code. Patches welcome.
I'd like to write a patch, but it seems there is no random number generator yet. Is that correct?
There's
Also agreed with marijn. It's a much more pervasive issue than that one hash function.
Personally I also think this issue is somewhat misguided and can be mitigated by other means. I agree that something more robust than the DJB hash should be used, but - and not to say this isn't an actual issue - is it not feasible to simply use a different data structure for such an associative container, like a self-balancing binary search tree? This is the underlying implementation of
The reason I say this is that they have better worst-case properties than a hash table, which will degrade to O(n) with specially crafted input. Using a better hash function won't necessarily fix this worst-case property (and let's not forget that randomness as a seed in a hash depends on a secure PRNG, which is a topic in and of itself). Yes, this is a real issue, and it has even affected the Linux kernel. Sometimes the worst case is actually very important, as opposed to the average case.
I think it is strange that so many people default to hash tables for such containers. At best, this seems like an issue of people just using whatever default their language provides them with - or what is familiar - as opposed to what makes sense (many scripting languages use hash tables as the default associative data structure as an implementation detail; it seems like "the most natural choice," but I'd argue that's a position of familiarity, not necessarily the reality). BSTs are well understood, simple, and have predictable performance properties, as opposed to hash tables, which have complicated performance properties due to both the hash and the underlying implementation (and a much worse worst case).
If you optimize and find out that dealing with the fields in POST data is a bottleneck of horrific proportions, then optimize with a hash table, a random seed, whatever. Otherwise, use simple things that work and you'll avoid the problem.
Again, I think everybody would accept a patch here, and I'm not trying to say this is not an issue, but I feel these issues can be attacked in other, simpler ways with more predictable performance, as opposed to just adding randomness and a better hash and hoping the problem goes away.
I agree that maps are better than hash tables. I have, however, been poking at a murmur3 implementation for a while now. I think it is probably still too slow, and it was the first thing I wrote in Rust. When I get around to it again, I'll ask for more optimization help: https://github.com/killerswan/murmur/blob/master/murmur.rs I was also just reading about CityHash... http://code.google.com/p/cityhash/
So a lot of different points have been made here; let me recap.
Fixing this is a matter of (a) a random seed generated once at each program startup, (b) using ^ instead of +, and (c) maybe changing the 33 to a larger prime that is still as easy to compute, e.g. 65537. This should not be slower, nor add much complexity.
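Points (a)-(c) could be sketched roughly as follows. This is a hypothetical illustration, not the eventual patch: the function name is invented, and the seed value is a placeholder for what would really come from the runtime's CSPRNG once at startup.

```rust
// Sketch of the proposed mitigation: per-process random seed, XOR mixing
// instead of addition, and 65537 as the multiplier. Note that
// h * 65537 == (h << 16) + h, so the larger prime stays cheap to compute.
// The seed values below are placeholders; a real implementation would
// draw the seed from a CSPRNG once at program startup.
fn seeded_hash(seed: u64, s: &str) -> u64 {
    let mut h = seed;
    for b in s.bytes() {
        h = h.wrapping_mul(65537) ^ (b as u64);
    }
    h
}

fn main() {
    // Different seeds give different hashes for the same key, so an
    // attacker cannot precompute colliding inputs offline.
    assert_ne!(seeded_hash(0x1234_5678, "key"), seeded_hash(0x8765_4321, "key"));
    // XOR mixing also breaks the simple "ab" / "bA" collision family
    // of the plain additive DJB scheme.
    assert_ne!(seeded_hash(42, "ab"), seeded_hash(42, "bA"));
}
```

Since multiplication by an odd constant and XOR with a byte are both bijections on u64, two distinct seeds can never map the same string to the same hash.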
True. But (a) ideally, all hash functions should be fixed, and (b) strings are the most popular dictionary key type, especially when you write a web service that accepts JSON or similar input.
For huge sizes, the log n average-case difference can matter; that can be a valid reason to use a hash table. This is why hash tables should not be vulnerable to collision attacks. Independent of that point, it might be sensible to use maps as the default container.
Might be a good idea, and would hopefully fix this bug (I haven't looked at the actual code of those hash functions).
I'd be happy to take a patch that randomizes the various hashes used in transient (in-memory) hash tables, both in the runtime and in the core and standard libraries. We have a CSPRNG in the runtime, exposed as
@graydon Can this be closed now?
@graydon I'm also curious how we stand on hashing now. You've made a lot of improvements.
I'll take another pass over
Thanks for fixing this!
Hello!
The string hash function from here is vulnerable to an algorithmic complexity attack (from src/libcore/str.rs):
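(The original snippet is elided here. As the comments below note, it was a DJB-style hash with multiplier 33 and additive mixing; a minimal sketch, assuming the classic DJB2 initial value 5381, which may not match the libcore code exactly:)

```rust
// Minimal sketch of a DJB-style string hash (multiplier 33, additive
// mixing, classic initial value 5381). The exact libcore code may have
// differed; this only illustrates why collisions are easy to construct.
fn djb_hash(s: &str) -> u64 {
    let mut h: u64 = 5381;
    for b in s.bytes() {
        h = h.wrapping_mul(33).wrapping_add(b as u64);
    }
    h
}

fn main() {
    // "ab" and "bA" collide because 33*'a' + 'b' == 33*'b' + 'A'
    // (33*97 + 98 == 33*98 + 65 == 3299).
    assert_eq!(djb_hash("ab"), djb_hash("bA"));
    // Concatenating colliding equal-length blocks yields more collisions:
    // an attacker gets 2^n distinct n-block keys sharing one hash bucket.
    assert_eq!(djb_hash("abab"), djb_hash("bAbA"));
    assert_eq!(djb_hash("ababab"), djb_hash("bAabbA"));
}
```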
Consider a web service that accepts e.g. a POST request in urlencoded form or JSON and constructs a map<String, String> from the POST data (an extremely common case). With this hash function, it is extremely easy for an attacker to craft input in which all map keys have the same hash. This allows e.g. 1 MB of POST data to keep one CPU core busy for hours.
This is an issue that affects a lot of languages, see http://www.ocert.org/advisories/ocert-2011-003.html
Here is a talk by the people who created the advisory: http://www.youtube.com/watch?v=R2Cq3CLI6H8
One solution is to use a random seed and XOR instead of addition. If this hash is to be replaced with murmur, make sure that the seed is random (I'm not an expert on murmur).