[RFC] switch from axTLS to BearSSL #3490
Comments
Hi @igrr, thanks for the heads up. Also, BearSSL is not on GitHub but on its own Git server, so it will be hard to track upstream, much as it is now with axTLS because that uses SVN... |
I'm probably not the heaviest user, but the only real instability I've seen with axTLS has been when the heap runs out or gets fragmented. Unless one of those has a smaller heap footprint or a real, static allocation of about the same size as axTLS, then I don't think it's going to move the needle.
I don't think BearSSL being on a self-hosted Git is really an issue other than for submitting pull requests. They do have a gitweb which seems to be up-to-date, so you can even trawl through code online.
If I had to guess, I'd say mbedTLS has a better chance of fitting in the little RAM we've got on chip. Looking through their code and readmes I see configuration options and suggestions for small-memory systems. And it's now owned by Softbank/ARM so there's that pedigree.
BearSSL seems to be worried about stopping denial-of-service by allocating a fixed amount of memory, so you can't fill up RAM by getting dozens of connections. That doesn't necessarily mean the fixed allocation is particularly small, just constant.
But I don't see any actual memory statistics so can't really know. mbedTLS's README talks about there being a 16K TLS buffer required by the spec. If you need one for xmit and one for receive, then you're out of luck anyway on the ESP8266 since only ~40KB is free to begin with before any connections... |
The current axTLS seems unable to support 2-way secure MQTT traffic, so whatever is needed to fix that would be very welcome here.
|
I think for full duplex you will need 2x16 kB buffers because of the maximum fragment size of SSL, so I don't think full duplex is doable with any system. I would directly allocate 16 kB for axTLS and that's it: one reusable connection, no more fragmentation... |
Some extra information:
All together, a minimal client that uses a half-duplex mode and talks to a server whose public key is already known, and that understands the Max Fragment Length extension, should work with as little as 837 + 3348 = 4185 bytes of RAM, and a 4 kB stack (for transient allocation). A minimal server, that can accept connections from big clients (which will not send the Max Fragment Length extension), will need 16709 + 3684 = 20393 bytes of RAM, again in half-duplex mode.
Right now, BearSSL is declared "beta", which means that it has passed through some extensive testing, and thus should have few remaining bugs. This also means that the features I'd like to add before version 1.0 (the "stable" version) should not require any breaking API change.
Code is not on GitHub so that the legal status is simpler (it's developed in Canada and distributed from Canada). The Git repository is on a dedicated VM that I rent (from OVH) for that exact purpose. The gitweb feeds dynamically on the repository, so by construction it is always up-to-date.
As for pull requests, in any case, I am a complete maniac and I would never merge external code "as is": suggestions and patches are welcome (and I already receive some) but I will read them through and mostly rewrite them completely, because I want to be able to say: "I fully know and understand every single line of code in BearSSL". |
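To make the arithmetic above easier to check against a heap budget, here is a minimal sketch in C. The byte counts are the ones quoted in the comment; the function names, and the reading of each sum as "I/O buffer plus context structure", are assumptions made for illustration and are not part of BearSSL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Byte counts quoted in the comment above. Labelling the first term of each
   sum as the I/O buffer and the second as the context structure is an
   assumption; the comment only gives the sums. */
enum {
    CLIENT_BUFFER  = 837,
    CLIENT_CONTEXT = 3348,
    SERVER_BUFFER  = 16709,
    SERVER_CONTEXT = 3684
};

/* Static RAM for one half-duplex endpoint: buffer plus context. */
size_t tls_ram_needed(size_t buffer, size_t context)
{
    assert(buffer <= SIZE_MAX - context);   /* guard against wrap-around */
    return buffer + context;
}

/* True when `needed` bytes fit in `free_heap` while leaving at least
   `reserve` bytes for the rest of the application. */
bool fits_in_heap(size_t free_heap, size_t needed, size_t reserve)
{
    return needed <= free_heap && free_heap - needed >= reserve;
}
```

With the roughly 40 KB of free heap mentioned earlier in the thread, `fits_in_heap(40 * 1024, 4185 + 20393, 8 * 1024)` holds, so under these figures one minimal client and one minimal server could coexist with some headroom. The 4 kB stack is separate and not counted here.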
@pornin, thanks for the detailed info (and your concern for getting something as important as SSL right in a 1.0 version)! Seems like OpenSSL has been slow with RFC 6066 (at least from the pull requests; I think they're at month 13 with people still asking for tests to be added), so while it sounds great I think it will break most things on the internet for now.
But the suggestion on using a smaller xmit buffer, "... a 16 kB input buffer (16709 bytes, exactly) and a 2 kB output buffer, for a total of 18 kB instead of 32," sounds like a great thing to have. Sorry if it's a silly question, but I'm not familiar w/SSL internals: Does this limit the connection capabilities, or will it fragment packets automatically to the smaller size? That is, can I still do, for example, an HTTPS POST of 8KB of data if I have my send buffer only at 2K?
The 3K+ stack requirements are a bit rough, as the current setup gives a total of 4K (and this stack is used by the OS WiFi code, too, so it's not even all for user apps!). As long as it's bounded, I imagine @igrr can increase it if he swaps out axTLS. |
Fragmentation is automatic and nominally invisible to applications. In fact, in the last few years, when using a CBC cipher suite with TLS 1.0, Web browsers have taken to automatically splitting off the first byte of each record into a record of its own (this is a defence against the "BEAST attack"), and they still work well with existing servers. There used to be some very poorly written applications that would not tolerate fragmentation, but they have basically died out. Pre-1.0 OpenSSL would not accept fragmentation for some of the handshake messages, but this has been fixed, and here we are talking about "application data" records anyway.
Of note, during the initial handshake, when encryption is not active yet, BearSSL can handle records larger than its input buffer (since there is no authentication tag to verify at this point, data can be processed as it is received). It's only after the handshake that the maximum record size matters. In a closed application where both client and server code are controlled, one can ensure that outgoing records are small by adding some "flush" calls where appropriate.
RFC 6066 support would nominally break nothing. But it is useful only to the client, when RAM is scarce, and OpenSSL uses too much RAM to run on systems with little RAM, so it feels little pressure to implement it. I think patches have been floating around since at least 2014.
Biggest stack user is the RSA code. In order to support RSA keys up to 4096 bits, it uses stack buffers that eat up to 2208 bytes. One can gain a kilobyte or so by reducing the maximum supported RSA key size to a lower value, such as 2048 bits. In practice, most RSA keys are 2048 bits, but some CAs will use 3072 or 4096-bit RSA keys. If not using X.509 certificate validation, then only the server key size matters, and that will normally be 2048 bits (if using RSA). |
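As an illustration of the "fragmentation is automatic" point, the following C sketch shows how a payload larger than the output buffer ends up as a sequence of smaller records. It is a simplified model, not BearSSL code: the function name is hypothetical, and it ignores the per-record header and MAC overhead, so the real number of records for a given buffer size would be somewhat higher.

```c
#include <assert.h>
#include <stddef.h>

/* Split `payload_len` bytes into records that each carry at most
   `max_plaintext` bytes. Record lengths are written to `lens` (up to `cap`
   entries, may be NULL); the total number of records is returned. */
size_t split_into_records(size_t payload_len, size_t max_plaintext,
                          size_t *lens, size_t cap)
{
    size_t count = 0;

    assert(max_plaintext > 0);
    while (payload_len > 0) {
        /* Each record takes as much as fits, the last one takes the rest. */
        size_t n = payload_len < max_plaintext ? payload_len : max_plaintext;
        if (lens != NULL && count < cap) {
            lens[count] = n;
        }
        payload_len -= n;
        count++;
    }
    return count;
}
```

Under this model, the 8 KB POST asked about earlier would leave a 2 KB send buffer as four full records, and the receiver reassembles the byte stream without the application noticing.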
@pornin I know this is an ooooold thread, but I just got around to building BearSSL for the ESP8266 and doing some testing this weekend. A client w/the standard required 16K++ receive and minimum (876?) send buffer seems to take ~25.5KB. A server with the min send/recv. buffers was around ~5.5KB. Both of those numbers seem outstanding and would allow both a client that could talk to any server, and a server that could talk to any client, to run simultaneously in the ~43KB free on the ESP8266 Arduino system.
I did notice that the full stack for X509 validation (of your bearssl cert using Let's Encrypt as the trust anchor, actually) took 4.5-5K on the ESP8266 from the app calling BearSSL to it returning. GCC for the xtensa isn't compiled to dump stack-sizes, and it's a pain to manually instrument each function to get a runtime accounting, so I didn't go into it in detail. Is there one function or area where a large chunk or two are stack allocated to do the RSA validation? As there's nominally only 4KB total for stack (including interrupts and inline TCP processing), I'd need to allocate those larger variables on the heap to have any chance of stability. In fact, since during this time data's not being xferred, I think I could piggyback on the already-allocated buffer memory and not actually require any more space.
I see there are lots of machine generated .c files. Are those built by the .net code included in the repo, and is there an incantation for rebuilding them? The constant (state transition?) tables come to many KB and need to be moved to a different linker segment with a decorator, as well as anything that accesses them needs to use helper functions because the non-RAM segment they'd be moved to can only be read by 32-bit accesses. (There is also the possibility of patching GCC to only ever use 32-bit accesses, but that slows down even RAM accesses as GCC is not cognizant of any variable-specific requirements.) I can either hand-edit the generated .C files every release (bad idea) or patch the generator and keep a much smaller change...
One last thing, what is the magic incantation to not validate the X509 cert at all, to save memory? Reading the doxygen it looks like I can pass in a custom X509 hash function, but not being an SSL expert I'm not sure exactly what that means or if I'm looking at the wrong spot entirely. Thanks! |
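One common way to get a runtime stack figure without instrumenting every function is "stack painting": fill the unused stack with a known pattern before the call, then count how much of the pattern was overwritten afterwards. The C sketch below shows only the bookkeeping, on a plain byte buffer standing in for the stack region. The function names and the pattern value are hypothetical, and locating the real stack bounds on the ESP8266 is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_PAINT 0xA5u   /* arbitrary fill pattern (assumption) */

/* Fill the (currently unused) stack region with the paint pattern. */
void stack_paint(uint8_t *region, size_t size)
{
    assert(region != NULL);
    memset(region, STACK_PAINT, size);
}

/* Assuming a downward-growing stack, used bytes sit at the high end of the
   region. Count the still-painted bytes from the low end and return the
   rest as the high-water mark. */
size_t stack_high_water(const uint8_t *region, size_t size)
{
    size_t untouched = 0;

    assert(region != NULL);
    while (untouched < size && region[untouched] == STACK_PAINT) {
        untouched++;
    }
    return size - untouched;
}
```

The result is a lower bound: if the deepest bytes written happen to equal the paint value, or a function reserves stack space without writing to it, the reported figure will be slightly too small.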
@earlephilhower For stack usage in RSA, the actual allocation occurs in However, at 4096 bits, stack usage is only a bit more than 2 kB for RSA; decreasing to 2048 bits will save about 1 kB. If you observe 4-5 kB stack usage, then there is some unaccounted extra stack usage elsewhere. BearSSL should normally keep itself within about 3 kB of stack space at all times. Predicting exact stack usage is hard since it depends on the target architecture, and how the C compiler allocates space; maybe GCC for xtensa does things suboptimally. You should first check that the relatively bulky state structures (
For the machine-generated files: you can rebuild them with "
About not validating the X.509 certificate: for SSL to actually achieve some security, the client MUST have some way to make sure that it uses the proper public key (the one that truly belongs to the intended server). The normal way to do that is through validation, but if you have other methods to get that public key, then it is possible to do otherwise. The certificate validation engine is pluggable, and BearSSL comes with two implementations: the "minimal" engine, which is the default and performs the basic steps of validation (name matching, signature verification,...), and the "knownkey" engine, which is used when the client already knows in some unspecified way the server's public key, and just wants to use it. The "knownkey" engine simply discards whatever certificates the server sends, and uses the configured public key instead. The API is explained on: https://www.bearssl.org/x509.html
The default SSL client initialization function is It would be entirely possible, and actually easy, to make a third, reduced "validation" engine that would simply decode (not validate) the server certificate to get the public key, and trust it. BearSSL includes a non-validating X.509 certificate decoder (look up
(Conceptually, you could use TOFU, i.e. "Trust On First Use": the client would simply trust the server key when first connecting, then remember it, and enforce its use for all subsequent connections. This is not a bad model, but it can be tough to do properly. Notably, it's hard to make TOFU work in a context where you can still occasionally change the server public key, without letting an active attacker fool the client. It would require some sort of explicit pairing process, just like Bluetooth gadgets, or SSH clients.) |
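The TOFU idea described above can be sketched in a few lines of C. This is only the decision logic, under stated assumptions: the pin is taken to be a 32-byte digest of the server public key computed elsewhere, the store lives in RAM rather than flash, and none of the names below come from BearSSL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PIN_LEN 32   /* assumed: a 32-byte digest of the server public key */

typedef struct {
    bool    have_pin;
    uint8_t pin[PIN_LEN];
} tofu_store;

typedef enum { TOFU_FIRST_USE, TOFU_MATCH, TOFU_MISMATCH } tofu_result;

/* First connection: remember the key digest. Later connections: accept only
   the remembered digest. A mismatch never overwrites the stored pin. */
tofu_result tofu_check(tofu_store *store, const uint8_t *key_hash)
{
    uint8_t diff = 0;
    size_t i;

    assert(store != NULL && key_hash != NULL);
    if (!store->have_pin) {
        memcpy(store->pin, key_hash, PIN_LEN);
        store->have_pin = true;
        return TOFU_FIRST_USE;
    }
    /* Compare without an early exit so timing does not depend on where
       the first differing byte is. */
    for (i = 0; i < PIN_LEN; i++) {
        diff |= (uint8_t)(store->pin[i] ^ key_hash[i]);
    }
    return diff == 0 ? TOFU_MATCH : TOFU_MISMATCH;
}
```

The hard part mentioned in the comment, replacing a pinned key safely, is deliberately left out: in this sketch the only way to accept a new key is to clear the store through some explicit, out-of-band pairing step.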
Much appreciate the detailed info! For the stack test, all buffers that the app passed in to BearSSL were new'd from the heap, so this measurement was just inside the library + the LWIP library + anything the ESP IP stack itself took out of the current stack. So some could definitely have been outside of BearSSL, but still in the code flow from the app to the lib and back.
I definitely hear you on not validating the X.509 certificate. On a real commercial or professional-level product that's negligent, but for folks starting on the Arduino it's kind of harsh to tell them they need to figure out the root CA for, say, www.cnn.com, download and convert the cert, add it to their sketch, and recompile their app just to use an HTTPS RSS news feed. Plus, they'd need to do it all over again if they decided they wanted to watch the BBC news feeds instead, making it rather unwieldy. We don't have the luxury of enough space to include a whole directory of trusted CAs. :(
The T0 interpreter changes are actually quite minimal. The instruction and jump tables can go into PROGMEM and use a simple accessor helper as they are not touched except in very focused spots. The constant datatable can't as it seems to be passed out of the T0 and into certain pluggable functions (but it takes under 700 bytes total so it's not a big deal compared to the ~9KB for the other two tables). The preliminary diffs for the C# are attached for your perusal, but it's still a WIP so please don't bother doing anything other than looking them over and seeing if there's something that makes you cringe in them:
With this and moving the crypto (u)int32_t tables (which requires simply adding the "PROGMEM" decorator to the static const [] declaration...a simple sed script may be able to do it) it leaves ~18KB free heap out of the 44KB total while supporting an SSL bidirectional client connection. |
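The "simple accessor helper" for tables that can only be read with 32-bit accesses can be sketched in portable C as follows. The function names are hypothetical, and the table is modelled as an array of 32-bit words with bytes packed least-significant first, which is an assumption chosen to match a little-endian target. On the Arduino core, `pgm_read_byte` and related macros play this role for `PROGMEM` data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read byte `index` of a table that is stored as 32-bit words and may only
   be accessed with aligned 32-bit loads. Bytes are assumed to be packed
   least-significant first within each word. */
uint8_t table_read_u8(const uint32_t *words, size_t index)
{
    uint32_t w;

    assert(words != NULL);
    w = words[index / 4];                       /* one aligned 32-bit load */
    return (uint8_t)(w >> (8 * (index % 4)));   /* pick the byte out of it */
}

/* Read a little-endian 16-bit value starting at byte `index`. Built from two
   byte reads so it also works when the value straddles two words. */
uint16_t table_read_u16(const uint32_t *words, size_t index)
{
    uint16_t lo = table_read_u8(words, index);
    uint16_t hi = table_read_u8(words, index + 1);
    return (uint16_t)(lo | (uint16_t)(hi << 8));
}
```

Each byte read costs a word load plus a shift, which is why this suits tables that are "not touched except in very focused spots" rather than data on a hot path.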
I've got a pre-alpha version replacing WiFiClientSecure w/a bidirectional BearSSL one in my Arduino fork: https://github.com/earlephilhower/Arduino/tree/bearssl_wip . The bearssl.ino example has downloaded your homepage so many times during my debug that I think I could re-type it from memory now. SSL_io and examples were very handy in getting it up and running so fast! Still work to go to replace the existing axtls server and handle the (IMO very silly) Arduino "copy objects by value instead of passing pointers" refcounting/etc. |
Given that #4273 is merged, I'm setting this as staged for release. |
BearSSL is a relatively new TLS library. It has some features which may come in handy in the ESP8266 environment, such as:
See https://bearssl.org/goals.html for more.
On the other hand, axTLS is fairly well studied by now. I have spent a good amount of time reading its source and doing some optimizations. Others (@slaff, @earlephilhower, @ADiea) have also become familiar with axTLS and made many improvements and bug fixes. If we switch to BearSSL, that would mean investing more time to learn the ins and outs of it. If we do it though, we may end up with a more predictable and reliable TLS implementation.
This issue is mainly intended to collect feedback and host discussion related to BearSSL in the context of this project.