BearSSL takes more than twice as long for handshake as axTLS #5110

Closed
6 tasks done
marcelstoer opened this issue Sep 5, 2018 · 17 comments

Comments

@marcelstoer
Contributor

marcelstoer commented Sep 5, 2018

This is a follow-up to #4738 (comment)

Claim

I observed that BearSSL in standard configuration takes more than twice as long for a handshake as axTLS.

Basic Infos

  • This issue complies with the issue POLICY doc.
  • I have read the documentation at readthedocs and the issue is not addressed there.
  • I have tested that the issue is present in current master branch (aka latest git).
  • I have searched the issue tracker for a similar issue.
  • If there is a stack dump, I have decoded it.
  • I have filled out all fields below.

Platform

  • Hardware: WeMos D1 Mini Pro
  • Core Version: 2.4.2
  • Development Env: Arduino IDE
  • Operating System: macOS

Sample sketch

The sketch below is inspired by what @artua presented in #4738, but it's much simpler. I feel that when you are just starting to measure and compare performance, you shouldn't collect too much data at once, as you might not see the wood for the trees. Once you identify a suspect, you adjust the rig accordingly and collect more data around the suspected culprit.

The program connects to an SSL/TLS-protected host 50 times, with a delay of 1.5 s between attempts. After every attempt it reports the failure and total counts (plus the failure rate), the accumulated handshake time, the average handshake time and the free heap. No server certificates or fingerprints are verified!

In connectToHost() you define whether to use BearSSL (with setInsecure()) or axTLS.

#include <ESP8266WiFi.h>

#define HOST "thingpulse.com"
#define PORT 443
#define WIFI_SSID "ssid"
#define WIFI_PWD "password"

const char* ssid = WIFI_SSID;
const char* password = WIFI_PWD;

int numberOfSamples = 50;
int totalCounter = 0;
int failCounter = 0;
unsigned long totalDuration = 0;  // millis() returns unsigned long

void setup () {
  Serial.begin(115200);
  Serial.println("");
  WiFi.begin(ssid, password);

  Serial.print("Connecting to Wifi");
  while (WiFi.status() != WL_CONNECTED) {
    delay(300);
    Serial.print(".");
  }
  Serial.println("done.");
}


void loop() {
  if (totalCounter < numberOfSamples) {
    connectToHost();
    delay(1500);
  }
}

bool connectToHost() {
  // The measured time covers DNS lookup, TCP connect, TLS handshake and stop().
  unsigned long startTime = millis();
  // For BearSSL: uncomment the next two lines and comment out the axTLS line.
  // setInsecure() disables all certificate checks!
//  BearSSL::WiFiClientSecure client;
//  client.setInsecure();
  WiFiClientSecure client;  // axTLS, the default WiFiClientSecure in core 2.4.2

  bool success = false;
  totalCounter++;
  if (client.connect(HOST, PORT)) {
    // Serial.printf("Connection to %s success.\n", HOST);
    success = true;
  } else {
    failCounter++;
    // Serial.printf("Connection to %s failed.\n", HOST);
  }
  client.flush(); // not actually needed
  client.stop();
  totalDuration += millis() - startTime;
  // %lu matches unsigned long; the original %d with a uint64_t was undefined behavior.
  Serial.printf("Failed: %d, Total: %d (%f %%), Total Dur: %lu, Avg. Dur: %f, Heap: %u\n",
    failCounter,
    totalCounter,
    (100.0 * failCounter / totalCounter),
    totalDuration,
    1.0 * totalDuration / totalCounter,
    (unsigned) ESP.getFreeHeap());
  return success;
}

Test results

I ran the above sketch with all combinations of BearSSL/axTLS, lwIP v2 lower memory/higher bandwidth and 80/160 MHz on core 2.4.2. That makes 2^3 * 50 = 400 samples.

The summary is as follows:
0 connection failures. That's remarkable, as this whole endeavour started out with this issue: https://stackoverflow.com/q/52143894.

BearSSL

  • lwIP v2 lower memory, 80 MHz: ~860ms
  • lwIP v2 higher bandwidth, 80 MHz: ~845ms
  • lwIP v2 lower memory, 160 MHz: ~510ms
  • lwIP v2 higher bandwidth, 160 MHz: ~500ms

axTLS

  • lwIP v2 lower memory, 80 MHz: ~360ms
  • lwIP v2 higher bandwidth, 80 MHz: ~350ms
  • lwIP v2 lower memory, 160 MHz: ~250ms
  • lwIP v2 higher bandwidth, 160 MHz: ~240ms

-> BearSSL takes more than twice as long.
-> As expected, lwIP v2 higher bandwidth has close to zero impact, since only a socket connection with a handshake was established and no resources were loaded from the host.
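The "more than twice" conclusion can be checked directly against the rounded averages listed above. The short sketch below is plain host C++ (not an Arduino sketch) that just divides the numbers; it also compares how much each library gains from doubling the clock. The arrays hold the approximate values from this comment, not the raw data in the PDF, and the helper names are made up for this example.

```cpp
#include <cassert>

// Rounded averages from the comment above, in milliseconds. Index order:
// 0 = lower memory/80 MHz, 1 = higher bandwidth/80 MHz,
// 2 = lower memory/160 MHz, 3 = higher bandwidth/160 MHz.
const double kBearSslMs[] = {860.0, 845.0, 510.0, 500.0};
const double kAxTlsMs[]   = {360.0, 350.0, 250.0, 240.0};
const int kConfigs = 4;

// How many times longer BearSSL takes than axTLS for one configuration.
double slowdown(double bearSslMs, double axTlsMs) {
  assert(axTlsMs > 0.0);
  return bearSslMs / axTlsMs;
}

// Smallest slowdown over all four configurations.
double minSlowdown() {
  double smallest = slowdown(kBearSslMs[0], kAxTlsMs[0]);
  for (int i = 1; i < kConfigs; ++i) {
    double s = slowdown(kBearSslMs[i], kAxTlsMs[i]);
    if (s < smallest) smallest = s;
  }
  return smallest;
}

// Speedup gained by going from 80 MHz to 160 MHz with the same library.
double clockSpeedup(double ms80, double ms160) {
  assert(ms160 > 0.0);
  return ms80 / ms160;
}
```

With these figures the smallest slowdown is 2.04 (lower memory, 160 MHz) and the largest about 2.41 (higher bandwidth, 80 MHz). Doubling the clock speeds BearSSL up by about 1.69x (860/510) but axTLS only by about 1.44x (360/250), which is consistent with the BearSSL default handshake being more CPU-bound.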

The detailed result is in the below PDF.
BearSSL-vs-axTLS.pdf

Over at #4738 @earlephilhower stated that

BearSSL often negotiates a more secure but slower running cypher... limiting the receiving end's connection capabilities in your SSL configuration

I would first have to look into that; I'm not familiar with SSL configuration yet.

@earlephilhower
Collaborator

earlephilhower commented Sep 11, 2018

@marcelstoer we can change your local BearSSL to only use the ciphermodes supported by axtls. In libraries/ESP8266WiFi/src/WiFiClientSecureBearSSL.cpp there's a structure, suites_P.

Replace the long list with just the following to match the very limited scope of axTLS:

BR_TLS_RSA_WITH_AES_256_CBC_SHA256,
BR_TLS_RSA_WITH_AES_128_CBC_SHA256,
BR_TLS_RSA_WITH_AES_256_CBC_SHA,
BR_TLS_RSA_WITH_AES_128_CBC_SHA
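Why does trimming suites_P help? In TLS 1.2 the client only offers suites and the server picks one of them. The toy model below (host C++, not BearSSL code) assumes a server that walks its own preference list and takes the first suite the client also offered. If the client offers ECDHE suites, such a server picks one and the ESP8266 has to do elliptic-curve math, which the numbers in this thread suggest is noticeably slower; with only the four RSA suites above on offer, the server is forced into plain RSA key exchange. The numeric values are the standard IANA suite IDs, which BearSSL's BR_TLS_* macros also use; the server list and the two ECDHE suites are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// IANA cipher-suite IDs (same values as BearSSL's BR_TLS_* macros).
const uint16_t kRsaAes256CbcSha256       = 0x003D;  // BR_TLS_RSA_WITH_AES_256_CBC_SHA256
const uint16_t kRsaAes128CbcSha256       = 0x003C;  // BR_TLS_RSA_WITH_AES_128_CBC_SHA256
const uint16_t kRsaAes256CbcSha          = 0x0035;  // BR_TLS_RSA_WITH_AES_256_CBC_SHA
const uint16_t kRsaAes128CbcSha          = 0x002F;  // BR_TLS_RSA_WITH_AES_128_CBC_SHA
const uint16_t kEcdheRsaAes128GcmSha256  = 0xC02F;  // ECDHE key exchange, AES-128-GCM
const uint16_t kEcdheRsaChacha20Poly1305 = 0xCCA8;  // ECDHE key exchange, ChaCha20-Poly1305

// The four suites suggested above, in the same order.
const std::vector<uint16_t> kAxTlsStyleSuites = {
    kRsaAes256CbcSha256, kRsaAes128CbcSha256, kRsaAes256CbcSha, kRsaAes128CbcSha};

// Simplified server-side choice: return the first suite in the server's
// preference list that the client also offered; 0 means no common suite,
// so the handshake would fail.
uint16_t negotiate(const std::vector<uint16_t>& clientOffer,
                   const std::vector<uint16_t>& serverPrefs) {
  for (uint16_t s : serverPrefs) {
    for (uint16_t c : clientOffer) {
      if (c == s) return s;
    }
  }
  return 0;
}

// True for the ECDHE suites defined in this example.
bool usesEcdhe(uint16_t suite) {
  return suite == kEcdheRsaAes128GcmSha256 || suite == kEcdheRsaChacha20Poly1305;
}
```

A server that honors the client's order instead would reach the same result here, as long as the full client list puts its ECDHE suites first.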

Then you can recompile and run on the same configuration and hardware to get the numbers, without any changes to your own test code.

@earlephilhower earlephilhower added the waiting for feedback Waiting on additional info. If it's not received, the issue may be closed. label Sep 11, 2018
@devyte
Collaborator

devyte commented Sep 11, 2018

CC @llongeri

@marcelstoer marcelstoer changed the title BearSSL takes more than twice as long for handshake than axTLS BearSSL takes more than twice as long for handshake as axTLS Sep 17, 2018
@marcelstoer
Contributor Author

marcelstoer commented Sep 17, 2018

@earlephilhower thanks for the valuable information. I tested what you proposed and the effect is quite striking:

lwIP v2 lower memory, 80 MHz: ~420ms

-> still 15% more than with axTLS
-> less than half of what you get with the BearSSL default config

lwIP v2 lower memory, 160 MHz: ~310ms
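A quick check of these percentages with the rounded averages: at 80 MHz, 420 ms against axTLS's ~360 ms is roughly 17% more (close to the quoted 15%; the exact value depends on the unrounded data), and about 49% of the ~860 ms BearSSL default. At 160 MHz, 310 ms is 24% more than axTLS's ~250 ms and about 61% of the default's ~510 ms, so the "less than half" observation applies to the 80 MHz run. The helpers below are plain host C++ with made-up names.

```cpp
#include <cassert>

// Percentage by which `measuredMs` exceeds `baselineMs` (negative if faster).
double percentOver(double measuredMs, double baselineMs) {
  assert(baselineMs > 0.0);
  return 100.0 * (measuredMs - baselineMs) / baselineMs;
}

// Fraction of `referenceMs` that `measuredMs` represents (0.5 = half the time).
double fractionOf(double measuredMs, double referenceMs) {
  assert(referenceMs > 0.0);
  return measuredMs / referenceMs;
}
```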

I just wish we could configure such fundamentals in the sketch rather than tinkering with the core's .cpp files (possibly adjusting them for every project).

CC @squix78

@earlephilhower
Copy link
Collaborator

@marcelstoer, appreciate the concrete testing numbers!

If you're power constrained, then I suggest you look at 160 MHz operation. My gut tells me you're not going to draw much more current than at 80 MHz, but you will need to run for ~40% less time, so total energy consumption will be lower. But that's orthogonal to BearSSL/axTLS.
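The energy argument is E = V * I * t. The host C++ sketch below assumes, as in the comment above, that the supply voltage is fixed and the average current at 160 MHz is about the same as at 80 MHz; that is a gut feeling, not a measurement, and the 80 mA used in the checks is a placeholder rather than an ESP8266 figure. It also computes the break-even point: the 160 MHz run only loses if its average current exceeds the 80 MHz current scaled by t80/t160.

```cpp
#include <cassert>

// Energy for one handshake in millijoules: E = V * I * t.
// volts * milliamps * milliseconds yields microjoules, hence the / 1000.
double handshakeEnergyMj(double volts, double milliamps, double durationMs) {
  assert(volts >= 0.0 && milliamps >= 0.0 && durationMs >= 0.0);
  return volts * milliamps * durationMs / 1000.0;
}

// Highest average current (mA) at 160 MHz for which the 160 MHz handshake
// uses no more energy than the 80 MHz one, assuming the same supply voltage.
double breakEvenCurrentMa(double ma80, double ms80, double ms160) {
  assert(ms160 > 0.0);
  return ma80 * ms80 / ms160;
}
```

With the BearSSL default times (~860 ms vs ~510 ms), the 160 MHz handshake would still use less energy as long as its average current stays below roughly 1.69 times the 80 MHz current.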

Adding a call to allow you to select the cipher list (or pass in a list) is not a big deal, and I think it would cover you here. The more secure ciphers can take more CPU, so it's a trade-off, and BearSSL's chosen default favors security over performance.

Another option would be to support multiple configurations in the GUI, like with lwIP. For example, I'm compiling with a setting that minimizes RAM usage (I think it saves 1-2 KB of RAM, so it's not trivial) but potentially slows down the main SSL loop. Turning off that switch would trade RAM for performance.

@marcelstoer
Contributor Author

Another option would be to support multiple configurations in the GUI

I would argue that this configuration belongs to the application and should be stored there. The cipher suites you select depend on the needs of the application and the remote resources it talks to.

@earlephilhower
Collaborator

Agree with you, I think I was unclear. The WiFiClientSecure::setCipher() call should be per-app and available any time.

I was suggesting providing a "BearSSL Low Memory" and a "BearSSL High Performance" version of the library that's compiled differently. Like LWIP is done today, it's a link-time option. The "High Performance" will have the same codepaths as the low-mem, but use different GCC switches which will reduce time spent doing some of the SSL operations (but use RAM).

@earlephilhower
Collaborator

@marcelstoer since you've got a good testing infrastructure already, could you unzip the file below and replace the libbearssl.a in tools/sdk/lib with it and give your minimal-cipher test case another try? I'm trying to see if it makes sense to include the version that trades RAM for (hopefully) speed...

esp8266-fast.zip

@artua

artua commented Sep 18, 2018 via email

@marcelstoer
Copy link
Contributor Author

@earlephilhower will do

160Mhz operation but tested power consumption is nearly the same as 80Mhz, so it is the way we can reduce ssl handshake time dramatically

Did you time this? My test results (see above) show that CPU frequency has only negligible impact on the actual handshake time.

@artua

artua commented Sep 18, 2018 via email

@earlephilhower
Copy link
Collaborator

@marcelstoer For CPU-bound stuff, it should increase performance (up to the point where the slow flash interface limits it). @artua's numbers (maybe yours?) are actually comparing EC vs. RSA handshakes, so they're not a good comparison between axtls and BearSSL. However, the (2) BearSSL and (3) BearSSL numbers can be compared (they're both EC handshakes), and it's a significant speedup. For the RSA handshake I don't know yet; it could be limited by network round-trip time (RTT).

If you get a few minutes, I'd appreciate testing the ZIPped library from a couple of comments back with your RSA-hacked BearSSL core. It was built with a different compile option that may speed it up...

@artua

artua commented Sep 18, 2018 via email

@marcelstoer
Copy link
Contributor Author

@earlephilhower The verdict is in...

In #5110 (comment) I added the 160MHz numbers for the original "minimal ciphers" test case. I would say that ~300ms is satisfactorily low.

Then I tried your new libbearssl.a at both 80 MHz and 160 MHz. The speed improvement is less than 5%, and the extra RAM usage is negligible.

What I observed in many test runs are two "anomalies":

  • The first sample is usually the fastest; after that, the average handshake time keeps increasing up to around sample 30 before it stabilizes. Part of this is simple math: the more samples there are, the less a single spike moves the average.
  • Often, somewhere between sample 5 and sample 10, there is one sample (always exactly one) that takes ~3 s, i.e. 10x longer than the samples before and after it.
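The effect of a single outlier on the reported average is easy to reproduce off-device. The host C++ helper below computes the running average the same way the sketch does (accumulated duration divided by the number of attempts so far). The checks feed it synthetic data, not measured data: 50 samples of 300 ms with one 3000 ms spike at sample 7. The average jumps to about 686 ms right after the spike and is still 354 ms after all 50 samples, so one spike alone adds about 54 ms to the final figure.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Running average after each sample, computed like the sketch does:
// accumulated duration divided by the number of attempts so far.
std::vector<double> runningAverages(const std::vector<unsigned long>& samplesMs) {
  std::vector<double> averages;
  averages.reserve(samplesMs.size());
  unsigned long totalMs = 0;
  for (std::size_t i = 0; i < samplesMs.size(); ++i) {
    totalMs += samplesMs[i];
    averages.push_back(1.0 * totalMs / (i + 1));
  }
  return averages;
}
```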

Conclusion for my battery powered project

  • use BearSSL
  • slash the number of cipher suites i.e. remove the costly ones
  • run at 160MHz

@earlephilhower
Collaborator

Thanks for the update, @marcelstoer . Given that, I don't see any reason to offer two versions of the BearSSL library (mem vs speed). I still see the need for a "use these ciphers" and an axtls_ciphers constant default. That's a pretty trivial addition and will let you use faster, less secure ones while @artua can then choose only to use the newer (possibly slower and more secure) ones.

earlephilhower added a commit to earlephilhower/Arduino that referenced this issue Sep 19, 2018
BearSSL has many more ciphers than axTLS, but they are more compute intensive
and slower.  Add an option to use only the same, limited security, axTLS ciphers
as well as allow users to specify any suite of ciphers they want using standard
BearSSL formats.

Fixes esp8266#5110
@earlephilhower
Collaborator

@artua, PR #5151 adds a call where you can specify a single BearSSL-format cipher. You could try hardcoding the ChaCha20 suite with it. I have no idea what the performance difference would be, though.

devyte pushed a commit that referenced this issue Sep 21, 2018
* Allow cipher specification for BearSSL

BearSSL has many more ciphers than axTLS, but they are more compute intensive
and slower.  Add an option to use only the same, limited security, axTLS ciphers
as well as allow users to specify any suite of ciphers they want using standard
BearSSL formats.

Fixes #5110

* Rename methods to avoid axtls references.

* Allow std::vector to set a list of allowed ciphers

For C++ aficionados, allow std::vectors to be passed in to the setCipher()
routine.

The BearSSL object will now keep a copy of any set ciphers and free on object
destruction.  These custom lists should normally only be 1-4 entries long, so it
is not expected to be a memory hog having this extra copy.
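The last commit note describes an ownership choice: the client copies whatever cipher list it is given and frees the copy when it is destroyed, so the caller's array or vector may go out of scope right after the call. The class below is a hypothetical host-C++ illustration of that pattern, not the ESP8266 core's code; its name and method signatures are assumptions modeled on the description (an array-plus-count form and a std::vector form). Holding the copy in a std::vector means no manual free is needed.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a TLS client that keeps its own copy of the
// caller's cipher list; the copy is released automatically on destruction.
class CipherConfig {
 public:
  // Array-plus-count form: copies `count` entries from `ciphers`.
  bool setCiphers(const uint16_t* ciphers, int count) {
    if (ciphers == nullptr || count <= 0) return false;
    ciphers_.assign(ciphers, ciphers + count);
    return true;
  }

  // std::vector form: taking the list by value makes the copy explicit.
  bool setCiphers(std::vector<uint16_t> list) {
    if (list.empty()) return false;
    ciphers_ = std::move(list);
    return true;
  }

  const std::vector<uint16_t>& ciphers() const { return ciphers_; }

 private:
  std::vector<uint16_t> ciphers_;  // owned copy, typically only 1-4 entries
};
```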
@devyte devyte added this to the 2.5.0 milestone Sep 21, 2018
@devyte devyte added type: enhancement component: libraries component: TLS and removed waiting for feedback Waiting on additional info. If it's not received, the issue may be closed. labels Sep 21, 2018
@devyte
Collaborator

devyte commented Sep 21, 2018

@marcelstoer #5151 was merged, which autoclosed this. In the interest of moving forward, I'm leaving it closed for now. If you think further changes are needed, please provide details in a new issue.
Thanks again for the numbers!

@earlephilhower
Collaborator

@marcelstoer , check out my latest PR, #5160. Using SSL sessions makes reconnection times 80% faster or so!

4 participants