BearSSL::WiFiClientSecure::connect() throws Exception #6143
Update: I replaced Issue seems to be fixed. Maybe BearSSL lib needs to be updated? Or, maybe my library was compiled with different flags? I did:
make \
  CC=~/.platformio/packages/toolchain-xtensa/bin/xtensa-lx106-elf-gcc \
  LD=~/.platformio/packages/toolchain-xtensa/bin/xtensa-lx106-elf-ld \
  AR=~/.platformio/packages/toolchain-xtensa/bin/xtensa-lx106-elf-ar \
  LDDLL=~/.platformio/packages/toolchain-xtensa/bin/xtensa-lx106-elf-ld
and copied ~/build/libbearssl.a to tools/sdk/lib/libbearssl.a. |
Update 2: Spoke too soon. With my version of Building from the correct https://github.com/earlephilhower/bearssl-esp8266 the issue remains as originally described. |
@d-a-v, I'm seeing this crashing inside LWIP wrappers. Any ideas? |
I've been looking a lot at the exception too, and think it's misleading. My theory, with some evidence below, is that the exception happens because BearSSL corrupts LWIP memory, which causes the eventual crash. Instead I've been using a second sketch based on my comment: "sometimes connect() completes successfully but memory elsewhere not owned by wifi client is corrupted". Seems like The idea is
#include <ESP8266WiFi.h>
#include <WiFiClientSecure.h>
#include <time.h>
//#include <GDBStub.h>
static const int pad = 768;
char *head;
WiFiClientSecure *client;
char *tail;
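// Dump `size` bytes starting at `addr` as hex, 16 bytes per row, rows aligned to 16 bytes.
void hexDump(char *addr, int size) {
  for (char *row = (char *)(uint32_t(addr) & 0xfffffff0); row < addr + size; row += 16) {
    Serial.printf("%08x: ", (uint32_t) row);
    for (char *col = row; col < row + 16; col++) {
      // Bytes outside [addr, addr + size) are only padding for row alignment.
      if ((col < addr) || (col >= addr + size)) {
        Serial.print("__");
      } else {
        // Cast to uint8_t so the byte prints as two digits even where char is signed.
        Serial.printf("%02x", (uint8_t) *col);
      }
    }
    Serial.printf("\r\n");
  }
  Serial.println();
}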
void setup() {
  Serial.begin(115200);
  //gdbstub_init();
  WiFi.mode(WIFI_STA);
  WiFi.hostname("...");
  WiFi.begin("braiden", "...");
  while (WiFi.status() != WL_CONNECTED) delay(10);
  Serial.println("WiFi Connected");
  // Allocate the client between two zero-filled pad buffers so that
  // corruption next to it shows up as non-zero bytes in a pad.
  tail = (char *) calloc(pad, sizeof(char));
  client = new WiFiClientSecure();
  head = (char *) calloc(pad, sizeof(char));
  client->setX509Time(1558874730);
  Serial.printf("head=%08x\r\n", (uint32_t) head);
  Serial.printf("client=%08x\r\n", (uint32_t) client);
  Serial.printf("tail=%08x\r\n", (uint32_t) tail);
  Serial.println();
}
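void loop() {
  Serial.print(".");
  // Both pads start zero-filled, so any difference means one was overwritten.
  if (memcmp(head, tail, pad)) {
    hexDump(head, pad);
    hexDump((char *)client, sizeof(WiFiClientSecure));
    hexDump(tail, pad);
    // Halt here so the dump stays on the serial console.
    while (true) delay(1000);
  }
  client->connect("mqtt.2030.ltsapis.goog", 443);
  delay(1000);
}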
Turns out WiFiClientSecureBearSSL.cpp:
523 Serial.printf("br_ssl_engine_recvrec_ack: %08x\r\n", *(uint16_t *)(0x3ffefe90));
524 br_ssl_engine_recvrec_ack(_eng, rlen);
525 Serial.printf("br_ssl_engine_recvrec_ack: %08x\r\n", *(uint16_t *)(0x3ffefe90));
Prints:
|
I tried the OP sketch and it works with no error if
Otherwise, I get a WDT reset. In that type of sketch, making a connection (SSL or not) to a server every 0.1s greatly increases the chance of being temporarily banned, which makes things harder to debug. (edit: 150 successful connections, no failures, 10s between each) |
"... and selecting basic SSL ciphers". Does that imply -DBEARSSL_SSL_BASIC? I was also just trying that too. And it leads me to suspect the reason
Thanks both for your help! |
my tests are run on
Have you tried at 160MHz (option in menu) ? |
I'm using platformio with Running the basic connect loop sketch with Adding |
FYI, https://cloud.google.com/iot/docs/how-tos/mqtt-bridge#using_a_long-term_mqtt_domain cites That's disabled if -DBEARSSL_SSL_BASIC is set. |
Your sample code can never actually connect to anything. No trust anchor was set, and That's a "no trust anchor" error and should not cause a crash, though. You'll also want to be at 160MHz, or you'll take too long to connect and be punted off most servers. |
Thanks, let me look at that more closely now. That part of the code has been pretty stable for a while, but your constant failing-to-connect may be exercising it in a different way than others' code... Might be BSSL stack overflowing, actually, given your writeup. |
Yes, it is stack overflow. Very good debugging, @braiden! I only allocate 5600 bytes, so there is your overwrite. It's a 1-line fix in cores/esp8266/StackThunk.cpp. Change 5600 to 5750 and your crashes should disappear. I'll do a PR ASAP. |
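For anyone wanting to measure how close a dedicated stack gets to its limit (5700 bytes used against 5600 allocated here), one common trick is stack painting: fill the region with a known pattern up front, then count how much of the pattern survives. Below is a rough host-side sketch of that idea, assuming a descending stack modeled as a vector of words. The names kPaint, paintStack, simulateUse and maxUsageBytes are made up for illustration and are not taken from cores/esp8266/StackThunk.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical paint value: any pattern unlikely to appear in real data works.
constexpr std::uint32_t kPaint = 0xdeadbeef;

// Fill the whole region with the paint pattern before first use.
inline void paintStack(std::vector<std::uint32_t>& stack) {
    for (auto& word : stack) word = kPaint;
}

// Stand-in for real code running on the stack: a descending stack that
// consumed `bytes` bytes has overwritten the topmost words of the region.
inline void simulateUse(std::vector<std::uint32_t>& stack, std::size_t bytes) {
    assert(bytes <= stack.size() * sizeof(std::uint32_t));
    std::size_t words = (bytes + 3) / 4;
    for (std::size_t i = 0; i < words; ++i) stack[stack.size() - 1 - i] = 0;
}

// Peak usage in bytes: count still-painted words from the low end upward.
// Everything above the first modified word is treated as used.
inline std::size_t maxUsageBytes(const std::vector<std::uint32_t>& stack) {
    std::size_t untouched = 0;
    while (untouched < stack.size() && stack[untouched] == kPaint) ++untouched;
    return (stack.size() - untouched) * sizeof(std::uint32_t);
}
```

Painting a 6000-byte region, simulating 5700 bytes of use, and then calling maxUsageBytes reports 5700, which is more than a 5600-byte allocation could hold. The measurement only reflects code paths that actually ran, so a rarely used cipher combination can still push the peak higher later.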
Fixes esp8266#6143 . The BSSL stack grew to 5700 bytes when connecting to a certain website, which corrupted memory since only 5600 bytes were allocated. Bump stack up to 5750 to avoid issue.
Fix esp8266#6143 which found a cipher combination which overran the old limit of 5600 bytes (it required 5700 bytes).
@braiden, can you confirm you're fixed with the latest GIT head? See above about the connection requirements (setting a CA/insecure), but there shouldn't be a crash. |
Great, Thanks for the quick fix! Verified at ...d83eabe |
On return from a BSSL call, check that the last element of the stack is still untouched. If it is modified, print an error and abort(). Will catch problems like esp8266#6143 many times with an informative error message instead of corrupting the heap and having a random crash sometime later.
I'm not sure why when I add a
NeoPixelBrightnessBus<NeoGrbFeature, NeoEsp8266Dma800KbpsMethod> pxl(1);
void setup() {
Serial.begin(115200);
pxl.Begin();
... |
There is another PR just merged which puts a canary on the end of the stack and checks it every time BearSSL returns. You could pull git head and give a whirl by dropping for your specific needs and seeing if it triggers. As for the stack issue, NeoPixel eats a large amount of stack space in its interrupt handler. And when you're doing SSL and an interrupt happens it uses the BearSSL stack. |
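To illustrate the canary idea described above, here is a minimal host-side sketch: the lowest word of a descending stack holds a known value, and it is checked each time a guarded call returns. This is only an approximation under stated assumptions. GuardedStack, callGuarded, kCanary and the error text are hypothetical names, not the ones used in the merged PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical canary value placed at the far end of the stack region.
constexpr std::uint32_t kCanary = 0xdeadbeef;

// Models a descending stack: words.back() is the top, words[0] is the last
// element an overflowing call would reach before leaving the region.
struct GuardedStack {
    std::vector<std::uint32_t> words;

    explicit GuardedStack(std::size_t nWords) : words(nWords, 0) {
        assert(nWords > 0);
        words[0] = kCanary;
    }

    bool canaryIntact() const { return words[0] == kCanary; }
};

// Run `fn` against the stack, then verify the canary on return.
// If it was modified, report the overflow and abort instead of
// continuing with possibly corrupted neighbouring memory.
template <typename Fn>
void callGuarded(GuardedStack& stack, Fn&& fn) {
    fn(stack);
    if (!stack.canaryIntact()) {
        std::fprintf(stderr, "FATAL: stack overflow detected\n");
        std::abort();
    }
}
```

A check like this only fires if the overflow actually overwrites the canary word with a different value, and only once the call returns, so it narrows down the failure rather than preventing the corruption.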
Makes sense. I don't need a NeoPixel so I'll just drop that for more predictable stack usage. Thanks again! I'm already on the latest commit. I saw "Abort Called", but didn't see "FATAL ERROR: BSSL stack overflow" print, so I wasn't sure if that was the canary or not: I'm happy to skip the NeoPixel, so I'm good, but let me know if there's any more testing I can do.
|
Basic Infos
Platform
Settings in IDE
Problem Description
BearSSL::WiFiClientSecure::connect() throws Exception while connecting. The issue happens about 3/10. So the sketch below fails in a few seconds. Both a hostname of "mqtt.2030.ltsapis.goog" and a call to configTime() are required or the bug doesn't happen (or happens much much less often.)
Also, an exception is not always thrown, sometimes connect() completes successfully but memory elsewhere not owned by wifi client is corrupted. I could provide another sketch demonstrating this, but thought the exception more useful.
The sketch omits X509List, same issue happens with valid certs.
MCVE Sketch
Debug Messages
[edit, updated stack with debug symbols for local files]