Skip to content

Infinite loop when TCP connection drops during SSL handshake #6464

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
6 tasks
johnm545 opened this issue Aug 29, 2019 · 0 comments
Closed
6 tasks

Infinite loop when TCP connection drops during SSL handshake #6464

johnm545 opened this issue Aug 29, 2019 · 0 comments

Comments

@johnm545
Copy link
Contributor

Basic Infos

  • This issue complies with the issue POLICY doc.
  • I have read the documentation at readthedocs and the issue is not addressed there.
  • I have tested that the issue is present in current master branch (aka latest git).
  • I have searched the issue tracker for a similar issue.
  • If there is a stack dump, I have decoded it.
  • I have filled out all fields below.

Platform

  • Hardware: ESP 12E
  • Core Version: Happens with 2.5.2-release and 2.6.0-dev
  • Development Env: Arduino
  • Operating System: Windows

Settings in IDE

  • Module: NodeMCU
  • Flash Mode: [qio|dio|other]
  • Flash Size: [4MB/1MB]
  • lwip Variant: [v2 Lower Memory|
  • Reset Method: [ck|nodemcu]
  • Flash Frequency: [40Mhz]
  • CPU Frequency: 80Mh
  • Upload Using: SERIAL
  • Upload Speed: [115200|

Problem Description

This is an edge case but has happened to my projects a few times, and has been bugging me for a while. It's an annoying one because yield is still being called, so the watchdog never fires. I first assumed it was something in the SDK but I've managed to reproduce (and fix) it reliably. To reproduce you need to cause the connection to drop (i.e. half open socket) during the SSL handshake, but leave the WiFi up. It's easy enough to do as the SSL handshake takes a good 4 seconds. Just pull your internet connection from the wall after the DNS lookup.

WiFiClientSecure::_wait_for_handshake calls _run_until with target state BR_SSL_SENDAPP
The problem is that the _run_until loop has no timeout and will run forever while trying to reach the sendapp state. It will only break if the WiFi goes down; this causes a beacon timeout event to bubble up from the SDK which closes the connection and causes _run_until to return -1.

Example below: I also added a debug print to the _run_until loop as below to demonstrate what happens: (WiFiClientSecureBearSSL.cpp line 440ish)

  unsigned long lastDebugPrint = 0;
  for (int no_work = 0; blocking || no_work < 2;) {
    
    // debug stuff
    if (millis() - lastDebugPrint > 1000) {
      lastDebugPrint = millis();
      Serial.print("Hello from _run_until loop, no_work = ");
      Serial.println(no_work);
    }

MCVE Sketch

#include <ESP8266WiFi.h>
#include <WiFiClientSecure.h>

const char* host = "postman-echo.com";
int port = 443;  //https
WiFiClientSecure wifiClient;

void setup() {
  wifiClient.setInsecure(); // disable validation of server certificate
  Serial.begin(115200);
  delay(1000);

  WiFi.begin(/*ssid, password if needed */);     //Connect to your WiFi router
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }
  Serial.println("wifi connected");
  Serial.println("To test, pull the plug on your internet just after connection attempt begins");

  int i = 5;
  while (i) {
    Serial.print("Connection attempt in "); 
    Serial.println(i);
    
    delay(1000);
    i--;
  }
  Serial.println("GO!");

  if (!wifiClient.connect(host, port)) {
    Serial.println("initial connection failed, restart to try again");
    while (1) delay(1);
  }
  Serial.println("Secure socket connected to server");
}

void loop() {}

Debug Messages

23:20:07.082 -> SDK:2.2.2-dev(38a443e)/Core:2.5.2-114-g55539ae9=20502114/lwIP:STABLE-2_1_2_RELEASE/glue:1.1-8-g2314329/BearSSL:89454af
23:20:07.082 -> wifi evt: 2
23:20:07.176 -> scandone
23:20:07.176 -> state: 0 -> 2 (b0)
23:20:07.176 -> state: 2 -> 3 (0)
23:20:07.176 -> state: 3 -> 5 (10)
23:20:07.176 -> add 0
23:20:07.176 -> aid 4
23:20:07.176 -> cnt 
23:20:07.270 -> 
23:20:07.270 -> connected with (removed)
23:20:07.270 -> dhcp client start...
23:20:07.270 -> wifi evt: 0
23:20:08.207 -> scandone
23:20:08.582 -> ..ip:192.168.20.14,mask:255.255.255.0,gw:192.168.20.1
23:20:09.379 -> wifi evt: 3
23:20:09.567 -> .wifi connected
23:20:09.567 -> To test, pull the plug on your internet just after connection attempt begins
23:20:09.567 -> Connection attempt in 5
23:20:10.598 -> Connection attempt in 4
23:20:11.582 -> Connection attempt in 3
23:20:12.567 -> Connection attempt in 2
23:20:13.598 -> Connection attempt in 1
23:20:14.582 -> GO!
23:20:14.582 -> [hostByName] request IP for: postman-echo.com
23:20:14.676 -> [hostByName] Host: postman-echo.com IP: 34.203.76.146
23:20:14.676 -> :ref 1
23:20:14.910 -> BSSL:_connectSSL: start connection
23:20:14.957 -> Hello from _run_until loop, no_work = 0
23:20:14.957 -> :wr 221 0
23:20:14.957 -> :wrc 221 221 0
23:20:15.989 -> Hello from _run_until loop, no_work = 101533
23:20:16.973 -> Hello from _run_until loop, no_work = 203927
23:20:17.207 -> pm open,type:2 0
23:20:17.957 -> Hello from _run_until loop, no_work = 306073
23:20:18.989 -> Hello from _run_until loop, no_work = 408294
23:20:19.973 -> Hello from _run_until loop, no_work = 510639
23:20:20.957 -> Hello from _run_until loop, no_work = 613108
23:20:21.989 -> Hello from _run_until loop, no_work = 715448
23:20:22.973 -> Hello from _run_until loop, no_work = 817642
23:20:23.957 -> Hello from _run_until loop, no_work = 919930
23:20:24.989 -> Hello from _run_until loop, no_work = 1022180
23:20:25.973 -> Hello from _run_until loop, no_work = 1124541
23:20:26.957 -> Hello from _run_until loop, no_work = 1227009
23:20:27.990 -> Hello from _run_until loop, no_work = 1329211
23:20:28.974 -> Hello from _run_until loop, no_work = 1431504
23:20:29.958 -> Hello from _run_until loop, no_work = 1533993
23:20:30.990 -> Hello from _run_until loop, no_work = 1636374
23:20:31.976 -> Hello from _run_until loop, no_work = 1738841
23:20:33.007 -> Hello from _run_until loop, no_work = 1841038
23:20:33.991 -> Hello from _run_until loop, no_work = 1943225
23:20:34.976 -> Hello from _run_until loop, no_work = 2044402
23:20:36.007 -> Hello from _run_until loop, no_work = 2146867
23:20:36.991 -> Hello from _run_until loop, no_work = 2248994
23:20:37.976 -> Hello from _run_until loop, no_work = 2351332
23:20:39.008 -> Hello from _run_until loop, no_work = 2453643
23:20:39.992 -> Hello from _run_until loop, no_work = 2556008
23:20:40.976 -> Hello from _run_until loop, no_work = 2658445
23:20:42.008 -> Hello from _run_until loop, no_work = 2760940
23:20:42.992 -> Hello from _run_until loop, no_work = 2863484
23:20:43.977 -> Hello from _run_until loop, no_work = 2965829
23:20:45.008 -> Hello from _run_until loop, no_work = 3068322
23:20:45.992 -> Hello from _run_until loop, no_work = 3170814
23:20:46.977 -> Hello from _run_until loop, no_work = 3273166

.... and so on indefinately

Now 7 minutes later I turn off the access point


23:27:56.417 -> Hello from _run_until loop, no_work = 47236443
23:27:57.448 -> Hello from _run_until loop, no_work = 47339038
23:27:58.432 -> Hello from _run_until loop, no_work = 47441653
23:27:59.417 -> Hello from _run_until loop, no_work = 47544268
23:28:00.073 -> bcn_timout,ap_probe_send_start
23:28:00.448 -> Hello from _run_until loop, no_work = 47646543
23:28:01.432 -> Hello from _run_until loop, no_work = 47748555
23:28:02.417 -> Hello from _run_until loop, no_work = 47850602
23:28:02.557 -> ap_probe_send over, rest wifi status to disassoc
23:28:02.557 -> state: 5 -> 0 (1)
23:28:02.557 -> rm 0
23:28:02.557 -> pm close 7
23:28:02.557 -> wifi evt: 1
23:28:02.557 -> STA disconnect: 200
23:28:02.886 -> :wustmo
23:28:02.886 -> :close
23:28:02.886 -> BSSL:_wait_for_handshake: failed
23:28:02.886 -> BSSL:Couldn't connect. Error = 'Unknown error code.'
23:28:02.886 -> initial connection failed, restart to try again

The fix is easy: just add a timeout to the loop, but I wanted to document it in an issue as I'm sure others must have scratched their head over this too at some point

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant