Skip to content

SSL connection fails sporadically #49906

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
1 task done
paveliak opened this issue Aug 7, 2023 · 1 comment
Open
1 task done

SSL connection fails sporadically #49906

paveliak opened this issue Aug 7, 2023 · 1 comment
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions

Comments

@paveliak
Copy link

paveliak commented Aug 7, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

We run Kestrel in Linux containers using .Net 7.0.

Occasionally we observe some (not all) pods getting into a failed state upon the deployment which prevents clients from connecting to them. Restarting those pods usually helps to resolve the issue.

curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

Further investigation pointed out that those pods have a broken SSL chain containing only the leaf certificate. Here is the tail of openssl output

---
No client certificate CA names sent
Peer signing digest: SHA256
Peer signature type: RSA-PSS
Server Temp Key: X25519, 253 bits
---
SSL handshake has read 2338 bytes and written 430 bytes
Verification error: unable to verify the first certificate
---
New, TLSv1.3, Cipher is TLS_AES_256_GCM_SHA384
Server public key is 2048 bit
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
Early data was not sent
Verify return code: 21 (unable to verify the first certificate)
---
DONE

We think this could be caused by an intermediate network issue affecting (timing out) AIA requests that Kestrel does upon the service startup causing it to fail building SSL chain and silently proceeding with a broken chain.

Expected Behavior

Kestrel throws an exception failing initialization if SSL chain cannot be built. This might be a breaking change and so additional constraints could be applied here, but with the current behavior it is easy to shoot yourself into the foot because SSL will work just fine most of the time and then randomly get broken for a subset of pods

Steps To Reproduce

https://github.com/paveliak/kestrelssl

Exceptions (if any)

No response

.NET Version

7.0.102

Anything else?

I want to call out two things:

  1. A simple way to prevent bad pods from coming online could be to configure Https startup probe in K8s, however default probe does not validate SSL chain and one has to define a custom probe (e.g. that uses curl), which is not typical.

  2. This issue could be fixed by providing certificate bundle upon initialization so it could be considered as a documenttation gap (not a code) issue

        listenOptions.UseHttps(sslCertificate, configureOptions =>
        {
            configureOptions.ServerCertificateChain = bundle;
        });
@ghost ghost added the area-runtime label Aug 7, 2023
@adityamandaleeka adityamandaleeka added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Aug 7, 2023
@adityamandaleeka
Copy link
Member

Cross linking to full cert chain follow up issue: #43193

@dotnet-policy-service dotnet-policy-service bot added the pending-ci-rerun When assigned to a PR indicates that the CI checks should be rerun label Feb 6, 2024
@wtgodbe wtgodbe removed the pending-ci-rerun When assigned to a PR indicates that the CI checks should be rerun label Feb 6, 2024
@dotnet-policy-service dotnet-policy-service bot added the pending-ci-rerun When assigned to a PR indicates that the CI checks should be rerun label Feb 6, 2024
@wtgodbe wtgodbe removed the pending-ci-rerun When assigned to a PR indicates that the CI checks should be rerun label Feb 13, 2024
@dotnet dotnet deleted a comment from dotnet-policy-service bot Feb 13, 2024
@dotnet dotnet deleted a comment from dotnet-policy-service bot Feb 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions
Projects
None yet
Development

No branches or pull requests

3 participants