Excessive TLS connections - CPU/Memory Usage #3067


Closed

nbaztec opened this issue Jan 7, 2020 · 8 comments
Labels
performance service-api This issue is due to a problem in a service API, not the SDK implementation.

Comments

@nbaztec

nbaztec commented Jan 7, 2020

Version of AWS SDK for Go?

v1.26.8

Version of Go (go version)?

go version go1.13 darwin/amd64

What issue did you see?

Profiling shows that the app was spending a significant share of its resources handling TLS handshakes.
Over a third of CPU/memory was being used for TLS negotiation (which improved after setting MaxIdleConnsPerHost).
However, even with IdleConnTimeout set, connections still seem to be discarded after roughly 6-8 seconds of inactivity, and a new TLS negotiation is initiated.

Steps to reproduce

  • Connect to Kinesis
sess := session.Must(session.NewSession(&aws.Config{
	Region:                        aws.String(awsRegion),
	CredentialsChainVerboseErrors: aws.Bool(verboseErrors),
}))

stsConfig := &aws.Config{
	Credentials:                   creds,
	Region:                        aws.String(awsRegion),
	CredentialsChainVerboseErrors: aws.Bool(verboseErrors),
	HTTPClient: &http.Client{
		Transport: &http.Transport{
			Proxy: http.ProxyFromEnvironment,
			DialContext: (&net.Dialer{
				Timeout:   30 * time.Second,
				KeepAlive: 30 * time.Second,
			}).DialContext,
			MaxIdleConns:          100,
			IdleConnTimeout:       90 * time.Second,
			MaxIdleConnsPerHost:   50,
			TLSHandshakeTimeout:   3 * time.Second,
			ExpectContinueTimeout: 1 * time.Second,
		},
	},
}

client := kinesis.New(sess, stsConfig)
client.PutRecord(...)

  • Execute the script to send some data

net/http/transport.go's addTLS() is invoked to start a new TLS session for every request that is 6-10 seconds apart from the previous one.

Expected

One would expect the TLS session to be negotiated only once, with subsequent requests reusing the idle connection.

@diehlaws diehlaws self-assigned this Jan 8, 2020
@jasdel
Contributor

jasdel commented Jan 14, 2020

Thanks for reaching out @nbaztec. The connection issue you are experiencing is most likely due to the Kinesis server closing the connection after 6-10 seconds of inactivity. Since it is the server that is closing the connection, the client's idle-connection configuration makes no difference.

This can be verified with httptrace.ClientTrace: the httptrace.GotConnInfo value passed into the ClientTrace's GotConn callback reports whether the connection was reused and how long it sat idle.
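
For example, a minimal sketch of wiring such a trace into a Kinesis call (the region and stream name are placeholders, not values from this issue):

package main

import (
	"context"
	"log"
	"net/http/httptrace"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/kinesis"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
	client := kinesis.New(sess)

	// Report whether each request got a fresh or a reused connection, and how
	// long a reused connection had been idle.
	trace := &httptrace.ClientTrace{
		GotConn: func(info httptrace.GotConnInfo) {
			log.Printf("reused=%v wasIdle=%v idleTime=%s", info.Reused, info.WasIdle, info.IdleTime)
		},
		TLSHandshakeStart: func() { log.Println("TLS handshake started") },
	}
	ctx := httptrace.WithClientTrace(context.Background(), trace)

	_, err := client.PutRecordWithContext(ctx, &kinesis.PutRecordInput{
		StreamName:   aws.String("example-stream"), // placeholder
		PartitionKey: aws.String("pk"),
		Data:         []byte("payload"),
	})
	if err != nil {
		log.Println("PutRecord error:", err)
	}
}

If the server is closing connections after a few seconds of inactivity, requests made after that window will log reused=false followed by a fresh TLS handshake.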

Each time the server closes the connection, the client needs to re-establish it. The Go HTTP client's transport does not resume TLS sessions across re-established connections, so a new TLS handshake is performed for each new connection. This is the source of the TLS activity you are seeing.

One potential workaround is to spread your record reporting out so that the gap between calls to PutRecord stays short, for example by sending the records to a channel that paces the calls to PutRecord (see the sketch below). I'm not sure whether Kinesis accepts PutRecord calls with empty data, but if it does, your application could make these (or a similar lightweight API call) to keep the connection alive. Note that some AWS APIs will only allow a reused connection to serve a limited number of requests.
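
A rough sketch of that channel-based pacing idea (the record type, the 5-second interval, and the stream handling are illustrative assumptions, not SDK features):

package sender

import (
	"log"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/kinesis"
)

// record is an illustrative stand-in for whatever the application reports.
type record struct {
	partitionKey string
	data         []byte
}

// startSender drains records on a fixed cadence so the gap between PutRecord
// calls stays below the observed ~6-8 second idle cutoff.
func startSender(client *kinesis.Kinesis, stream string, records <-chan record) {
	ticker := time.NewTicker(5 * time.Second) // illustrative pacing interval
	defer ticker.Stop()

	var pending []record
	for {
		select {
		case r, ok := <-records:
			if !ok {
				return
			}
			pending = append(pending, r)
		case <-ticker.C:
			if len(pending) == 0 {
				continue
			}
			r := pending[0]
			pending = pending[1:]
			if _, err := client.PutRecord(&kinesis.PutRecordInput{
				StreamName:   aws.String(stream),
				PartitionKey: aws.String(r.partitionKey),
				Data:         r.data,
			}); err != nil {
				log.Println("PutRecord:", err)
			}
		}
	}
}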

Kinesis's API does support HTTP without TLS, which would remove the overhead of the TLS session setup, but would make your record data visible on the network.
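
If that trade-off is acceptable for your data, a minimal sketch of opting out of TLS via the SDK's DisableSSL config option (the region is a placeholder, and sess is the session from the earlier snippet):

// Build a Kinesis client over plain HTTP. The record data will travel
// unencrypted, so only consider this on a trusted network.
cfg := &aws.Config{
	Region:     aws.String("us-east-1"), // placeholder
	DisableSSL: aws.Bool(true),
}
plainClient := kinesis.New(sess, cfg)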

@jasdel jasdel added performance service-api This issue is due to a problem in a service API, not the SDK implementation. labels Jan 14, 2020
@nbaztec
Author

nbaztec commented Jan 15, 2020

Thanks @jasdel for the added insight. Kinesis closing the connections was our guess as well, and in that light the approach of using channels to throttle/batch the PutRecord calls is something we are considering.

As you've correctly pointed out, there is no library-level fix that can mitigate the issue, so we'll look at mitigating it with Go channels instead.

Thanks!

@diehlaws
Contributor

Hi @nbaztec, please do let us know if you require further assistance from us on this. Otherwise feel free to close the issue, or we can let it auto-close due to inactivity.

@diehlaws diehlaws added the closing-soon This issue will automatically close in 4 days unless further comments are made. label May 15, 2020
@nbaztec
Author

nbaztec commented May 16, 2020

Hi! Thanks for the help. We've successfully resolved the issue by tweaking the keepalive parameters for the connection on our end.

Closing this one.

@nbaztec nbaztec closed this as completed May 16, 2020
@diehlaws diehlaws removed the closing-soon This issue will automatically close in 4 days unless further comments are made. label May 18, 2020
@diehlaws diehlaws removed their assignment Aug 26, 2020
@leventov

@nbaztec could you please describe specifically how you changed the keep-alive parameters to resolve this problem?

We have exactly the same issue with TLS reconnections when accessing DynamoDB. We have configured net.Dialer.KeepAlive to 10 seconds and still see a lot of full handshakes (addTLS() calls) in the profile.

OTOH, I don't understand why keep-alive even matters under load.

However, even when IdleConnTimeout is set, the connections still seem to be discarded after around ~6-8 seconds of inactivity and a new TLS negotiation is initiated.

If there is a performance problem, why are there long periods of connection inactivity? (The same reasoning applies to our case with DynamoDB.)

@jasdel do you have some insights on this?

@nbaztec
Author

nbaztec commented Apr 30, 2021

@leventov I mitigated it by explicitly setting the following connection-pool parameters on the HTTPClient:

cfg := &aws.Config{
	...
	HTTPClient: &http.Client{
		Transport: &http.Transport{
			MaxIdleConns:        100,
			IdleConnTimeout:     90 * time.Second,
			MaxIdleConnsPerHost: 50,
			MaxConnsPerHost:     100,
		},
	},
}

This is my hypothesis, with some memory jogging from the profiler: the per-host idle connection limit defaults to 2 in net/http (DefaultMaxIdleConnsPerHost), and hence under heavy load, for reasons I cannot fully explain, only the first 2 connections get reused; any further connection needs to perform the TLS handshake once more. Hope it helps.
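
For later readers, a self-contained sketch of wiring a tuned transport like this into the session itself (the region and stream name are placeholders; the pool limits mirror the snippet above and should be tuned to your own workload):

package main

import (
	"log"
	"net/http"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/kinesis"
)

func main() {
	httpClient := &http.Client{
		Transport: &http.Transport{
			Proxy:               http.ProxyFromEnvironment,
			MaxIdleConns:        100,
			IdleConnTimeout:     90 * time.Second,
			MaxIdleConnsPerHost: 50,
			MaxConnsPerHost:     100,
		},
	}

	sess := session.Must(session.NewSession(&aws.Config{
		Region:     aws.String("us-east-1"), // placeholder region
		HTTPClient: httpClient,
	}))
	client := kinesis.New(sess)

	if _, err := client.PutRecord(&kinesis.PutRecordInput{
		StreamName:   aws.String("example-stream"), // placeholder stream
		PartitionKey: aws.String("pk"),
		Data:         []byte("payload"),
	}); err != nil {
		log.Println("PutRecord:", err)
	}
}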

@leventov

@nbaztec thanks for the reply!

It seems to me that setting MaxConnsPerHost makes a significant difference, even though in theory it shouldn't.

It might be that these issues are related: golang/go#20960, golang/go#42650.

Also, surprisingly, http2.ConfigureTransport() doesn't make any noticeable difference.

@KingJayant

@leventov Hi, I am also seeing high CPU usage from concurrent requests requiring TLS negotiation with DynamoDB. Can you please explain how you resolved this issue?
