add ecdsa handshake test #1146

Merged
merged 10 commits into from
Feb 6, 2020

wfurt
Member

@wfurt wfurt commented Jan 23, 2020

This is primarily to show the perf impact of different algorithms on the TLS handshake.
I was curious after seeing most of the CPU time burned in BN_ functions in OpenSSL.
ECDSA shows a ~25% difference on my Linux box.
For consistency, I ran it on each platform, and the OSX result is even more surprising.

Ubuntu 18.04

|                         Method | useRsaCertificate |         Mean |        Error |       StdDev |       Median |          Min |          Max |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------------------------- |------------------ |-------------:|-------------:|-------------:|-------------:|-------------:|-------------:|-------:|------:|------:|----------:|
|                 WriteReadAsync |                 ? |     10.68 us |     0.375 us |     0.431 us |     10.50 us |     10.21 us |     11.53 us |      - |     - |     - |         - |
|                 ReadWriteAsync |                 ? |     17.61 us |     0.522 us |     0.601 us |     17.59 us |     16.60 us |     18.96 us | 0.0800 |     - |     - |     344 B |
|            ConcurrentReadWrite |                 ? |     13.32 us |     0.411 us |     0.473 us |     13.43 us |     12.66 us |     14.40 us |      - |     - |     - |       1 B |
| ConcurrentReadWriteLargeBuffer |                 ? |     19.22 us |     1.304 us |     1.502 us |     19.02 us |     17.33 us |     22.18 us |      - |     - |     - |         - |
|                 HandshakeAsync |             False | 45,739.91 us | 1,254.984 us | 1,445.242 us | 45,807.22 us | 43,773.66 us | 48,032.24 us |      - |     - |     - |   11840 B |
|                 HandshakeAsync |              True | 54,084.93 us | 1,596.518 us | 1,838.552 us | 54,818.68 us | 48,559.24 us | 55,623.33 us |      - |     - |     - |   20088 B |

Windows 10

|                         Method | useRsaCertificate |      Mean |     Error |    StdDev |    Median |       Min |         Max |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------------------------- |------------------ |----------:|----------:|----------:|----------:|----------:|------------:|-------:|------:|------:|----------:|
|                 WriteReadAsync |                 ? |  11.29 us |  0.041 us |  0.038 us |  11.29 us |  11.22 us |    11.36 us |      - |     - |     - |         - |
|                 ReadWriteAsync |                 ? |  31.11 us |  0.059 us |  0.049 us |  31.11 us |  31.06 us |    31.22 us | 0.0800 |     - |     - |     344 B |
|            ConcurrentReadWrite |                 ? |  23.93 us |  0.080 us |  0.075 us |  23.93 us |  23.81 us |    24.07 us | 0.0800 |     - |     - |     365 B |
| ConcurrentReadWriteLargeBuffer |                 ? |  24.82 us |  0.432 us |  0.404 us |  24.82 us |  24.14 us |    25.53 us |      - |     - |     - |     204 B |
|                 HandshakeAsync |             False | 980.28 us | 20.863 us | 23.190 us | 970.41 us | 955.02 us | 1,033.12 us |      - |     - |     - |    6166 B |
|                 HandshakeAsync |              True | 815.59 us |  9.723 us |  8.619 us | 814.66 us | 803.36 us |   830.57 us |      - |     - |     - |    6128 B |

MacOS 10.14

|                         Method | useRsaCertificate |         Mean |        Error |       StdDev |       Median |          Min |          Max |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------------------------- |------------------ |-------------:|-------------:|-------------:|-------------:|-------------:|-------------:|-------:|------:|------:|----------:|
|                 WriteReadAsync |                 ? |     26.27 us |     3.229 us |     3.455 us |     24.56 us |     23.54 us |     34.86 us | 0.1000 |     - |     - |     318 B |
|                 ReadWriteAsync |                 ? |     29.59 us |     1.593 us |     1.835 us |     29.45 us |     26.17 us |     32.69 us | 0.1000 |     - |     - |     345 B |
|            ConcurrentReadWrite |                 ? |     34.13 us |     1.395 us |     1.607 us |     34.41 us |     31.23 us |     36.66 us | 0.1400 |     - |     - |     481 B |
| ConcurrentReadWriteLargeBuffer |                 ? |    161.70 us |    15.358 us |    17.687 us |    162.66 us |    125.22 us |    192.54 us | 0.1000 |     - |     - |     590 B |
|                 HandshakeAsync |             False | 13,701.49 us |   331.872 us |   355.100 us | 13,670.22 us | 13,203.42 us | 14,564.32 us |      - |     - |     - |   28726 B |
|                 HandshakeAsync |              True | 83,126.61 us | 1,579.740 us | 1,551.516 us | 83,125.70 us | 80,990.78 us | 86,506.65 us |      - |     - |     - |   59878 B |
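As a worked check of the three tables above, the following Python sketch computes the ratio of the `useRsaCertificate = True` mean to the `useRsaCertificate = False` mean for HandshakeAsync on each platform. The numbers are copied from the tables; the dictionary layout and the function name are illustrative only and are not part of the benchmark.

```python
# Mean HandshakeAsync times in microseconds, copied from the tables above.
# Value layout: (useRsaCertificate=False mean, useRsaCertificate=True mean).
handshake_mean_us = {
    "Ubuntu 18.04": (45739.91, 54084.93),
    "Windows 10": (980.28, 815.59),
    "MacOS 10.14": (13701.49, 83126.61),
}


def rsa_to_non_rsa_ratio(non_rsa_us, rsa_us):
    """Return how many times slower (>1) or faster (<1) the RSA run is."""
    return rsa_us / non_rsa_us


ratios = {
    platform: rsa_to_non_rsa_ratio(non_rsa, rsa)
    for platform, (non_rsa, rsa) in handshake_mean_us.items()
}

for platform, ratio in ratios.items():
    print(f"{platform}: RSA / non-RSA = {ratio:.2f}x")
```

With these inputs the RSA run is roughly 1.18x the non-RSA run on Ubuntu, 0.83x on Windows, and 6.07x on macOS.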

cc: @stephentoub @adamsitnik @bartonjs

@stephentoub
Member

cc: @adamsitnik

Member

@adamsitnik adamsitnik left a comment

LGTM, but please respond to my question of whether we might want to add more certificates in the future or not. If not, I am going to merge it as is.

@wfurt
Member Author

wfurt commented Jan 30, 2020

I'm not sure if this is the best place but since we have all experts here looking at the SSL handshake test, let me dump it here.

I was trying to figure out why there is such a big difference for HandshakeAsync between Linux and Windows. 980 us vs 45K us seems like quite a big difference when the rest of the crypto seems similar.
I think the main reason is that the test does not do comparable work on both platforms.
To illustrate that, I made a separate app that runs 3 iterations, 10 seconds apart.

image

After the initial handshake, there is a 10s gap, and then one would expect a new handshake with a ClientHello as seen before. But that is not happening. Instead, the client sends another encrypted message and probably resumes the session with a ticket obtained at frame 21. (https://docs.microsoft.com/en-us/powershell/module/tls/disable-tlssessionticketkey?view=win10-ps)

TLS13 is another issue, as the handshake flow is quite different. I'm wondering if it would make sense to update the test, pin it to TLS12, and perhaps change it so we measure the same work by avoiding resumption (or we can create a separate test aimed at comparing apples to apples).

On the OpenSSL side, TLS tickets and resumption are possible but need to be explicitly requested; they are not used by default. If we believe there is value in this (mainly for microservices connecting to the same endpoint), I can open a new issue in the runtime repo to track it as feature work.
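For illustration only, here is a minimal Python sketch of the two knobs discussed above: pinning a connection to TLS 1.2 and opting out of session tickets. It uses Python's standard `ssl` module (which wraps OpenSSL), not the .NET SslStream or the PAL, so it is an analogy rather than the benchmark's configuration. The helper name is hypothetical.

```python
import ssl


def make_tls12_no_ticket_context():
    """Build a client context pinned to TLS 1.2 that does not use tickets.

    Hypothetical helper for illustration; not part of the benchmark.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Pin both ends of the allowed range so only TLS 1.2 can be negotiated.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    # Ask OpenSSL not to use session tickets on this context.
    ctx.options |= ssl.OP_NO_TICKET
    return ctx


ctx = make_tls12_no_ticket_context()
print(ctx.minimum_version, ctx.maximum_version)
print("tickets disabled:", bool(ctx.options & ssl.OP_NO_TICKET))
```

On a live connection made with such a context, `SSLSocket.session_reused` reports whether the handshake was a resumption, which is one way to confirm that each iteration measures a full handshake.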

@stephentoub
Member

On the OpenSSL side, TLS tickets and resumption are possible but need to be explicitly requested; they are not used by default. If we believe there is value in this (mainly for microservices connecting to the same endpoint), I can open a new issue in the runtime repo to track it as feature work.

What's the downside?

@bartonjs
Member

What's the downside?

Significantly changing the PAL to manage the session state manager, probably introducing global state and possibly introducing locking. Increased native memory usage, though probably not a whole lot.

@stephentoub
Member

stephentoub commented Jan 31, 2020

Significantly changing the PAL to manage the session state manager, probably introducing global state and possibly introducing locking. Increased native memory usage, though probably not a whole lot.

Thanks, Jeremy.

Re: "Significantly changing the PAL to manage the session state manager"... presumably that's "just work" rather than some other ramification?

re: "probably introducing global state and possibly introducing locking"... this is something to be wary of, but not a showstopper.

re: "Increased native memory usage, though probably not a whole lot."... makes sense.

Given all that, it sounds like something we should try. There is definitely value in connections to the same endpoint, whether it be a true production benefit for microservice workloads, a things-work-similarly-across-platforms consistency benefit, or a feel-good benchmark benefit.

@bartonjs
Member

Yeah, none of it is showstopper. Just "downside" :)

@wfurt
Member Author

wfurt commented Jan 31, 2020

Yeah, none of it is showstopper. Just "downside" :)

That sounds about right.
I can do some scouting to see what it would take and perhaps measure perf on some crude prototype. Also, I want to take a closer look at other languages to see if they are really faster for the same task or if they avoid work.

Note that as I was going through the TLS13 spec, I saw that it talks a lot about latency and roundtrips.
That is something we may not be able to see and measure with localhost, but it can have an impact on a real deployment. Being able to resume has benefits beyond saving crypto CPU cycles.

@wfurt
Member Author

wfurt commented Feb 1, 2020

To get closure on this, I decided to act on @bartonjs's recommendation and name the tests by algorithm. I also decided to leave the old tests as they are and write a new version using Pipes. To make everything more comparable, I locked the tests to TLS1.2 and generated a set of certificates where only the key is different (e.g. all v3 extensions are identical). Since the original key was 4K RSA, one could expect TLS12HandshakeRSA4096CertAsync to perform similarly to the original HandshakeAsync. Here are the results (on somewhat comparable machines):

Windows 10:

|                          Method |         Mean |      Error |     StdDev |       Median |          Min |          Max |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|-------------------------------- |-------------:|-----------:|-----------:|-------------:|-------------:|-------------:|-------:|------:|------:|----------:|
|                  HandshakeAsync |  2,564.85 us |  69.300 us |  71.166 us |  2,596.02 us |  2,427.26 us |  2,640.82 us |      - |     - |     - |    6966 B |
| TLS12HandshakeECDSA256CertAsync |  4,496.85 us |  77.280 us |  72.288 us |  4,511.93 us |  4,334.20 us |  4,592.45 us |      - |     - |     - |   16731 B |
|  TLS12HandshakeRSA1024CertAsync |  3,808.51 us |  48.830 us |  45.676 us |  3,829.06 us |  3,706.15 us |  3,856.57 us |      - |     - |     - |   17199 B |
|  TLS12HandshakeRSA2048CertAsync |  4,754.19 us |  66.875 us |  62.555 us |  4,773.56 us |  4,652.40 us |  4,860.73 us |      - |     - |     - |   17960 B |
|  TLS12HandshakeRSA4096CertAsync | 11,202.57 us | 122.237 us | 114.340 us | 11,199.07 us | 10,988.25 us | 11,386.16 us |      - |     - |     - |   19465 B |
|                  WriteReadAsync |     37.63 us |   0.105 us |   0.098 us |     37.67 us |     37.45 us |     37.77 us |      - |     - |     - |         - |
|                  ReadWriteAsync |     44.70 us |   0.359 us |   0.336 us |     44.62 us |     44.28 us |     45.28 us | 0.0800 |     - |     - |     344 B |
|             ConcurrentReadWrite |     48.20 us |   0.543 us |   0.507 us |     48.31 us |     47.34 us |     48.90 us | 0.1600 |     - |     - |     661 B |
|  ConcurrentReadWriteLargeBuffer |     55.64 us |   1.837 us |   2.116 us |     55.91 us |     51.43 us |     59.75 us | 0.1000 |     - |     - |     675 B |

Linux Ubuntu 18.04 with OpenSSL 1.1.1

|                          Method |         Mean |        Error |       StdDev |       Median |          Min |          Max |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|-------------------------------- |-------------:|-------------:|-------------:|-------------:|-------------:|-------------:|-------:|------:|------:|----------:|
|                  HandshakeAsync | 54,270.03 us | 1,354.036 us | 1,559.310 us | 55,067.52 us | 51,192.17 us | 56,484.86 us |      - |     - |     - |   20088 B |
| TLS12HandshakeECDSA256CertAsync |  1,951.04 us |    41.953 us |    46.631 us |  1,970.74 us |  1,851.62 us |  2,011.66 us |      - |     - |     - |   19578 B |
| TLS12HandshakeECDSA512CertAsync |  3,935.19 us |    96.167 us |   110.746 us |  3,929.56 us |  3,716.64 us |  4,122.92 us |      - |     - |     - |   19984 B |
|  TLS12HandshakeRSA1024CertAsync |  1,818.54 us |    51.316 us |    59.096 us |  1,812.84 us |  1,666.86 us |  1,925.27 us |      - |     - |     - |   19970 B |
|  TLS12HandshakeRSA2048CertAsync |  2,295.54 us |    49.384 us |    43.778 us |  2,276.49 us |  2,256.43 us |  2,411.73 us |      - |     - |     - |   20752 B |
|  TLS12HandshakeRSA4096CertAsync |  7,800.41 us |   227.845 us |   262.387 us |  7,823.56 us |  7,394.99 us |  8,239.88 us |      - |     - |     - |   22302 B |
|                  WriteReadAsync |     11.23 us |     0.151 us |     0.141 us |     11.19 us |     11.06 us |     11.47 us |      - |     - |     - |         - |
|                  ReadWriteAsync |     17.45 us |     0.377 us |     0.353 us |     17.47 us |     17.05 us |     18.31 us | 0.0800 |     - |     - |     344 B |
|             ConcurrentReadWrite |     13.95 us |     0.352 us |     0.376 us |     13.98 us |     13.33 us |     14.64 us |      - |     - |     - |         - |
|  ConcurrentReadWriteLargeBuffer |     18.97 us |     1.451 us |     1.671 us |     18.84 us |     16.55 us |     22.34 us |      - |     - |     - |         - |

There are some interesting anomalies:

  • The new tests are faster on Linux, which is not what I expected. My earlier test with Pipes showed Windows performing a little better, so the transport should not be the explanation.

  • TLS12HandshakeRSA4096CertAsync should be comparable to HandshakeAsync, but on Windows it is much slower. Using Pipes and randomizing the target name should prevent handshake resumption. It is hard to know for sure since I cannot do a packet capture, but that was the intent and the test results match it.

  • On Linux, TLS12HandshakeRSA4096CertAsync is much faster than the original HandshakeAsync. I think that is because it uses a self-signed certificate, while the original certificate has a one-level hierarchy, i.e. the cert is signed by a private CA.

I'll keep digging into this, but I feel that should not hold back the new tests.
Do we publish "official" results anywhere @adamsitnik? It would be nice to get independent verification on identical machines.

@adamsitnik
Member

Do we publish "official" results anywhere

We do! Once we merge this PR (is it ready now?), our infra is going to run it daily and publish the results to our Reporting System. I am going to forward you an email from @DrewScoggins, who explains how to use it.

@wfurt
Member Author

wfurt commented Feb 6, 2020

Yes, this should be good to go @adamsitnik. I may add more tests later, but that should not hold this back IMHO.

@adamsitnik adamsitnik merged commit 9a134b4 into dotnet:master Feb 6, 2020
@wfurt wfurt deleted the ecdsa_handshake branch February 6, 2020 20:19
@wfurt
Member Author

wfurt commented Feb 14, 2020

The final result from CI machines on identical hardware:

|                                                                     Name |   Ubuntu |   Windows | Difference |
|------------------------------------------------------------------------- |---------:|----------:|-----------:|
| System.Net.Security.Tests.SslStreamTests.TLS12HandshakeECDSA256CertAsync |  1.50 ms |   4.61 ms |      -207% |
|  System.Net.Security.Tests.SslStreamTests.TLS12HandshakeRSA1024CertAsync |  1.34 ms |   3.94 ms |      -194% |
|  System.Net.Security.Tests.SslStreamTests.TLS12HandshakeRSA2048CertAsync |  1.84 ms |   4.37 ms |      -137% |
|  System.Net.Security.Tests.SslStreamTests.TLS12HandshakeRSA4096CertAsync |  4.67 ms |   8.36 ms |       -79% |
|                  System.Net.Security.Tests.SslStreamTests.ReadWriteAsync | 13.17 μs |  17.23 μs |       -31% |
|                  System.Net.Security.Tests.SslStreamTests.HandshakeAsync | 41.31 ms | 640.93 μs |        98% |

The rest of the tests are on par between OSes.
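The Difference column in the CI results above appears to be computed as (Ubuntu - Windows) / Ubuntu; that formula is an assumption, inferred only because it reproduces the reported percentages to within one point. A short Python sketch of that arithmetic, with units normalized to microseconds (all names here are illustrative):

```python
# Conversion factors to microseconds for the units used in the CI results.
UNIT_TO_US = {"μs": 1.0, "ms": 1000.0}


def to_us(value, unit):
    """Convert a (value, unit) measurement to microseconds."""
    return value * UNIT_TO_US[unit]


def relative_difference_percent(ubuntu_us, windows_us):
    """Assumed formula: (Ubuntu - Windows) / Ubuntu, as a percentage."""
    return (ubuntu_us - windows_us) / ubuntu_us * 100.0


# (test name, Ubuntu measurement, Windows measurement, reported Difference in %)
ci_results = [
    ("TLS12HandshakeECDSA256CertAsync", (1.50, "ms"), (4.61, "ms"), -207),
    ("TLS12HandshakeRSA1024CertAsync", (1.34, "ms"), (3.94, "ms"), -194),
    ("TLS12HandshakeRSA2048CertAsync", (1.84, "ms"), (4.37, "ms"), -137),
    ("TLS12HandshakeRSA4096CertAsync", (4.67, "ms"), (8.36, "ms"), -79),
    ("ReadWriteAsync", (13.17, "μs"), (17.23, "μs"), -31),
    ("HandshakeAsync", (41.31, "ms"), (640.93, "μs"), 98),
]

computed = {
    name: relative_difference_percent(to_us(*ubuntu), to_us(*windows))
    for name, ubuntu, windows, _reported in ci_results
}

for name, _ubuntu, _windows, reported in ci_results:
    print(f"{name}: computed {computed[name]:.1f}%, reported {reported}%")
```

A negative value means Ubuntu is faster; the positive 98% for HandshakeAsync means the Windows run takes about 2% of the Ubuntu time for that test.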
