Skip to content

[Perf] Cache key size in CngKey #89599

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Aug 1, 2023

Conversation

GrabYourPitchforks
Copy link
Member

An internal partner is reporting that they're seeing CNG signature validation latency show up as a bottleneck. The culprit seems to be that each signing & validation operation queries the CngKey.KeySize property.

public CngKey Key
{
get
{
CngKey key = _core.GetOrGenerateKey(KeySize, CngAlgorithm.Rsa);
return key;
}

public CngKey GetOrGenerateKey(int keySize, CngAlgorithm algorithm)
{
ThrowIfDisposed();
// If our key size was changed, we need to generate a new key.
if (_lazyKey != null)
{
if (_lazyKey.KeySize != keySize)
DisposeKey();
}

The CngKey.KeySize property is not cached anywhere in the system, which means that each query to this property results in an LRPC call back into lsass or lsaiso. This context switch is what appears to be introducing the latency.

Since CngKey instances can't change their key size once created, we can cache the key size on our end without requiring calls back through the NCrypt* (lsass) layer. That significantly reduces the latency.

public class CngRunner
{
    private RSACng _rsa;
    private ECDsaCng _ecdsa;

    private static byte[] _emptyDigest = new byte[256 / 8];
    private byte[] _rsaSignedHash;
    private byte[] _ecdsaSignedHash;

    [GlobalSetup]
    public void Setup()
    {
        _rsa = new RSACng(2048);
        _ecdsa = new ECDsaCng(256);

        _rsaSignedHash = _rsa.SignHash(_emptyDigest, HashAlgorithmName.SHA256, RSASignaturePadding.Pss);
        if (!Rsa_VerifyHash()) { throw new CryptographicException("Self-test failed."); }

        _ecdsaSignedHash = _ecdsa.SignHash(_emptyDigest);
        if (!Ecdsa_VerifyHash()) { throw new CryptographicException("Self-test failed."); }
    }

    [Benchmark]
    public bool Rsa_VerifyHash() => _rsa.VerifyHash(_emptyDigest, _rsaSignedHash, HashAlgorithmName.SHA256, RSASignaturePadding.Pss);

    [Benchmark]
    public bool Ecdsa_VerifyHash() => _ecdsa.VerifyHash(_emptyDigest, _ecdsaSignedHash);
}
Method Job Toolchain Mean Error StdDev Ratio
Rsa_VerifyHash Job-EBVYOU cngkey_cache 48.03 μs 0.140 μs 0.131 μs 0.38
Rsa_VerifyHash Job-QBWHWJ main 125.42 μs 0.505 μs 0.394 μs 1.00
Ecdsa_VerifyHash Job-EBVYOU cngkey_cache 220.13 μs 2.552 μs 2.731 μs 0.75
Ecdsa_VerifyHash Job-QBWHWJ main 295.14 μs 0.885 μs 0.739 μs 1.00

Open discussion

Technically this is a behavioral change, since it means that the CngKey.KeySize property will remain accessible on an otherwise disposed instance, just as long as it has already been fetched at least once prior to disposal. I could easily add a disposed check (and unit test to validate this behavior) if we think preserving the earlier behavior is worthwhile.

@ghost
Copy link

ghost commented Jul 27, 2023

Tagging subscribers to this area: @dotnet/area-system-security, @bartonjs, @vcsjones
See info in area-owners.md if you want to be subscribed.

Issue Details

An internal partner is reporting that they're seeing CNG signature validation latency show up as a bottleneck. The culprit seems to be that each signing & validation operation queries the CngKey.KeySize property.

public CngKey Key
{
get
{
CngKey key = _core.GetOrGenerateKey(KeySize, CngAlgorithm.Rsa);
return key;
}

public CngKey GetOrGenerateKey(int keySize, CngAlgorithm algorithm)
{
ThrowIfDisposed();
// If our key size was changed, we need to generate a new key.
if (_lazyKey != null)
{
if (_lazyKey.KeySize != keySize)
DisposeKey();
}

The CngKey.KeySize property is not cached anywhere in the system, which means that each query to this property results in an LRPC call back into lsass or lsaiso. This context switch is what appears to be introducing the latency.

Since CngKey instances can't change their key size once created, we can cache the key size on our end without requiring calls back through the NCrypt* (lsass) layer. That significantly reduces the latency.

public class CngRunner
{
    private RSACng _rsa;
    private ECDsaCng _ecdsa;

    private static byte[] _emptyDigest = new byte[256 / 8];
    private byte[] _rsaSignedHash;
    private byte[] _ecdsaSignedHash;

    [GlobalSetup]
    public void Setup()
    {
        _rsa = new RSACng(2048);
        _ecdsa = new ECDsaCng(256);

        _rsaSignedHash = _rsa.SignHash(_emptyDigest, HashAlgorithmName.SHA256, RSASignaturePadding.Pss);
        if (!Rsa_VerifyHash()) { throw new CryptographicException("Self-test failed."); }

        _ecdsaSignedHash = _ecdsa.SignHash(_emptyDigest);
        if (!Ecdsa_VerifyHash()) { throw new CryptographicException("Self-test failed."); }
    }

    [Benchmark]
    public bool Rsa_VerifyHash() => _rsa.VerifyHash(_emptyDigest, _rsaSignedHash, HashAlgorithmName.SHA256, RSASignaturePadding.Pss);

    [Benchmark]
    public bool Ecdsa_VerifyHash() => _ecdsa.VerifyHash(_emptyDigest, _ecdsaSignedHash);
}
Method Job Toolchain Mean Error StdDev Ratio
Rsa_VerifyHash Job-EBVYOU cngkey_cache 48.03 μs 0.140 μs 0.131 μs 0.38
Rsa_VerifyHash Job-QBWHWJ main 125.42 μs 0.505 μs 0.394 μs 1.00
Ecdsa_VerifyHash Job-EBVYOU cngkey_cache 220.13 μs 2.552 μs 2.731 μs 0.75
Ecdsa_VerifyHash Job-QBWHWJ main 295.14 μs 0.885 μs 0.739 μs 1.00

Open discussion

Technically this is a behavioral change, since it means that the CngKey.KeySize property will remain accessible on an otherwise disposed instance, just as long as it has already been fetched at least once prior to disposal. I could easily add a disposed check (and unit test to validate this behavior) if we think preserving the earlier behavior is worthwhile.

Author: GrabYourPitchforks
Assignees: -
Labels:

area-System.Security, tenet-performance

Milestone: -

@GrabYourPitchforks
Copy link
Member Author

GrabYourPitchforks commented Jul 28, 2023

Updated the PR to use -1 instead of 0 as a sentinel, plus moved some logic around to account for early return scenarios. I also added a unit test to ensure that all non-handle properties throw after the CngKey instance is disposed, even if the properties had been previously accessed and cached. This should help thwart behavioral regressions.

Note to reviewers: I suggest hiding whitespace changes (https://github.com/dotnet/runtime/pull/89599/files?diff=unified&w=1) when reviewing the latest iteration, since some indentation changed within the KeySize property getter.

Perf results from this build are nearly identical to the earlier run. I won't bother posting them.

@brianrob
Copy link
Member

A bit of justification here: This isn't a big deal for one off usages, but any time there is a system that is caching an instance for use over and over, the LRPC call is a significant bottleneck, and limits the scale of usage to be what lsass can handle. This is a common situation in cloud services, so a change to cache the key size is worthwile.

@vcsjones
Copy link
Member

@brianrob out of curiosity, are you using RSACng directly? Or are you using RSA.Create?

If the former, are you using CNG specific features that would exclude using RSA.Create?

@brianrob
Copy link
Member

@vcsjones, I'm one of the folks that was analyzing perf data, but don't work on the service itself. I've sent mail asking them to reply here.

@vcsjones
Copy link
Member

Gotcha. I only ask because starting in .NET 8, RSA.Create() will use bcrypt, not ncrypt, when possible. So all the RPC overhead goes away entirely.

@GrabYourPitchforks
Copy link
Member Author

Internal partner reports that they're using RSACertificateExtensions.GetRSAPublicKey extension method. On net8, this should go through bcrypt instead of ncrypt, so you're correct that the perf overhead should disappear.

I've changed the benchmark to perform signing instead of verification, since private key operations still require bouncing through lsass. The perf gains are still measurable: around 100μs per op.

Method Job Toolchain Mean Error StdDev Ratio
Rsa_SignHash Job-QLWHCM cngkey_cache 659.9 μs 5.05 μs 4.47 μs 0.88
Rsa_SignHash Job-OFFVFC main 748.6 μs 2.14 μs 1.90 μs 1.00
Ecdsa_SignHash Job-QLWHCM cngkey_cache 373.1 μs 2.74 μs 2.57 μs 0.78
Ecdsa_SignHash Job-OFFVFC main 479.1 μs 3.02 μs 2.52 μs 1.00

@vcsjones
Copy link
Member

The perf gains are still measurable: around 100μs per op.

Sure. I’m not trying to argue against the change - we should take it regardless since it has already demonstrated some improvements elsewhere. Just noting that the internal partner is likely to get an even more significant performance improvement for .NET 8 once they start going through bcrypt.

@bartonjs
Copy link
Member

bartonjs commented Jul 28, 2023

Really this shows that Windows should add similar caching. But even if they did this change would still save on the machinery of a P/Invoke.

@vcsjones
Copy link
Member

Really this shows that Windows should add similar caching.

Well... the key size can change - before NCryptFinalize is called. We (mostly) don't have that problem here. But I'm not so sure how reasonable it would be for Windows to add caching with that in mind.

@GrabYourPitchforks
Copy link
Member Author

Looks like CI is a bit clogged. I'll come back in a few hours and see how things have progressed before merging.

Copy link
Contributor

@IDisposable IDisposable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@GrabYourPitchforks GrabYourPitchforks merged commit 602b88d into dotnet:main Aug 1, 2023
@GrabYourPitchforks GrabYourPitchforks deleted the cngkey branch August 1, 2023 00:22
@ghost ghost locked as resolved and limited conversation to collaborators Aug 31, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants