Skip to content

Add telemetry to System.Net.Security #40108

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Aug 14, 2020
Merged

Conversation

MihaZupan
Copy link
Member

@MihaZupan MihaZupan commented Jul 29, 2020

Contributes to #37428

Adds HandshakeStart, HandshakeStop, HandshakeFailed events.

Adds counters:

  • tls-handshakes-per-second
  • total-tls-handshakes
  • current-tls-handshakes
  • failed-tls-handshakes
  • all-tls-sessions-open
  • tls10-sessions-open
  • tls11-sessions-open
  • tls12-sessions-open
  • tls13-sessions-open
  • all-tls-handshake-duration
  • tls10-handshake-duration
  • tls11-handshake-duration
  • tls12-handshake-duration
  • tls13-handshake-duration

@MihaZupan MihaZupan added this to the 5.0.0 milestone Jul 29, 2020
@MihaZupan MihaZupan requested review from brianrob, noahfalk, wfurt, alnikola and a team July 29, 2020 23:06
@ghost
Copy link

ghost commented Jul 29, 2020

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

descrs[0] = new EventData
{
DataPointer = (IntPtr)(&arg1),
Size = sizeof(int) // EventSource defines bool as a 32-bit type
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the right way of using bools in event data?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See https://github.com/dotnet/runtime/blob/master/src/libraries/System.Private.CoreLib/src/System/Diagnostics/Tracing/EventProvider.cs#L867 for the right way to encode bool types. Also, I believe that initlocals is not set for this assembly, so you will need to explicitly set EventData.Reserved=0. A non-zero value will cause failures that will result in the events being corrupted.

Given that the input types are all supported types, you could delete this entire method and just let EventSource handle it for you. It's OK to do this, but there's the potential for more bugs here.

Also, given that these are

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I believe that initlocals is not set for this assembly, so you will need to explicitly set EventData.Reserved=0. A non-zero value will cause failures that will result in the events being corrupted.

Reserved is not visible outside of its assembly. Using new EventData will zero-init that field, regardless of SkipLocalsInit.

Given that the input types are all supported types, you could delete this entire method and just let EventSource handle it for you.

Considering these events are only fired 2/3x per connection, the perf overhead likely wouldn't matter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point - you're using the constructor here, so you should be fine.

{
Interlocked.Increment(ref _startedTlsHandshakes);

if (IsEnabled(EventLevel.Informational, EventKeywords.None))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can safely remove the IsEnabled checks inside these functions. They will occur just inside of WriteEvent. Arguments have already been created, so there isn't much perf to save here, especially if WriteEvent gets inlined.

descrs[0] = new EventData
{
DataPointer = (IntPtr)(&arg1),
Size = sizeof(int) // EventSource defines bool as a 32-bit type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See https://github.com/dotnet/runtime/blob/master/src/libraries/System.Private.CoreLib/src/System/Diagnostics/Tracing/EventProvider.cs#L867 for the right way to encode bool types. Also, I believe that initlocals is not set for this assembly, so you will need to explicitly set EventData.Reserved=0. A non-zero value will cause failures that will result in the events being corrupted.

Given that the input types are all supported types, you could delete this entire method and just let EventSource handle it for you. It's OK to do this, but there's the potential for more bugs here.

Also, given that these are

@MihaZupan
Copy link
Member Author

/azp run runtime-libraries outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

private PollingCounter? _totalTlsHandshakesCounter;
private PollingCounter? _currentTlsHandshakesCounter;
private PollingCounter? _failedTlsHandshakesCounter;
private PollingCounter? _connectionsOpenTls10Counter;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to keep this per protocol version? That is going hard to maintain IMHO. (and not covering old SSL)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know, counters can't carry other data, so I don't know if this can be simplified

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key part is if ween stats per protocol. If we do, we have no option IMHO. In that case it may be worth of adding deprecating Ssl2/3 for completeness. And perhaps add some comment somewhere to update this when next TLS is out. I did not pay much attention. but do we could http1 and http2 separate? From SslStream prospective there is really not that much difference.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're suggesting we add counters for Ssl2/3 as well even tho they're deprecated?

And perhaps add some comment somewhere to update this when next TLS is out.

Will do, we can also add asserts for that.

but do we could http1 and http2 separate? From SslStream prospective there is really not that much difference.

Do you mean have separate counters for open connections depending on whether they are used for Http1/Http2?
Afaik SslStream doesn't have any such data right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no it does not. I was wondering if HTTP counters are also separated by protocol versions.
I was wondering why somebody would want to track handshakes by protocol version.
The answer may be monitoring of very old or very new versions to manage deprecation and adoption.
If we don't add deprecated Ssl, than the counters will not add-up to total number for handshakes.
That may be fine if we document it or if we add some "other" bucket.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HTTP counters aren't split by protocol version, the RequestStart event does log the version however.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't add deprecated Ssl, than the counters will not add-up to total number for handshakes.

There is currently no counter for "total-opened-sessions" (though we can add it as it seems useful), there are only per-protocol ones (excluding Ssl2, 3).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And perhaps add some comment somewhere to update this when next TLS is out.

I added an assert that should catch that

There is currently no counter for "total-opened-sessions"

I added all-sessions-open and all-handshake-duration counters.

My understanding is that in most cases, a user would have to manually opt-in to using Ssl2/Ssl3, so I would be okay with not adding counters for those. They are still tracked in the all-* counters, so documenting it seems better.

Copy link
Member

@wfurt wfurt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally looks good. I left few comments.

Copy link
Member

@wfurt wfurt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@MihaZupan
Copy link
Member Author

/azp run runtime-libraries outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@MihaZupan MihaZupan closed this Aug 12, 2020
@MihaZupan MihaZupan reopened this Aug 12, 2020
@MihaZupan
Copy link
Member Author

/azp run runtime-libraries outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

DisplayName = "Total TLS handshakes failed"
};

_sessionsOpenCounter ??= new PollingCounter("all-sessions-open", this, () => Interlocked.Read(ref _sessionsOpen))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_sessionsOpenCounter ??= new PollingCounter("all-sessions-open", this, () => Interlocked.Read(ref _sessionsOpen))
_sessionsOpenCounter ??= new PollingCounter("tls-sessions-open", this, () => Interlocked.Read(ref _sessionsOpen))

'all-sessions' seems ambiguous - I assume there are many other kinds of authenticated communication sessions even if .NET doesn't implement them (Kerberos, SSL, SSH?). Everywhere else we used 'tls' so can we use it here too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it includes tls and ssl2, ssl3. (same for comments below).

Using tls in that case would be less misleading than all. I'm not aware of a different term that would be more appropriate here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using tls in that case would be less misleading than al

In context I am guessing you meant more misleading? : ) What about "TLS/SSL Sessions Active"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about "TLS/SSL Sessions Active"?

As the description, yes. I suppose that's enough clarification even if we use the tls-* event name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tls-ssl-sessions-open?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does drag along all "generic" counters:
tls-handshakes-per-second
total-tls-handshakes
current-tls-handshakes
failed-tls-handshakes

to
tls-ssl-handshakes-per-second
total-tls-ssl-handshakes
current-tls-ssl-handshakes
failed-tls-ssl-handshakes

@wfurt Do you have a preference here regarding "tls-" vs "tls-ssl-" vs "all-" naming?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use "tls-". We use either one but I have not seem them together like this. Possibly something like "all-tls-sessions-open".
To @noahfalk 's point. We may add some telemetry to NegotiatStream/Kerberos.

@MihaZupan
Copy link
Member Author

/azp run runtime-libraries outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@MihaZupan MihaZupan merged commit 47345cd into dotnet:master Aug 14, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants