Convert WCF client instrumentation to a BindingElement by repl-chris · Pull Request #1247 · open-telemetry/opentelemetry-dotnet-contrib

repl-chris · 2023-06-22T04:28:27Z

Fixes #478
Fixes #812
Fixes #1153

Changes

This converts the WCF client instrumentation to being implemented in a lower-level bindingelement/channels rather than in the higher-level endpointbehavior/clientmessageinspector. This is needed so we can gain more insight into asynchronous execution flows and exception handling so we can better manage the ExecutionContexts. The clientmessageinspector still exists, but only to provide high-level context to the channels where the actual instrumentation now executes.

For significant contributions please make sure you have completed the following items:

Appropriate CHANGELOG.md updated for non-trivial changes
Design discussion issue #
Changes in public API reviewed

linux-foundation-easycla · 2023-06-22T04:28:31Z

The committers listed above are authorized under a signed CLA.

✅ login: repl-chris / name: Chris Redekop (4afeee3, 5b69a71, b93b955, bf6880b, f46be3b, fcbed9c, aa3b8a9, 06ad35f, bb144e2, 6d80014, 3c3e751)
✅ login: Kielek / name: Piotr Kiełkowicz (e028a48, 04cc8c1, d0e1ed9)

Kielek · 2023-06-23T04:22:09Z

@repl-chris, please sing EasyCLA. Withot this we cannot proceed with this PR.

repl-chris · 2023-06-23T05:37:49Z

@Kielek yep, it is coming - it's working its way through corporate red tape as we speak 😀....I don't expect it will take long.

repl-chris · 2023-06-23T20:29:51Z

@Kielek ok good to go 👍

github-actions · 2023-07-01T03:24:18Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

repl-chris · 2023-07-05T07:21:11Z

@CodeBlanch @Kielek Not meaning to rush, but is there any specific procedure for PRs? I just want to make sure I haven't missed a step or anything...

I realize it's a fairly large PR - I tried to keep the changes minimal but still functionally complete, but obviously it's a fairly fundamental change. I could take a different approach if that would be preferable - I could do multiple smaller PRs adding the new components while keeping backwards compatibility before finally dropping the old mechanism?

github-actions · 2023-07-16T03:23:50Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

mconnew · 2023-07-20T21:29:14Z

I haven't gone over the PR thoroughly, but wanted to give a bit of feedback. Using a BindingElement is going to be the most powerful way to achieve what's needed as you can get at layers that are otherwise really difficult to do. You can still keep the user experience based around adding a behavior without requiring them to modify their binding themselves. In the validate method of a behavior, you can get to the ServiceEndpoint and modify the binding. You would do this by creating a CustomBinding from the existing binding, add the custom BindingElement into the stack, then replace the Binding on the ServiceEndpoint with the new CustomBinding.

There are some things to consider when using a binding element. It matters where you place it in the stack. If you put it just after the Transport and MessageEncoder, you will have access to the raw SOAP message, but there's 2 things to be aware of. If they are using SecurityMode.Message, it's possible you might not be able to read the headers as they might be encrypted. I believe the way you are adding the custom header that it probably won't be, but it's something to validate. The second thing is SOAP messages do map one to one with service calls. You will be exposed to all the infrastructure messages too such as reliable sessions keep alive pings, or session establishment handshaking. You may or may not want to have that logged. For example, there is some benefit to recording the rate of session establishment, or being aware of how often a reliable sessions is reestablished. This are both indicators of service health, but might be non-obvious on how to interpret that data, and it might need special handling. If you put it at the end of the stack, you will see messages intended to be dispatched. That will still capture some session establishment calls as internally it's implemented as another service implementation, but you won't see Reliable Sessions data at the end.

Where you put the binding element will affect what the time taken to handle a request looks like. For example, if you are using security token authentication without using a session, every request needs to be authenticated. The authentication is done by the SecurityBindingElement. This authentication might involve an off-box call (eg SQL server lookup) and consume a not insignificant amount of time. If you place your binding element between the transport and the security binding elements, the request processing time you'd see would include the authentication time. If you placed your binding element at the end of the list, it won't capture that authentication timing. If something happens with authentication and it slows down and you are measuring from after authentication, you miss that in your metrics.

Basically I'm saying you need to think carefully about where in the binding stack to place the binding element, or whether you want to insert multiple binding elements to capture more fine grained information. Do you want to emit data for non dispatching infrastructure messages?

repl-chris · 2023-07-21T03:44:17Z

@mconnew oooh thanks for the tip about replacing the binding on the serviceendpoint, I will definitely look into that.

As for the placement of the binding element in the stack...that's some really good food for thought - I was thinking it should always go on the top (as far away from the transport as possible). This telemetry is measuring the time to make an outgoing service call...I'd expect we'd want all the client-side WCF framework and extensions to be part of that, not simply just the network request (or a shade-of-grey between the 2 extremes). Come to think of it, it would be nice to include all other endpointbehaviors inside of this too - logically they are also part of the "outgoing call"...I will play with this. The transport already has its own instrumentation (at least when using http), and from a design perspective I'd expect if there are other elements of interest in the stack they should have instrumentation installed specifically for them....those could eventually also be part of this module, but for this initial implementation I would expect this component to just comprise as much of the WCF client-side stack as possible.

… than in a ClientMessageInspector

repl-chris · 2023-07-21T22:25:13Z

I have reworked it to inject the new bindingelement from the endpointbehavior as suggested, and it works beautifully. It cleans it up a bunch - now there are no external api or config changes (yay!).

There's 1 specific area that I'm not very happy with....I wonder if you have any suggestions for a better approach. In the InstrumentedDuplexSessionChannel.cs I start the activity on Send() and stop it on Receive(), but I have no good way to tie the 2 together - no way to pass any context from the Send call to the Receive call, so no easy way to know which activity to stop. To make it work I'm manually tracking the outstanding requests (RequestTelemetryStateTracker.cs) so I can look up the appropriate request after receiving the response. Of course this only works reliably when I actually receive a response - any error on Receive() will cause the "outstanding requests" collection to accumulate entries (it'll hold max 1000 of them before it starts discarding them). It's a little gross, but I still see this as a good step forward compared to where we are today, and of course it's just on the duplex channel - the request channel has no such hackery....I hope there's a better way 🤔

…icatable

mconnew · 2023-07-22T19:24:22Z

The outgoing message has an ID in a special SOAP header. I'm on mobile right now so can't easily look it up but it should be easy to spot, it's a Guid. I think the header name is MessageID. The reply message will have another header I believe is named RelatesTo which contains the same Guid as the matching outgoing message. This is how WCF matches the up. We have a RequestReplyCorrelator which the outgoing message ID is added to before sending. Then we call a method on the correlator which returns (and when async, completes a pending async call) when a message is received with the matching RelatesTo header. We have a timeout mechanism which removes the outgoing ID from the correlator if the reply hasn't been seen without a timeout period. If a message is actually received after that timeout, the code drops the message as the matching ID is missing. I believe that timeout should be the Binding.SendTimeout. Any correlation you build should have a timeout mechanism and consider the call as failed if the reply isn't received within that timeout. The caller would have seen a TimeoutException. Anything received after that timeout should be treated as a failure. I don't know if you want to emit telemetry when that happens as you should have already emitted something saying it timed out, it would be useful info in a limited set of situations. That would generally happen if the server is configured with a longer timeout than the client.

mconnew · 2023-07-31T18:24:07Z


+    internal static void ApplyBindingElementToServiceEndpoint(ServiceEndpoint endpoint)
+    {
+        if (endpoint.Binding is CustomBinding customBinding && customBinding.Elements.Find<TelemetryBindingElement>() != null)


If this runs on the service side too, you should probably also skip if endpoint.IsSystemEndpoint is true.

it won't currently ever run on the service-side, but it doesn't hurt to have it there 👍

mconnew · 2023-07-31T18:26:32Z

+
+internal sealed class TelemetryContextMessageProperty
+{
+    public const string Name = "telemetryContext";


This is only my personal preference, but I like to use the entire fully qualified class name as the property name to make sure there aren't collisions with some other library deciding to use the same propery name.

mconnew · 2023-07-31T18:29:16Z

This looks good to me. From a correctness perspective everything is being done the right "WCF" way.

github-actions · 2023-08-08T03:15:49Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

codecov · 2023-08-08T05:20:43Z

Codecov Report

Merging #1247 (d0e1ed9) into main (01378f4) will increase coverage by 0.21%.
The diff coverage is 81.89%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1247      +/-   ##
==========================================
+ Coverage   74.15%   74.36%   +0.21%     
==========================================
  Files         256      269      +13     
  Lines        9348     9621     +273     
==========================================
+ Hits         6932     7155     +223     
- Misses       2416     2466      +50

Files Changed	Coverage Δ
...entation.Wcf/TelemetryContractBehaviorAttribute.cs	`0.00% <0.00%> (ø)`
.../Implementation/InstrumentedCommunicationObject.cs	`53.57% <53.57%> (ø)`
...Implementation/InstrumentedDuplexSessionChannel.cs	`64.70% <64.70%> (ø)`
...n.Wcf/Implementation/InstrumentedRequestChannel.cs	`75.00% <75.00%> (ø)`
...y.Instrumentation.Wcf/TelemetryEndpointBehavior.cs	`92.10% <75.00%> (-7.90%)`	⬇️
...mplementation/InstrumentedRequestChannelFactory.cs	`80.00% <80.00%> (ø)`
...ntation/InstrumentedDuplexSessionChannelFactory.cs	`83.33% <83.33%> (ø)`
...cf/Implementation/AsyncResultWithTelemetryState.cs	`85.71% <85.71%> (ø)`
...tion.Wcf/Implementation/TelemetryBindingElement.cs	`92.85% <92.85%> (ø)`
...Wcf/Implementation/ClientChannelInstrumentation.cs	`94.80% <94.80%> (ø)`
... and 6 more

Kielek · 2023-08-09T05:24:46Z

@repl-chris, could you please fix dotnet-format and markdownlint issues?

repl-chris · 2023-08-09T15:18:10Z

@Kielek all fixed up 👍

CodeBlanch · 2023-08-09T16:58:35Z

+
+namespace OpenTelemetry.Instrumentation.Wcf.Implementation;
+
+internal class InstrumentedChannel : InstrumentedCommunicationObject, IChannel


nit: Would you please add sealed to these internal classes. There is a slight perf boost for sealed classes. See: dotnet/runtime#49944

Sorry for the long comment, but I think this is a useful nugget of information to spread around.

I don't think this would apply here as the call site of the virtual methods would be WCF code which has no visibility into the concrete type. The situation where you get the performance improvement is this:

public abstract class FooBase { public abstract void DoWork() } public class MyFoo : FooBase { public override void DoWork() { Console.WriteLine("Doing some work"); } } public static class Program { public static int Main(string[] args) { MyFoo foo = GetFoo(); foo.DoWork(); } public static MyFoo GetFoo() { return new MyFoo(); } }

In the snippet above, the code in Main method which calls GetFoo() doesn't have knowledge about if the MyFoo instance returned is an actual MyFoo instance as it could potentially be a type derived from MyFoo. As the DoWork() method is virtual, it could potentially have been overridden in a derived instance so it must do a virtual method call to make sure the correct method is called. If MyFoo was sealed, then you are guaranteed that the object returned from GetFoo() is an actual MyFoo type as you can't derive it. This means the call to DoWork() can be a non-virtual method call to MyFoo.DoWork() as there's no possibility that a different method implementation would get called. The non-virtual method call is faster. In this case it would probably inline the implementation as MyFoo.DoWork() is so small which gives you even more performance improvement.

For completeness and accuracy, I ignored that the compiler might be able to still do it in this case as GetFoo() is non-virtual and short so could get inlined in which case the compiler might reason about the type sufficiently to do a non-virtual method call here. The point was about semantics and if GetFoo() was longer so couldn't be inlined, or virtual (it's static in this code, but generalizing here), then what I stated would be correct. Hopefully you understand the concept I'm explaining in the general case.

In this codebase, the instrumented channel implements multiple WCF interfaces. The caller for all the virtual overridden methods (or interface implemented methods) is WCF which only has a reference to the instance as the interface type. There is no a priori knowledge when WCF was compiled about the type of the channel so the compiler can only emit virtual method calls.

You only benefit from sealing an implementation if the call site has a reference typed as the concrete sealed type.

On the other hand, marking it as sealed doesn't do any harm. Although not applicable in this scenario, sealing a class can also prevent using Ref.Emit to derive an implementation and using reflection to inject it to modify an app to do things not intended.

I've added sealed to all possible internal classes 👍

…telemetry-dotnet-contrib into WcfClientBinding

CodeBlanch

LGTM thanks for the help on this @mconnew

julealgon · 2023-08-11T13:00:29Z

Curious to know when we can expect a new version of the package with this fix in place. I'd like to use it right away!

@Kielek / @repl-chris , any idea?

utpilla · 2023-08-14T22:05:00Z

Curious to know when we can expect a new version of the package with this fix in place. I'd like to use it right away!

@Kielek / @repl-chris , any idea?

@julealgon This change is available in 1.0.0-rc.11 version of the package: https://www.nuget.org/packages/OpenTelemetry.Instrumentation.Wcf/1.0.0-rc.11

github-actions Bot requested a review from CodeBlanch June 22, 2023 04:28

repl-chris force-pushed the WcfClientBinding branch 2 times, most recently from 7671259 to 0e4ed12 Compare June 22, 2023 14:49

repl-chris changed the title ~~Convert WCF client instrumentation to a BindingElement instead EndpointBehavior~~ Convert WCF client instrumentation to a BindingElement instead of an IEndpointBehavior Jun 22, 2023

repl-chris marked this pull request as ready for review June 22, 2023 14:50

repl-chris requested a review from a team June 22, 2023 14:50

repl-chris force-pushed the WcfClientBinding branch from 0e4ed12 to 3d2d4e8 Compare June 22, 2023 15:05

github-actions Bot assigned CodeBlanch Jun 22, 2023

Kielek added the comp:instrumentation.wcf Things related to OpenTelemetry.Instrumentation.Wcf label Jun 22, 2023

CodeBlanch mentioned this pull request Jun 26, 2023

OpenTelemetry.Instrumentation.Wcf supported transport protocols and framework #1242

Open

github-actions Bot added the Stale label Jul 1, 2023

github-actions Bot removed the Stale label Jul 6, 2023

github-actions Bot added the Stale label Jul 16, 2023

repl-chris mentioned this pull request Jul 18, 2023

[wcf] Add support for non-SOAP based payloads #1251

Merged

3 tasks

github-actions Bot removed the Stale label Jul 21, 2023

Move WCF client instrumentation down to a lower-level binding, rather…

4afeee3

… than in a ClientMessageInspector

repl-chris force-pushed the WcfClientBinding branch from e1bf1ad to 4afeee3 Compare July 21, 2023 21:07

repl-chris changed the title ~~Convert WCF client instrumentation to a BindingElement instead of an IEndpointBehavior~~ Convert WCF client instrumentation to a BindingElement Jul 21, 2023

Make performance of RequestTelemetryStateTracker more consistent/pred…

5b69a71

…icatable

mconnew reviewed Jul 31, 2023

View reviewed changes

mconnew approved these changes Jul 31, 2023

View reviewed changes

Fully qualify telemetry context message property name

06ad35f

github-actions Bot added the Stale label Aug 8, 2023

Kielek removed the Stale label Aug 8, 2023

Merge branch 'main' into WcfClientBinding

e028a48

lachmatt approved these changes Aug 8, 2023

View reviewed changes

Kielek reviewed Aug 9, 2023

View reviewed changes

Comment thread src/OpenTelemetry.Instrumentation.Wcf/README.md Outdated

Fix up formatting/linting issues

bb144e2

CodeBlanch reviewed Aug 9, 2023

View reviewed changes

Comment thread src/OpenTelemetry.Instrumentation.Wcf/README.md

Merge branch 'main' into WcfClientBinding

04cc8c1

github-actions Bot requested a review from CodeBlanch August 10, 2023 04:35

repl-chris added 2 commits August 9, 2023 23:32

Add 'sealed' to a few internal classes

6d80014

Merge branch 'WcfClientBinding' of https://github.com/repl-chris/open…

3c3e751

…telemetry-dotnet-contrib into WcfClientBinding

Kielek approved these changes Aug 10, 2023

View reviewed changes

CodeBlanch approved these changes Aug 10, 2023

View reviewed changes

Merge branch 'main' into WcfClientBinding

d0e1ed9

github-actions Bot requested a review from CodeBlanch August 10, 2023 17:46

Kielek merged commit e4cb523 into open-telemetry:main Aug 11, 2023

repl-chris deleted the WcfClientBinding branch August 11, 2023 16:45

utpilla mentioned this pull request Aug 25, 2023

Flaky unit test on net462 #1325

Closed


		namespace OpenTelemetry.Instrumentation.Wcf.Implementation;

		internal class InstrumentedChannel : InstrumentedCommunicationObject, IChannel

Conversation

repl-chris commented Jun 22, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Uh oh!

linux-foundation-easycla Bot commented Jun 22, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Kielek commented Jun 23, 2023

Uh oh!

repl-chris commented Jun 23, 2023

Uh oh!

repl-chris commented Jun 23, 2023

Uh oh!

github-actions Bot commented Jul 1, 2023

Uh oh!

repl-chris commented Jul 5, 2023

Uh oh!

github-actions Bot commented Jul 16, 2023

Uh oh!

mconnew commented Jul 20, 2023

Uh oh!

repl-chris commented Jul 21, 2023

Uh oh!

repl-chris commented Jul 21, 2023

Uh oh!

mconnew commented Jul 22, 2023

Uh oh!

mconnew Jul 31, 2023

Choose a reason for hiding this comment

Uh oh!

repl-chris Jul 31, 2023

Choose a reason for hiding this comment

Uh oh!

mconnew Jul 31, 2023

Choose a reason for hiding this comment

Uh oh!

mconnew commented Jul 31, 2023

Uh oh!

github-actions Bot commented Aug 8, 2023

Uh oh!

codecov Bot commented Aug 8, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Kielek commented Aug 9, 2023

Uh oh!

Uh oh!

repl-chris commented Aug 9, 2023

Uh oh!

CodeBlanch Aug 9, 2023

Choose a reason for hiding this comment

Uh oh!

mconnew Aug 9, 2023

Choose a reason for hiding this comment

Uh oh!

repl-chris Aug 10, 2023

Choose a reason for hiding this comment

Uh oh!

Uh oh!

CodeBlanch left a comment

Choose a reason for hiding this comment

Uh oh!

julealgon commented Aug 11, 2023

Uh oh!

utpilla commented Aug 14, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

repl-chris commented Jun 22, 2023 •

edited

Loading

linux-foundation-easycla Bot commented Jun 22, 2023 •

edited

Loading

codecov Bot commented Aug 8, 2023 •

edited

Loading