Skip to content

Convert WCF client instrumentation to a BindingElement#1247

Merged
Kielek merged 14 commits into
open-telemetry:mainfrom
repl-chris:WcfClientBinding
Aug 11, 2023
Merged

Convert WCF client instrumentation to a BindingElement#1247
Kielek merged 14 commits into
open-telemetry:mainfrom
repl-chris:WcfClientBinding

Conversation

@repl-chris
Copy link
Copy Markdown
Contributor

@repl-chris repl-chris commented Jun 22, 2023

Fixes #478
Fixes #812
Fixes #1153

Changes

This converts the WCF client instrumentation to being implemented in a lower-level bindingelement/channels rather than in the higher-level endpointbehavior/clientmessageinspector. This is needed so we can gain more insight into asynchronous execution flows and exception handling so we can better manage the ExecutionContexts. The clientmessageinspector still exists, but only to provide high-level context to the channels where the actual instrumentation now executes.

For significant contributions please make sure you have completed the following items:

  • Appropriate CHANGELOG.md updated for non-trivial changes
  • Design discussion issue #
  • Changes in public API reviewed

@linux-foundation-easycla
Copy link
Copy Markdown

linux-foundation-easycla Bot commented Jun 22, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@github-actions github-actions Bot requested a review from CodeBlanch June 22, 2023 04:28
@repl-chris repl-chris force-pushed the WcfClientBinding branch 2 times, most recently from 7671259 to 0e4ed12 Compare June 22, 2023 14:49
@repl-chris repl-chris changed the title Convert WCF client instrumentation to a BindingElement instead EndpointBehavior Convert WCF client instrumentation to a BindingElement instead of an IEndpointBehavior Jun 22, 2023
@repl-chris repl-chris marked this pull request as ready for review June 22, 2023 14:50
@repl-chris repl-chris requested a review from a team June 22, 2023 14:50
@Kielek Kielek added the comp:instrumentation.wcf Things related to OpenTelemetry.Instrumentation.Wcf label Jun 22, 2023
@Kielek
Copy link
Copy Markdown
Member

Kielek commented Jun 23, 2023

@repl-chris, please sing EasyCLA. Withot this we cannot proceed with this PR.

@repl-chris
Copy link
Copy Markdown
Contributor Author

@Kielek yep, it is coming - it's working its way through corporate red tape as we speak 😀....I don't expect it will take long.

@repl-chris
Copy link
Copy Markdown
Contributor Author

@Kielek ok good to go 👍

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jul 1, 2023

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions Bot added the Stale label Jul 1, 2023
@repl-chris
Copy link
Copy Markdown
Contributor Author

@CodeBlanch @Kielek Not meaning to rush, but is there any specific procedure for PRs? I just want to make sure I haven't missed a step or anything...

I realize it's a fairly large PR - I tried to keep the changes minimal but still functionally complete, but obviously it's a fairly fundamental change. I could take a different approach if that would be preferable - I could do multiple smaller PRs adding the new components while keeping backwards compatibility before finally dropping the old mechanism?

@github-actions github-actions Bot removed the Stale label Jul 6, 2023
@github-actions
Copy link
Copy Markdown
Contributor

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@mconnew
Copy link
Copy Markdown

mconnew commented Jul 20, 2023

I haven't gone over the PR thoroughly, but wanted to give a bit of feedback. Using a BindingElement is going to be the most powerful way to achieve what's needed as you can get at layers that are otherwise really difficult to do. You can still keep the user experience based around adding a behavior without requiring them to modify their binding themselves. In the validate method of a behavior, you can get to the ServiceEndpoint and modify the binding. You would do this by creating a CustomBinding from the existing binding, add the custom BindingElement into the stack, then replace the Binding on the ServiceEndpoint with the new CustomBinding.

There are some things to consider when using a binding element. It matters where you place it in the stack. If you put it just after the Transport and MessageEncoder, you will have access to the raw SOAP message, but there's 2 things to be aware of. If they are using SecurityMode.Message, it's possible you might not be able to read the headers as they might be encrypted. I believe the way you are adding the custom header that it probably won't be, but it's something to validate. The second thing is SOAP messages do map one to one with service calls. You will be exposed to all the infrastructure messages too such as reliable sessions keep alive pings, or session establishment handshaking. You may or may not want to have that logged. For example, there is some benefit to recording the rate of session establishment, or being aware of how often a reliable sessions is reestablished. This are both indicators of service health, but might be non-obvious on how to interpret that data, and it might need special handling. If you put it at the end of the stack, you will see messages intended to be dispatched. That will still capture some session establishment calls as internally it's implemented as another service implementation, but you won't see Reliable Sessions data at the end.

Where you put the binding element will affect what the time taken to handle a request looks like. For example, if you are using security token authentication without using a session, every request needs to be authenticated. The authentication is done by the SecurityBindingElement. This authentication might involve an off-box call (eg SQL server lookup) and consume a not insignificant amount of time. If you place your binding element between the transport and the security binding elements, the request processing time you'd see would include the authentication time. If you placed your binding element at the end of the list, it won't capture that authentication timing. If something happens with authentication and it slows down and you are measuring from after authentication, you miss that in your metrics.

Basically I'm saying you need to think carefully about where in the binding stack to place the binding element, or whether you want to insert multiple binding elements to capture more fine grained information. Do you want to emit data for non dispatching infrastructure messages?

@github-actions github-actions Bot removed the Stale label Jul 21, 2023
@repl-chris
Copy link
Copy Markdown
Contributor Author

@mconnew oooh thanks for the tip about replacing the binding on the serviceendpoint, I will definitely look into that.

As for the placement of the binding element in the stack...that's some really good food for thought - I was thinking it should always go on the top (as far away from the transport as possible). This telemetry is measuring the time to make an outgoing service call...I'd expect we'd want all the client-side WCF framework and extensions to be part of that, not simply just the network request (or a shade-of-grey between the 2 extremes). Come to think of it, it would be nice to include all other endpointbehaviors inside of this too - logically they are also part of the "outgoing call"...I will play with this. The transport already has its own instrumentation (at least when using http), and from a design perspective I'd expect if there are other elements of interest in the stack they should have instrumentation installed specifically for them....those could eventually also be part of this module, but for this initial implementation I would expect this component to just comprise as much of the WCF client-side stack as possible.

@repl-chris repl-chris changed the title Convert WCF client instrumentation to a BindingElement instead of an IEndpointBehavior Convert WCF client instrumentation to a BindingElement Jul 21, 2023
@repl-chris
Copy link
Copy Markdown
Contributor Author

I have reworked it to inject the new bindingelement from the endpointbehavior as suggested, and it works beautifully. It cleans it up a bunch - now there are no external api or config changes (yay!).

There's 1 specific area that I'm not very happy with....I wonder if you have any suggestions for a better approach. In the InstrumentedDuplexSessionChannel.cs I start the activity on Send() and stop it on Receive(), but I have no good way to tie the 2 together - no way to pass any context from the Send call to the Receive call, so no easy way to know which activity to stop. To make it work I'm manually tracking the outstanding requests (RequestTelemetryStateTracker.cs) so I can look up the appropriate request after receiving the response. Of course this only works reliably when I actually receive a response - any error on Receive() will cause the "outstanding requests" collection to accumulate entries (it'll hold max 1000 of them before it starts discarding them). It's a little gross, but I still see this as a good step forward compared to where we are today, and of course it's just on the duplex channel - the request channel has no such hackery....I hope there's a better way 🤔

@mconnew
Copy link
Copy Markdown

mconnew commented Jul 22, 2023

The outgoing message has an ID in a special SOAP header. I'm on mobile right now so can't easily look it up but it should be easy to spot, it's a Guid. I think the header name is MessageID. The reply message will have another header I believe is named RelatesTo which contains the same Guid as the matching outgoing message. This is how WCF matches the up. We have a RequestReplyCorrelator which the outgoing message ID is added to before sending. Then we call a method on the correlator which returns (and when async, completes a pending async call) when a message is received with the matching RelatesTo header. We have a timeout mechanism which removes the outgoing ID from the correlator if the reply hasn't been seen without a timeout period. If a message is actually received after that timeout, the code drops the message as the matching ID is missing. I believe that timeout should be the Binding.SendTimeout. Any correlation you build should have a timeout mechanism and consider the call as failed if the reply isn't received within that timeout. The caller would have seen a TimeoutException. Anything received after that timeout should be treated as a failure. I don't know if you want to emit telemetry when that happens as you should have already emitted something saying it timed out, it would be useful info in a limited set of situations. That would generally happen if the server is configured with a longer timeout than the client.


internal static void ApplyBindingElementToServiceEndpoint(ServiceEndpoint endpoint)
{
if (endpoint.Binding is CustomBinding customBinding && customBinding.Elements.Find<TelemetryBindingElement>() != null)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this runs on the service side too, you should probably also skip if endpoint.IsSystemEndpoint is true.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it won't currently ever run on the service-side, but it doesn't hurt to have it there 👍


internal sealed class TelemetryContextMessageProperty
{
public const string Name = "telemetryContext";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only my personal preference, but I like to use the entire fully qualified class name as the property name to make sure there aren't collisions with some other library deciding to use the same propery name.

@mconnew
Copy link
Copy Markdown

mconnew commented Jul 31, 2023

This looks good to me. From a correctness perspective everything is being done the right "WCF" way.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Aug 8, 2023

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions Bot added the Stale label Aug 8, 2023
@Kielek Kielek removed the Stale label Aug 8, 2023
@codecov
Copy link
Copy Markdown

codecov Bot commented Aug 8, 2023

Codecov Report

Merging #1247 (d0e1ed9) into main (01378f4) will increase coverage by 0.21%.
The diff coverage is 81.89%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1247      +/-   ##
==========================================
+ Coverage   74.15%   74.36%   +0.21%     
==========================================
  Files         256      269      +13     
  Lines        9348     9621     +273     
==========================================
+ Hits         6932     7155     +223     
- Misses       2416     2466      +50     
Files Changed Coverage Δ
...entation.Wcf/TelemetryContractBehaviorAttribute.cs 0.00% <0.00%> (ø)
.../Implementation/InstrumentedCommunicationObject.cs 53.57% <53.57%> (ø)
...Implementation/InstrumentedDuplexSessionChannel.cs 64.70% <64.70%> (ø)
...n.Wcf/Implementation/InstrumentedRequestChannel.cs 75.00% <75.00%> (ø)
...y.Instrumentation.Wcf/TelemetryEndpointBehavior.cs 92.10% <75.00%> (-7.90%) ⬇️
...mplementation/InstrumentedRequestChannelFactory.cs 80.00% <80.00%> (ø)
...ntation/InstrumentedDuplexSessionChannelFactory.cs 83.33% <83.33%> (ø)
...cf/Implementation/AsyncResultWithTelemetryState.cs 85.71% <85.71%> (ø)
...tion.Wcf/Implementation/TelemetryBindingElement.cs 92.85% <92.85%> (ø)
...Wcf/Implementation/ClientChannelInstrumentation.cs 94.80% <94.80%> (ø)
... and 6 more

@Kielek
Copy link
Copy Markdown
Member

Kielek commented Aug 9, 2023

@repl-chris, could you please fix dotnet-format and markdownlint issues?

Comment thread src/OpenTelemetry.Instrumentation.Wcf/README.md Outdated
@repl-chris
Copy link
Copy Markdown
Contributor Author

@Kielek all fixed up 👍


namespace OpenTelemetry.Instrumentation.Wcf.Implementation;

internal class InstrumentedChannel : InstrumentedCommunicationObject, IChannel
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Would you please add sealed to these internal classes. There is a slight perf boost for sealed classes. See: dotnet/runtime#49944

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the long comment, but I think this is a useful nugget of information to spread around.

I don't think this would apply here as the call site of the virtual methods would be WCF code which has no visibility into the concrete type. The situation where you get the performance improvement is this:

public abstract class FooBase
{
    public abstract void DoWork()
}

public class MyFoo : FooBase
{
    public override void DoWork()
    {
        Console.WriteLine("Doing some work");
    }
}

public static class Program
{
    public static int Main(string[] args)
    {
        MyFoo foo = GetFoo();
        foo.DoWork();
    }
    public static MyFoo GetFoo()
    {
        return new MyFoo();
    }
}

In the snippet above, the code in Main method which calls GetFoo() doesn't have knowledge about if the MyFoo instance returned is an actual MyFoo instance as it could potentially be a type derived from MyFoo. As the DoWork() method is virtual, it could potentially have been overridden in a derived instance so it must do a virtual method call to make sure the correct method is called. If MyFoo was sealed, then you are guaranteed that the object returned from GetFoo() is an actual MyFoo type as you can't derive it. This means the call to DoWork() can be a non-virtual method call to MyFoo.DoWork() as there's no possibility that a different method implementation would get called. The non-virtual method call is faster. In this case it would probably inline the implementation as MyFoo.DoWork() is so small which gives you even more performance improvement.

For completeness and accuracy, I ignored that the compiler might be able to still do it in this case as GetFoo() is non-virtual and short so could get inlined in which case the compiler might reason about the type sufficiently to do a non-virtual method call here. The point was about semantics and if GetFoo() was longer so couldn't be inlined, or virtual (it's static in this code, but generalizing here), then what I stated would be correct. Hopefully you understand the concept I'm explaining in the general case.

In this codebase, the instrumented channel implements multiple WCF interfaces. The caller for all the virtual overridden methods (or interface implemented methods) is WCF which only has a reference to the instance as the interface type. There is no a priori knowledge when WCF was compiled about the type of the channel so the compiler can only emit virtual method calls.

You only benefit from sealing an implementation if the call site has a reference typed as the concrete sealed type.

On the other hand, marking it as sealed doesn't do any harm. Although not applicable in this scenario, sealing a class can also prevent using Ref.Emit to derive an implementation and using reflection to inject it to modify an app to do things not intended.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added sealed to all possible internal classes 👍

Comment thread src/OpenTelemetry.Instrumentation.Wcf/README.md
@github-actions github-actions Bot requested a review from CodeBlanch August 10, 2023 04:35
Copy link
Copy Markdown
Member

@CodeBlanch CodeBlanch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks for the help on this @mconnew

@github-actions github-actions Bot requested a review from CodeBlanch August 10, 2023 17:46
@Kielek Kielek merged commit e4cb523 into open-telemetry:main Aug 11, 2023
@julealgon
Copy link
Copy Markdown

Curious to know when we can expect a new version of the package with this fix in place. I'd like to use it right away!

@Kielek / @repl-chris , any idea?

@repl-chris repl-chris deleted the WcfClientBinding branch August 11, 2023 16:45
@utpilla
Copy link
Copy Markdown
Contributor

utpilla commented Aug 14, 2023

Curious to know when we can expect a new version of the package with this fix in place. I'd like to use it right away!

@Kielek / @repl-chris , any idea?

@julealgon This change is available in 1.0.0-rc.11 version of the package: https://www.nuget.org/packages/OpenTelemetry.Instrumentation.Wcf/1.0.0-rc.11

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp:instrumentation.wcf Things related to OpenTelemetry.Instrumentation.Wcf

Projects

None yet

7 participants