Convert WCF client instrumentation to a BindingElement#1247
Conversation
7671259 to
0e4ed12
Compare
0e4ed12 to
3d2d4e8
Compare
|
@repl-chris, please sing EasyCLA. Withot this we cannot proceed with this PR. |
|
@Kielek yep, it is coming - it's working its way through corporate red tape as we speak 😀....I don't expect it will take long. |
|
@Kielek ok good to go 👍 |
|
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
|
@CodeBlanch @Kielek Not meaning to rush, but is there any specific procedure for PRs? I just want to make sure I haven't missed a step or anything... I realize it's a fairly large PR - I tried to keep the changes minimal but still functionally complete, but obviously it's a fairly fundamental change. I could take a different approach if that would be preferable - I could do multiple smaller PRs adding the new components while keeping backwards compatibility before finally dropping the old mechanism? |
|
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
|
I haven't gone over the PR thoroughly, but wanted to give a bit of feedback. Using a BindingElement is going to be the most powerful way to achieve what's needed as you can get at layers that are otherwise really difficult to do. You can still keep the user experience based around adding a behavior without requiring them to modify their binding themselves. In the validate method of a behavior, you can get to the ServiceEndpoint and modify the binding. You would do this by creating a CustomBinding from the existing binding, add the custom BindingElement into the stack, then replace the Binding on the ServiceEndpoint with the new CustomBinding. There are some things to consider when using a binding element. It matters where you place it in the stack. If you put it just after the Transport and MessageEncoder, you will have access to the raw SOAP message, but there's 2 things to be aware of. If they are using SecurityMode.Message, it's possible you might not be able to read the headers as they might be encrypted. I believe the way you are adding the custom header that it probably won't be, but it's something to validate. The second thing is SOAP messages do map one to one with service calls. You will be exposed to all the infrastructure messages too such as reliable sessions keep alive pings, or session establishment handshaking. You may or may not want to have that logged. For example, there is some benefit to recording the rate of session establishment, or being aware of how often a reliable sessions is reestablished. This are both indicators of service health, but might be non-obvious on how to interpret that data, and it might need special handling. If you put it at the end of the stack, you will see messages intended to be dispatched. That will still capture some session establishment calls as internally it's implemented as another service implementation, but you won't see Reliable Sessions data at the end. Where you put the binding element will affect what the time taken to handle a request looks like. For example, if you are using security token authentication without using a session, every request needs to be authenticated. The authentication is done by the SecurityBindingElement. This authentication might involve an off-box call (eg SQL server lookup) and consume a not insignificant amount of time. If you place your binding element between the transport and the security binding elements, the request processing time you'd see would include the authentication time. If you placed your binding element at the end of the list, it won't capture that authentication timing. If something happens with authentication and it slows down and you are measuring from after authentication, you miss that in your metrics. Basically I'm saying you need to think carefully about where in the binding stack to place the binding element, or whether you want to insert multiple binding elements to capture more fine grained information. Do you want to emit data for non dispatching infrastructure messages? |
|
@mconnew oooh thanks for the tip about replacing the binding on the serviceendpoint, I will definitely look into that. As for the placement of the binding element in the stack...that's some really good food for thought - I was thinking it should always go on the top (as far away from the transport as possible). This telemetry is measuring the time to make an outgoing service call...I'd expect we'd want all the client-side WCF framework and extensions to be part of that, not simply just the network request (or a shade-of-grey between the 2 extremes). Come to think of it, it would be nice to include all other endpointbehaviors inside of this too - logically they are also part of the "outgoing call"...I will play with this. The transport already has its own instrumentation (at least when using http), and from a design perspective I'd expect if there are other elements of interest in the stack they should have instrumentation installed specifically for them....those could eventually also be part of this module, but for this initial implementation I would expect this component to just comprise as much of the WCF client-side stack as possible. |
… than in a ClientMessageInspector
e1bf1ad to
4afeee3
Compare
|
I have reworked it to inject the new bindingelement from the endpointbehavior as suggested, and it works beautifully. It cleans it up a bunch - now there are no external api or config changes (yay!). There's 1 specific area that I'm not very happy with....I wonder if you have any suggestions for a better approach. In the |
|
The outgoing message has an ID in a special SOAP header. I'm on mobile right now so can't easily look it up but it should be easy to spot, it's a Guid. I think the header name is MessageID. The reply message will have another header I believe is named RelatesTo which contains the same Guid as the matching outgoing message. This is how WCF matches the up. We have a RequestReplyCorrelator which the outgoing message ID is added to before sending. Then we call a method on the correlator which returns (and when async, completes a pending async call) when a message is received with the matching RelatesTo header. We have a timeout mechanism which removes the outgoing ID from the correlator if the reply hasn't been seen without a timeout period. If a message is actually received after that timeout, the code drops the message as the matching ID is missing. I believe that timeout should be the Binding.SendTimeout. Any correlation you build should have a timeout mechanism and consider the call as failed if the reply isn't received within that timeout. The caller would have seen a TimeoutException. Anything received after that timeout should be treated as a failure. I don't know if you want to emit telemetry when that happens as you should have already emitted something saying it timed out, it would be useful info in a limited set of situations. That would generally happen if the server is configured with a longer timeout than the client. |
|
|
||
| internal static void ApplyBindingElementToServiceEndpoint(ServiceEndpoint endpoint) | ||
| { | ||
| if (endpoint.Binding is CustomBinding customBinding && customBinding.Elements.Find<TelemetryBindingElement>() != null) |
There was a problem hiding this comment.
If this runs on the service side too, you should probably also skip if endpoint.IsSystemEndpoint is true.
There was a problem hiding this comment.
it won't currently ever run on the service-side, but it doesn't hurt to have it there 👍
|
|
||
| internal sealed class TelemetryContextMessageProperty | ||
| { | ||
| public const string Name = "telemetryContext"; |
There was a problem hiding this comment.
This is only my personal preference, but I like to use the entire fully qualified class name as the property name to make sure there aren't collisions with some other library deciding to use the same propery name.
|
This looks good to me. From a correctness perspective everything is being done the right "WCF" way. |
|
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #1247 +/- ##
==========================================
+ Coverage 74.15% 74.36% +0.21%
==========================================
Files 256 269 +13
Lines 9348 9621 +273
==========================================
+ Hits 6932 7155 +223
- Misses 2416 2466 +50
|
|
@repl-chris, could you please fix dotnet-format and markdownlint issues? |
|
@Kielek all fixed up 👍 |
|
|
||
| namespace OpenTelemetry.Instrumentation.Wcf.Implementation; | ||
|
|
||
| internal class InstrumentedChannel : InstrumentedCommunicationObject, IChannel |
There was a problem hiding this comment.
nit: Would you please add sealed to these internal classes. There is a slight perf boost for sealed classes. See: dotnet/runtime#49944
There was a problem hiding this comment.
Sorry for the long comment, but I think this is a useful nugget of information to spread around.
I don't think this would apply here as the call site of the virtual methods would be WCF code which has no visibility into the concrete type. The situation where you get the performance improvement is this:
public abstract class FooBase
{
public abstract void DoWork()
}
public class MyFoo : FooBase
{
public override void DoWork()
{
Console.WriteLine("Doing some work");
}
}
public static class Program
{
public static int Main(string[] args)
{
MyFoo foo = GetFoo();
foo.DoWork();
}
public static MyFoo GetFoo()
{
return new MyFoo();
}
}In the snippet above, the code in Main method which calls GetFoo() doesn't have knowledge about if the MyFoo instance returned is an actual MyFoo instance as it could potentially be a type derived from MyFoo. As the DoWork() method is virtual, it could potentially have been overridden in a derived instance so it must do a virtual method call to make sure the correct method is called. If MyFoo was sealed, then you are guaranteed that the object returned from GetFoo() is an actual MyFoo type as you can't derive it. This means the call to DoWork() can be a non-virtual method call to MyFoo.DoWork() as there's no possibility that a different method implementation would get called. The non-virtual method call is faster. In this case it would probably inline the implementation as MyFoo.DoWork() is so small which gives you even more performance improvement.
For completeness and accuracy, I ignored that the compiler might be able to still do it in this case as GetFoo() is non-virtual and short so could get inlined in which case the compiler might reason about the type sufficiently to do a non-virtual method call here. The point was about semantics and if GetFoo() was longer so couldn't be inlined, or virtual (it's static in this code, but generalizing here), then what I stated would be correct. Hopefully you understand the concept I'm explaining in the general case.
In this codebase, the instrumented channel implements multiple WCF interfaces. The caller for all the virtual overridden methods (or interface implemented methods) is WCF which only has a reference to the instance as the interface type. There is no a priori knowledge when WCF was compiled about the type of the channel so the compiler can only emit virtual method calls.
You only benefit from sealing an implementation if the call site has a reference typed as the concrete sealed type.
On the other hand, marking it as sealed doesn't do any harm. Although not applicable in this scenario, sealing a class can also prevent using Ref.Emit to derive an implementation and using reflection to inject it to modify an app to do things not intended.
There was a problem hiding this comment.
I've added sealed to all possible internal classes 👍
CodeBlanch
left a comment
There was a problem hiding this comment.
LGTM thanks for the help on this @mconnew
|
Curious to know when we can expect a new version of the package with this fix in place. I'd like to use it right away! @Kielek / @repl-chris , any idea? |
@julealgon This change is available in |
Fixes #478
Fixes #812
Fixes #1153
Changes
This converts the WCF client instrumentation to being implemented in a lower-level bindingelement/channels rather than in the higher-level endpointbehavior/clientmessageinspector. This is needed so we can gain more insight into asynchronous execution flows and exception handling so we can better manage the ExecutionContexts. The clientmessageinspector still exists, but only to provide high-level context to the channels where the actual instrumentation now executes.
For significant contributions please make sure you have completed the following items:
CHANGELOG.mdupdated for non-trivial changes