[Instrumentation.Wcf] Allow WCF services to record exceptions if RecordException is set#2880
Conversation
|
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
|
|
||
| // OperationContext.Current *should* be reliable even in async calls at .NET 4.6.2+. | ||
| // In older versions it may not be. | ||
| var context = OperationContext.Current?.Extensions.FirstOrDefault(item => item is WcfOperationContext) as WcfOperationContext; |
There was a problem hiding this comment.
IExtensionCollection<T>.Find<E> would be more concise.
| activityListener.ExceptionRecorder += (Activity activity, Exception ex, ref TagList tags) => | ||
| { | ||
| recordedExceptions.Add(ex); | ||
| }; |
There was a problem hiding this comment.
I considered whether this ExceptionRecorder should check that activity.Source.Name is as expected. Just to verify that the exception is being recorded by the OpenTelemetry WCF instrumentation rather than any WCF built-in feature. However, if WCF somehow got such a feature, then the recordException: false test cases would fail. So it's not necessary to check the ActivitySource here.
There was a problem hiding this comment.
WCF in .NET Framework isn't adding anything like this. We were going to initially, but the goal posts kept moving on what telemetry was going to look like, and they keep changing in small subtle ways (evolving), so adding telemetry externally via this package is going to be the best place for it to exist for a long time. And the more time passes, the less interest there is in adding anything to .NET Framework as all the focus is on .NET.
|
|
||
| if (recordException && triggerException) | ||
| { | ||
| Assert.Collection(recordedExceptions, e => Assert.IsType<Exception>(e)); |
There was a problem hiding this comment.
OK, this verifies the type of the exception is exactly Exception as thrown from test/OpenTelemetry.Instrumentation.Wcf.Tests/WCF/Service.netfx.cs, not translated to any FaultException.
| if (WcfInstrumentationActivitySource.Options.RecordException) | ||
| { | ||
| OperationContext.Current?.Extensions.Add(new WcfOperationContext(activity)); | ||
| } |
There was a problem hiding this comment.
I have been doing something similar, and yes, OperationContext.Current will be more reliable for this use than Activity.Current, because when the WCF dispatcher pauses an operation and resumes another operation, it restores only OperationContext and not the whole ExecutionContext.
My implementation additionally has an ICallContextInitializer that finds the (already started) Activity from OperationContext.Current and sets it as Activity.Current. I added that because I suspected that throttling in the WCF dispatcher might mess up the AsyncLocal<T> states. However, I haven't yet tested whether that actually happens.
This OpenTelemetry WCF instrumentation doesn't seem to have such an ICallContextInitializer. Perhaps one should be added; but this PR does not make the problem worse.
There was a problem hiding this comment.
I believe you are correct that the throttles would mess up the AsyncLocal<T> state as it essentially maintains a list of continuations to run in order as throttles are released and the ExecutionContext is captured with context flow disabled. Definitely worth adding this as a follow up item.
d4421c8 to
7c66aaa
Compare
Co-authored-by: Piotr Kiełkowicz <pkiekowicz@splunk.com>
Co-authored-by: Piotr Kiełkowicz <pkiekowicz@splunk.com>
7c66aaa to
46659e6
Compare
Co-authored-by: Matt Hensley <130569+matt-hensley@users.noreply.github.com>
|
Should mention the feature around here, rather than say it "is available only on the client side": opentelemetry-dotnet-contrib/src/OpenTelemetry.Instrumentation.Wcf/README.md Lines 232 to 237 in 4448613 |
|
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Done |
|
@kiel, sorry I missed the mention earlier. I'll take a look this week. |
|
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
| // so the error appears after the Activity has completed. Additionally, sometimes | ||
| // the context has already been disposed or otherwise lost, preventing association | ||
| // at all. | ||
| // Also it becomes very difficult to unit-test, because there is no easy `ErrorsHandled` |
There was a problem hiding this comment.
This is perfectly fine to do here. HandleError allows you to prevent the session/context/connection from being aborted, and ProvideFault enable you to provde a custom fault message. Neither one of them are designed specifically for what you're doing, so there's no "right" place to do this. HanndleError is no more the right place than ProvideFault, so if this method works better due to the activity being stopped, then it's the right method to use.
| var context = OperationContext.Current?.Extensions.Find<WcfOperationContext>(); | ||
| var activity = context?.Activity ?? WcfInstrumentationActivitySource.ActivitySource.StartActivity(WcfInstrumentationActivitySource.UnassociatedExceptionActivityName, ActivityKind.Internal); | ||
|
|
||
| activity?.AddException(error); |
There was a problem hiding this comment.
Do you need the null coalescing here? I presume StartActivity is guaranteed to return a non-null activity, which means activity can't be null here. Same with the call to Stop below.
There was a problem hiding this comment.
I presume StartActivity is guaranteed to return a non-null activity
It's not. StartActivity returns null if the ActivitySource has no listeners or if no ActivityListener chooses to sample the activity.
|
I've approved the changes, but I don't have write access to this repo, so it's still blocked. |
RecordException is setRecordException is set
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2880 +/- ##
==========================================
+ Coverage 69.59% 69.76% +0.16%
==========================================
Files 396 388 -8
Lines 15906 15876 -30
==========================================
+ Hits 11070 11076 +6
+ Misses 4836 4800 -36
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
Implements #2878
Changes
Adds an
IErrorHandlerimplementation (TracingErrorHandler) that gets added to theChannelDispatcherinstances for any instrumented services. If theRecordExceptionsoption is set, it associates the currentActivitywithOperationContext.Currentduring the initial message inspection. When an exception is triggered, theTracingErrorHandlerloads theActivityand adds the exception to it.Although the documentation specifically notes that the
HandleErrormethod should be used to perform activities such as logging, I'm performing the logging in theProvideFaultmethod. This is being done for reasons of timing--ProvideFaultis called before the message is returned to the client and therefore before theActivityis stopped inTelemetryDispatchMessageInspector.BeforeSendReply. This allows the exception to be properly placed inside the recorded span, prevents it from being attached to an already completedActivity, and ensures theOperationContextwith the relevant information still exists.It also simplifies the testing considerably, because I have not been able to find an event that only fires after all
HandleErrorimplementations have been called, so I'm forced to simply rely on predefined waits to see if the error is recorded.As far as I am aware, the biggest potential issue with using
ProvideFaultis it will block completing the request and sending the response to the client. I don't expect that to be an issue here, but it will rely on the good behavior of theActivityListener.ExceptionRecorderimplementations.Merge requirement checklist
CHANGELOG.mdfiles updated for non-trivial changes