Skip to content

Provide a setting for automatic context propagation with reactor-core #34201

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Tracked by #35776
chemicL opened this issue Feb 15, 2023 · 37 comments
Closed
Tracked by #35776

Provide a setting for automatic context propagation with reactor-core #34201

chemicL opened this issue Feb 15, 2023 · 37 comments
Assignees
Labels
theme: observability Issues related to observability type: enhancement A general enhancement
Milestone

Comments

@chemicL
Copy link
Member

chemicL commented Feb 15, 2023

reactor-core 3.5.3 introduced a way to automatically propagate ThreadLocal values registered with the context-propagation library. Combined with the use of micrometer-tracing it can allow users to log anywhere in their reactive chains with tracing information present, without the need to use handle/tap operators.

reactor/reactor-core#3335 adds a static method: Hooks.enableAutomaticContextPropagation() that can be triggered at the start of an application to enable this feature.

The call to this method can be handled by Spring Boot and users would just interact with a Spring Boot property to enable/disable this feature.

@chemicL
Copy link
Member Author

chemicL commented Feb 15, 2023

Important note is that the above feature is dependant on updating the dependencies for micrometer and context-propagation as listed in Micrometer docs:

  • Micrometer Context-Propagation 1.0.2
  • Micrometer 1.10.4
  • Micrometer Tracing 1.0.2

@bclozel bclozel self-assigned this Feb 15, 2023
@bclozel bclozel added type: bug A general bug theme: observability Issues related to observability labels Feb 15, 2023
@bclozel bclozel added this to the 3.0.3 milestone Feb 15, 2023
@bclozel bclozel removed the status: waiting-for-triage An issue we've not yet triaged label Feb 15, 2023
@bclozel
Copy link
Member

bclozel commented Feb 15, 2023

@chemicL If I understand correctly, without this setting context propagation is only managed with the tap/handle operators, or by using contextCapture. Using this hook provided by reactor turns it on for all operators.

I'm wondering if we should create a spring.reactor.context-propagation accepting enum values such as AUTO (default) and MANUAL. This would leave us some room for possible new modes in the future. What do you think?

@chemicL
Copy link
Member Author

chemicL commented Feb 15, 2023

Currently, as long as context-propagation library is on the classpath, reactor-core does the following (and there's no way to turn it off unless context-propagation is excluded from the runtime):

  • for tap and handle it restores the ThreadLocals from Reactor Context
  • contextCapture writes all ThreadLocal values to Reactor Context under the same keys for which ThreadLocalAccessors are registered

When Hooks.enableAutomaticContextPropagation() is called, the following changes:

  • all operators upstream from a contextWrite() or contextCapture() will have ThreadLocal values associated with keys present in Reactor Context propagated
  • handle and tap won't do anything special for that as the values are already present

In both scenarios, contextCapture is required to store current ThreadLocals into Reactor Context if any were written to. That won't be necessary e.g. when returning a Flux or Mono to which Spring subscribes where it attaches the Observation before calling an endpoint in case Micrometer is used (and the Observation is not even in current ThreadLocal).

I agree the enum approach is flexible and can let us introduce different mechanisms if we come up with any. For the naming, I'd go with AUTO and LIMITED instead of MANUAL. Intuitively, I'd expect MANUAL to mean direct use of the context-propagation library API to restore ThreadLocals.

@kzander91
Copy link
Contributor

@bclozel Could it make sense to call Hooks.disableAutomaticContextPropagation() on context shutdown and/or when ContextPropagationMode != AUTO?
I was wondering if this particular solution could otherwise impact other Spring Context's running in the same JVM, for example during tests.

@bclozel
Copy link
Member

bclozel commented Feb 20, 2023

Could it make sense to call Hooks.disableAutomaticContextPropagation() on context shutdown and/or when ContextPropagationMode != AUTO?
I was wondering if this particular solution could otherwise impact other Spring Context's running in the same JVM, for example during tests.

As far as I understand, this is a static call, so by definition this applies to the entire JVM. Removing this queue wrapper during context shutdown or when the mode is not set to "AUTO" is just likely to disable the feature for other contexts that are still running and require it.

This feature is additive, the performance cost has been measured and should not impact applications significantly. Unless tests are relying on the fact that ThreadLocal values are not restored within reactor operators, this configuration change should not influence other behaviors.

@wilkinsona
Copy link
Member

wilkinsona commented Feb 20, 2023

Given the Micrometer requirement for the hook to have an effect, I wonder if this should be auto-configured in spring-boot-actuator-autoconfigure rather than spring-boot-autoconfigure. I also wonder if the property prefix should also reflect that it is related to tracing. spring.reactor.tracing.context-propagation perhaps?

@wilkinsona wilkinsona reopened this Feb 20, 2023
@wilkinsona wilkinsona added the for: team-meeting An issue we'd like to discuss as a team to make progress label Feb 20, 2023
@wilkinsona
Copy link
Member

It looks like we're currently relying on context-propagation being in testCompileClasspath through spring-integration-core. When spring-projects/spring-integration#8556 is merged, that'll no longer be the case and the tests won't compile.

@wilkinsona
Copy link
Member

I've pushed cae8c14 to work around the compile failure following the merge of the PR in Spring Integration. We should see if we can remove the need for it as part of finishing off this issue.

@wilkinsona
Copy link
Member

We discussed this today and decided that we need more time to make sure we've got this right. We're going to revert the changes for now and revisit this for a subsequent 3.0.x release. In the meantime, a manual call to Hooks.enableAutomaticContextPropagation() can be made in apps that are affected by the problem described in #33372.

@wilkinsona
Copy link
Member

The changes have been reverted in 88de3cc.

@davidmelia
Copy link

@wilkinsona I am prototyping Hooks.enableAutomaticContextPropagation() and have noticed that integrations such as Spring Data Mongo do not propagate the trace ID - I assume this is because it hands off to a different even loop?

@rhanton
Copy link

rhanton commented Feb 24, 2023

@wilkinsona Darn, was waiting for this to be released to help fix some trace propagation issues I was seeing when using micrometer for internal tracing (I presume this will fix my internal @Observed traces not picking up the external trace id when using spring-reactive). I will keep waiting with bated breath!

@wilkinsona
Copy link
Member

No need to wait. Just add a call to Hooks.enableAutomaticContextPropagation() somewhere in your app.

@mikhilsanghvi
Copy link

@chemicL I've tried with reactor-core 3.5.5 and I see similar issue spring-data-r2dbc where the trace ID is no propagated back. As a result I am forced to add .contextWrite(Function.identity()) after the repository method like

repository.findAll().contextWrite(Function.identity());

I was hoping the issue reactor/reactor-core#3366 would have resolved this but unfortunately not.

Appreciate any help. Thanks

@chemicL
Copy link
Member Author

chemicL commented Jun 20, 2023

@mikhilsanghvi I'm always eager to look into an isolated reproducer. If you prepare one, please feel free to open an issue in reactor-core repository so I can analyse and see what can be done about the problem.

@chemicL
Copy link
Member Author

chemicL commented Jun 20, 2023

@janekbettinger Please report the problem you observe as a new issue in reactor-core. I'll try to analyse it and if necessary will redirect to the appropriate repository if the issue lies elsewhere.

@singhbaljit
Copy link

I'm seeing an issue with Spring Security where using Hooks.enableAutomaticContextPropagation() with micrometer-tracing-bridge-* library causes the OAuth2 resource servers to break. See spring-projects/spring-security#13431.

@wilkinsona wilkinsona removed the for: team-meeting An issue we'd like to discuss as a team to make progress label Jun 28, 2023
@wilkinsona wilkinsona modified the milestones: 3.0.x, 3.x Jun 28, 2023
@wilkinsona wilkinsona added type: enhancement A general enhancement and removed type: bug A general bug labels Jun 28, 2023
@philwebb philwebb modified the milestones: 3.x, 3.2.x Sep 28, 2023
@mhalbritter mhalbritter self-assigned this Sep 29, 2023
@mhalbritter
Copy link
Contributor

After some discussion with the Reactor team, we decided:

  • We add the property spring.reactor.context-propagation with the values AUTO and LIMITED. It defaults to LIMITED.

Reasoning:

having a boot properly already is a big win for us and gives users the visibility that such a feature exists

if the user launches an app expecting to see trace ids in logs, they can look up boot docs and see the setting, win

if we enable that by default for users who have not relied on that functionality, we automatically degrade their performance - investigating the reasons for that can take more time as they don’t know what is at fault, so that’s an argument against turning that on by default

We need to add something to the tracing docs that users have to set this to AUTO to get trace and span IDs in the logs.

@oleh-hudyma
Copy link

@mhalbritter where can we read about performance degradation when using context propagation?

@Abahafart
Copy link

there is any solution to print span and trace id in webclient methods? I've the same behavior with Spring boot version 3.1.3

@mhalbritter
Copy link
Contributor

where can we read about performance degradation when using context propagation?

The reactor team is doing some benchmarks and I guess will publish them when they are done.

@chemicL
Copy link
Member Author

chemicL commented Oct 10, 2023

We have been evaluating the configurations using some selected scenarios, but sharing any numbers would be counter-productive. The performance impact is a fact, as any such solution interacts with ThreadLocal values, accessing which is quite costly when thinking about squeezing the most out of the hardware. However, it is an accepted compromise, for which you get production grade observability features without having to do manual work. What we worked on is first and foremost presenting a solution which is not as impactful as restoring ThreadLocals for each operator (which preserves correctness) but is not causing ThreadLocal leaks like some approaches taken with Sleuth. We worked on providing a solution which is safe to use and comfortable. However, it comes with non-trivial cost, which can only be evaluated on a per-use-case scenario. So best to take your app and measure the impact with load tests - what happens with the default (limited) mode, what happens without any observability, and what happens if you enable the automatic mode. In some cases the differences will not be noticeable if the app is doing something more significant per request than those TL accesses. On another occasion it can happen that the automatic mode is cheaper than the default mode (e.g. when aggregating a stream of data), while in other cases it can be more costly. So the only recommendation we can give is to test for yourself if this is the price you're ok to pay.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme: observability Issues related to observability type: enhancement A general enhancement
Projects
None yet
Development

No branches or pull requests