Skip to content

Traces missing/changed with WebFluxSleuthOperators.withSpanInScope in cloud-gateway #2237

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
vineetkala11 opened this issue Nov 30, 2022 · 14 comments
Labels
Milestone

Comments

@vineetkala11
Copy link

Describe the bug

Version details
spring-cloud-gateway: 3.1.4
spring-cloud-sleuth: 3.1.4

Issue
Recently we identified an issue in cloud-gateway where we found traces and spans are missing in log at multiple places. After investigation we found its due to spring.sleuth.reactor.instrumentation-type is explicitly set to manual for cloud-gateway and this mode suggest to handle tracingContext manually by using WebFluxSleuthOperators, as given in this document and issue

This issue is not appearing with other spring boot applications because spring.sleuth.reactor.instrumentation-type is set to its default value DECORATE_ON_EACH, which guarantee to wraps every Reactor operator in a trace representation.

We changed our code as per suggested by document and wrap all our loggers within WebFluxSleuthOperators.withSpanInScope, as given below.

WebFluxSleuthOperators.withSpanInScope(tracing, currentTraceContext, exchange) {
                                    logger.debug("hello world ...")
                                }

and we found very less occurrence of missing traces and spans in logs :)

But after deep analysis of log we found, traceId and spaneId are different at some places (for the single route), where we calling some external api (through RetrofitClient) from cloud-gateway, and I believe its due missing tracingContext at the time of external call and due to this new traceId and spanId get generated. This behavior is misleading for someone who using tools like kibana to search log based on traceId, additionally in some of the cases traceId and spanId are still missing for the libraries by spring-cloud-gateway

As we used WebFluxSleuthOperators in our application, but we didn't have any control over external libraries like RetrofitClient, so how we need to handle such scenario.

Sample
Below screenshot where we are getting different traceId and spanId for single route call.

image

Below screenshot where we are not getting any traceId and spanId

image

Just want to add that by changing spring.sleuth.reactor.instrumentation-type to DECORATE_ON_EACH everything works as expected.

@marcingrzejszczak
Copy link
Contributor

Can you check the latest snapshots (3.1.6-SNAPSHOT) cause this should be fixed by a6e1dcb

@vineetkala11
Copy link
Author

Hello @marcingrzejszczak your quick reply is much appreciated.

can you please share maven repo for snapshot (3.1.6-SNAPSHOT), so that I can point it from my project.

@marcingrzejszczak
Copy link
Contributor

Sure

<repositories>
			<repository>
				<id>spring-snapshots</id>
				<name>Spring Snapshots</name>
				<url>https://repo.spring.io/snapshot</url>
				<snapshots>
					<enabled>true</enabled>
				</snapshots>
				<releases>
					<enabled>false</enabled>
				</releases>
			</repository>
			<!-- BRAVE  -->
			<repository>
				<id>jfrog-snapshots</id>
				<name>JFrog Snapshots</name>
				<url>https://oss.jfrog.org/oss-snapshot-local/</url>
				<snapshots>
					<enabled>true</enabled>
				</snapshots>
				<releases>
					<enabled>false</enabled>
				</releases>
			</repository>
			<repository>
				<id>spring-milestones</id>
				<name>Spring Milestones</name>
				<url>https://repo.spring.io/milestone</url>
				<snapshots>
					<enabled>false</enabled>
				</snapshots>
			</repository>
			<repository>
				<id>spring-releases</id>
				<name>Spring Releases</name>
				<url>https://repo.spring.io/release</url>
				<snapshots>
					<enabled>false</enabled>
				</snapshots>
			</repository>
		</repositories>

@vineetkala11
Copy link
Author

Hello @marcingrzejszczak

As suggested, we tried with 3.1.6-SNAPSHOT, but behavior is same as earlier, traceId and spanId changed for external calls for the same route.

Used 3.1.6-SNAPSHOT

image

image

Application log

image

@marcingrzejszczak
Copy link
Contributor

Is this using the queue decorator wrapping?

@vineetkala11
Copy link
Author

If I understand you correctly, you taking about what value is set for spring.sleuth.reactor.instrumentation-type in application. So we are using manual decorator.

@marcingrzejszczak
Copy link
Contributor

Please try turning on the on queue wrapping and try again

@vineetkala11
Copy link
Author

We tried with DECORATE_QUEUES, with this decorator we didn't found any variation in traceId. So our application's current state is, we are using DECORATE_QUEUES decorator, along with the WebFluxSleuthOperators (to wrap our logger). With this combination we didn't found missing traces in log and traceId also not changing with external call.

But we are not very comfortable with the usage of queue decorate because it claim low performance impact. However manual decorator is the recommended one, for better performance.

Also we did some test few days back with all decorators, here is our observation with different decorators

DECORATE_QUEUES : no variation of traceId found with external calls, but at few places traceId and spanId is missing
DECORATE_ON_LAST: same behavior as with DECORATE_QUEUES
DECORATE_ON_EACH: no missing traces and no variation in traceId with external calls
MANUAL: too many traces are missing and new traceId generated with external calls

With above result we can say that DECORATE_ON_EACH has the perfect behavior and we are trying to achieve same behavior with MANULA decorator.

@vineetkala11
Copy link
Author

For integrity of Tracing and low performance impact, we are looking to go with DECORATE_QUEUES along with the WebFluxSleuthOperators (where we found occurrence of missing tarceId and spaneId).

@marcingrzejszczak what is your view on this.

@marcingrzejszczak
Copy link
Contributor

@OlegDokuka is working on a solution that should fix it for future versions of Reactor. Until now we have introduced changes in Sleuth Reactor integration that should fix this problem. Can you check the latest snapshots to see if the decorate_on_each option works better for you now with SC Gateway?

@spring-cloud-issues
Copy link

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

@vineetkala11
Copy link
Author

vineetkala11 commented Jan 19, 2023

Thanks @marcingrzejszczak for the feedback.

We are in process to migrate to spring boot 3, do you think this will also fix this issue, as it use micrometer tracing internally.

@marcingrzejszczak
Copy link
Contributor

We are working on that.

@marcingrzejszczak
Copy link
Contributor

With Spring Cloud Sleuth 3.1.6 this should be fixed when using queue wrapping mechanism

@marcingrzejszczak marcingrzejszczak added this to the 3.1.6 milestone Jan 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants