-
Notifications
You must be signed in to change notification settings - Fork 4.5k
stats/opentelemetry: record retry attempts from clientStream #8342
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
stats/opentelemetry: record retry attempts from clientStream #8342
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #8342 +/- ##
==========================================
- Coverage 82.48% 82.48% -0.01%
==========================================
Files 413 413
Lines 40518 40546 +28
==========================================
+ Hits 33422 33445 +23
- Misses 5738 5743 +5
Partials 1358 1358
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember we had separate tests for retries. This change should only affect that test. This change shouldn't affect tests which are doing only single attempt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vinothkumarr227 have you tested this to ensure its working correctly? I was under impression that TestTraceSpan_WithRetriesAndNameResolutionDelay will need changes for expected values. I think its not working as expected because you are not setting the count back to ctx after incrementing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, adding a second reviewer. Please address the minor comment.
stats/opentelemetry/e2e_test.go
Outdated
@@ -1548,12 +1547,12 @@ const delayedResolutionEventName = "Delayed name resolution complete" | |||
// TestTraceSpan_WithRetriesAndNameResolutionDelay verifies that | |||
// "Delayed name resolution complete" event is recorded in the call trace span | |||
// only once if any of the retry attempt encountered a delay in name resolution | |||
func (s) TestTraceSpan_WithRetriesAndNameResolutionDelay(t *testing.T) { | |||
func TestTraceSpan_WithRetriesAndNameResolutionDelay(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add the (s)
back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@vinothkumarr227 there is a failure in the updated test reported by PTAL and ensure the test is not flaky. |
Sure, I’ll check it out. |
|
stats/opentelemetry/trace.go
Outdated
span.SetAttributes( | ||
attribute.Bool("Client", rs.Client), | ||
attribute.Bool("FailFast", rs.FailFast), | ||
attribute.Int64("previous-rpc-attempts", int64(ai.previousRPCAttempts)), | ||
// TODO: Remove "previous-rpc-attempts" and "transparent-retry" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a TODO?
Isn't this simply?
attribs := []attribute.KeyValue{...Bool("Client", rs.Client), /* etc */}
if ai.previousRPCAttempts != nil {
attribs = append(attribs, attribute.Int64("prev...", ..), ...)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. Per @arjan-bal recommendation, this will be handled in a separate PR, which is why it's a TODO. I've created an issue for reference: #8430
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand why. If these attributes are not supposed to be present, then wouldn't it be incorrect to include them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Doug. I've made changes and pushed the fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh it was already setting these things?
Also what is this about "Go was always setting these attributes even though they aren't in the spec"?
Are we intentionally not implementing the spec? Why are we mentioning the OC spec in the OT plugin? I'm kind of lost here, TBH.
Fixes: #8299
RELEASE NOTES:
grpc.previous-rpc-attempts
) are now recorded as span attributes for non-transparent client retries.