-
Notifications
You must be signed in to change notification settings - Fork 817
Be consistent with span logging, set error=true and use otlog.Error #2970
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Isn’t this better solved by consistently using spanlogger that we use elsewhere? |
Good point, looks like spanlogger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cstyan Thanks for fixing this. I left few comments.
Could you fix the linter, please? For the error Error return value of
log.Error is not checked (errcheck)
you could just add it to the ignore list defined in .errcheck-exclude
(we don't want to check errors from logger)
|
||
sp, _ := opentracing.StartSpanFromContext(ctx, "ParseQueryRangeResponse") | ||
defer sp.Finish() | ||
log, _ := spanlogger.New(ctx, "ParseQueryRangeResponse") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New span here. Should replace the context, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm basing this off how we used the ot library sp, _ := opentracing.StartSpanFromContext(ctx, "ParseQueryRangeResponse")
, where we ignored the returned context. I don't know the historical reason behind doing it that way, so if you think it should change here with the introduction of spanlogger that's fine by me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the reason is that ctx
is not used below within this function. I believe replacing it is more future proof to avoid mistakes (we should do it once we'll start using ctx
below and it's easy to forget to do it).
@@ -171,14 +170,14 @@ func verifySamples(log *spanlogger.SpanLogger, tc Case, pairs []model.SamplePair | |||
expectedNumSamples := int(duration / cfg.ScrapeInterval) | |||
if !epsilonCorrect(float64(len(pairs)), float64(expectedNumSamples), cfg.samplesEpsilon) { | |||
level.Error(log).Log("msg", "wrong number of samples", "expected", expectedNumSamples, "actual", len(pairs)) | |||
log.LogFields(otlog.Error(fmt.Errorf("wrong number of samples"))) | |||
log.Error(fmt.Errorf("wrong value")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the previous message "wrong number of samples" more correct? It's also what we log.
return false | ||
} | ||
} else { | ||
expectedNumSamples := int(duration / cfg.ScrapeInterval) | ||
if math.Abs(float64(expectedNumSamples-len(pairs))) > 2 { | ||
level.Error(log).Log("msg", "wrong number of samples", "expected", expectedNumSamples, "actual", len(pairs)) | ||
log.LogFields(otlog.Error(fmt.Errorf("wrong number of samples"))) | ||
log.Error(fmt.Errorf("wrong value")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here
Thanks for the review @pracucci, I've replied to the comments that are still open, let me know what you think. |
@pracucci can you let me know your preferences re: my replies to your comments? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @cstyan for patiently addressing my feedback 🙏 Your comment about logging is right. I left few minor comments and then we should be good to go.
pkg/chunk/cache/memcached.go
Outdated
|
||
// Memcached returns partial results even on error. | ||
if err != nil { | ||
sp.LogFields(otlog.Error(err)) | ||
level.Error(c.logger).Log("msg", "Failed to get keys from memcached", "err", err) | ||
log.Error(err) //nolint:errcheck |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we shouldn't repeat this nolint
comment wherever we use log.Error()
but add it to .errcheck-exclude
. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't get that to work properly, for various variations of (import path.Struct).Function
. If you have a suggestion for what to use as the entry in that file let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see the latest commit as well, if we want to have the ctx
assignment in the bigtable index client for example, we need nolint comments again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think (*github.com/cortexproject/cortex/pkg/util/spanlogger.SpanLogger).Error
should work (it does with standalone errcheck utility). When I update golangci-lint
in the build image to 1.30.0, I no longer see this warning, even without changing .errcheck-exclude. 😕
We could also remove error return value from this method, since no production code uses it.
|
||
sp, _ := opentracing.StartSpanFromContext(ctx, "ParseQueryRangeResponse") | ||
defer sp.Finish() | ||
log, _ := spanlogger.New(ctx, "ParseQueryRangeResponse") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the reason is that ctx
is not used below within this function. I believe replacing it is more future proof to avoid mistakes (we should do it once we'll start using ctx
below and it's easy to forget to do it).
While doing this change, would it make sense to move error-marking to This would also make (I thought it already works this way when making my earlier suggestion, but that doesn't seem to be the case) |
Thanks for the review again @pracucci I think everything is updated correctly now. There's just the lint exclusion left, I was not able to get an entry into the file that worked locally, which is why I went with the nolint comments. @pstibrany are you referring to the spanloggers |
error, this is so we consistently have the error=true tag set. Signed-off-by: Callum Styan <[email protected]>
Signed-off-by: Callum Styan <[email protected]>
Signed-off-by: Callum Styan <[email protected]>
Signed-off-by: Callum Styan <[email protected]>
Yes, I'm referring to |
attempt to avoid use of the incorrect ctx in the future. Signed-off-by: Callum Styan <[email protected]>
Oh interesting, TIL. Good find, yeah we could then check Are you okay if I just open a new issue to look into this, instead of doing it as part of this PR? Having the error tags in general is going to help with debugging the ruler already, which is what prompted me to make this change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh interesting, TIL. Good find, yeah we could then check
kvps
for alevel=error
combination and set the tag on the span. The question would be whether we want to assign all kvps as fields on the span as tags or log messages, or just look for a second fielderr
orerror
.
Log
method already adds all kvps as fields to the span, but doesn't mark span with error flag, and doesn't use correct key for error value.
Are you okay if I just open a new issue to look into this, instead of doing it as part of this PR? Having the error tags in general is going to help with debugging the ruler already, which is what prompted me to make this change.
👍
pkg/chunk/cache/memcached.go
Outdated
|
||
// Memcached returns partial results even on error. | ||
if err != nil { | ||
sp.LogFields(otlog.Error(err)) | ||
level.Error(c.logger).Log("msg", "Failed to get keys from memcached", "err", err) | ||
log.Error(err) //nolint:errcheck |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think (*github.com/cortexproject/cortex/pkg/util/spanlogger.SpanLogger).Error
should work (it does with standalone errcheck utility). When I update golangci-lint
in the build image to 1.30.0, I no longer see this warning, even without changing .errcheck-exclude. 😕
We could also remove error return value from this method, since no production code uses it.
Signed-off-by: Marco Pracucci <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! @pstibrany do you have any other comment?
It's very confusing to look at a trace UI for two traces that seemed to have failed for the same reason, yet one has error ! icons and the other doesn't. This seems to be a side effect of not setting
error=true
on all error cases, and one case where we haderror="a string"
.There's also a change required in upstream weaveworks/common to add another
error=true
Signed-off-by: Callum Styan [email protected]
cc @gouthamve @gotjosh @joe-elliott
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]