Skip to content

Be consistent with span logging, set error=true and use otlog.Error #2970

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Aug 12, 2020
Merged

Be consistent with span logging, set error=true and use otlog.Error #2970

merged 6 commits into from
Aug 12, 2020

Conversation

cstyan
Copy link
Contributor

@cstyan cstyan commented Jul 31, 2020

It's very confusing to look at a trace UI for two traces that seemed to have failed for the same reason, yet one has error ! icons and the other doesn't. This seems to be a side effect of not setting error=true on all error cases, and one case where we had error="a string".

There's also a change required in upstream weaveworks/common to add another error=true

Signed-off-by: Callum Styan [email protected]

cc @gouthamve @gotjosh @joe-elliott

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@pstibrany
Copy link
Contributor

Isn’t this better solved by consistently using spanlogger that we use elsewhere?

@cstyan
Copy link
Contributor Author

cstyan commented Jul 31, 2020

Good point, looks like spanlogger Error via opentracings ext.Error.Set(s.Span, true) should solve most of this for us.

@pull-request-size pull-request-size bot added size/M and removed size/S labels Jul 31, 2020
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cstyan Thanks for fixing this. I left few comments.

Could you fix the linter, please? For the error Error return value of log.Error is not checked (errcheck) you could just add it to the ignore list defined in .errcheck-exclude (we don't want to check errors from logger)


sp, _ := opentracing.StartSpanFromContext(ctx, "ParseQueryRangeResponse")
defer sp.Finish()
log, _ := spanlogger.New(ctx, "ParseQueryRangeResponse")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New span here. Should replace the context, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm basing this off how we used the ot library sp, _ := opentracing.StartSpanFromContext(ctx, "ParseQueryRangeResponse"), where we ignored the returned context. I don't know the historical reason behind doing it that way, so if you think it should change here with the introduction of spanlogger that's fine by me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the reason is that ctx is not used below within this function. I believe replacing it is more future proof to avoid mistakes (we should do it once we'll start using ctx below and it's easy to forget to do it).

@@ -171,14 +170,14 @@ func verifySamples(log *spanlogger.SpanLogger, tc Case, pairs []model.SamplePair
expectedNumSamples := int(duration / cfg.ScrapeInterval)
if !epsilonCorrect(float64(len(pairs)), float64(expectedNumSamples), cfg.samplesEpsilon) {
level.Error(log).Log("msg", "wrong number of samples", "expected", expectedNumSamples, "actual", len(pairs))
log.LogFields(otlog.Error(fmt.Errorf("wrong number of samples")))
log.Error(fmt.Errorf("wrong value"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the previous message "wrong number of samples" more correct? It's also what we log.

return false
}
} else {
expectedNumSamples := int(duration / cfg.ScrapeInterval)
if math.Abs(float64(expectedNumSamples-len(pairs))) > 2 {
level.Error(log).Log("msg", "wrong number of samples", "expected", expectedNumSamples, "actual", len(pairs))
log.LogFields(otlog.Error(fmt.Errorf("wrong number of samples")))
log.Error(fmt.Errorf("wrong value"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here

@cstyan
Copy link
Contributor Author

cstyan commented Aug 5, 2020

Thanks for the review @pracucci, I've replied to the comments that are still open, let me know what you think.

@cstyan
Copy link
Contributor Author

cstyan commented Aug 10, 2020

@pracucci can you let me know your preferences re: my replies to your comments?

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cstyan for patiently addressing my feedback 🙏 Your comment about logging is right. I left few minor comments and then we should be good to go.


// Memcached returns partial results even on error.
if err != nil {
sp.LogFields(otlog.Error(err))
level.Error(c.logger).Log("msg", "Failed to get keys from memcached", "err", err)
log.Error(err) //nolint:errcheck
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we shouldn't repeat this nolint comment wherever we use log.Error() but add it to .errcheck-exclude. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't get that to work properly, for various variations of (import path.Struct).Function. If you have a suggestion for what to use as the entry in that file let me know.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see the latest commit as well, if we want to have the ctx assignment in the bigtable index client for example, we need nolint comments again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think (*github.com/cortexproject/cortex/pkg/util/spanlogger.SpanLogger).Error should work (it does with standalone errcheck utility). When I update golangci-lint in the build image to 1.30.0, I no longer see this warning, even without changing .errcheck-exclude. 😕

We could also remove error return value from this method, since no production code uses it.


sp, _ := opentracing.StartSpanFromContext(ctx, "ParseQueryRangeResponse")
defer sp.Finish()
log, _ := spanlogger.New(ctx, "ParseQueryRangeResponse")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the reason is that ctx is not used below within this function. I believe replacing it is more future proof to avoid mistakes (we should do it once we'll start using ctx below and it's easy to forget to do it).

@pstibrany
Copy link
Contributor

While doing this change, would it make sense to move error-marking to Log method? If passed level is error, we would mark span with error flag, and if there is any error object passed, we would wrap first one into otlog.Error.

This would also make (*SpanLogger).Error(error) method unnecessary.

(I thought it already works this way when making my earlier suggestion, but that doesn't seem to be the case)

@cstyan
Copy link
Contributor Author

cstyan commented Aug 10, 2020

Thanks for the review again @pracucci I think everything is updated correctly now. There's just the lint exclusion left, I was not able to get an entry into the file that worked locally, which is why I went with the nolint comments.

@pstibrany are you referring to the spanloggers Log function? That's just a wrapper for go-kit/log's Log, which has no notion of log level since that's done via the go-kit/log/levels wrappers. So there isn't actually any Log(msg, level) that could take error and log to error level and add the correct error tag to the span. Think we should just stick with compatibility with the logger we use.

cstyan added 4 commits August 10, 2020 11:48
error, this is so we consistently have the error=true tag set.

Signed-off-by: Callum Styan <[email protected]>
Signed-off-by: Callum Styan <[email protected]>
Signed-off-by: Callum Styan <[email protected]>
@pstibrany
Copy link
Contributor

@pstibrany are you referring to the spanloggers Log function? That's just a wrapper for go-kit/log's Log, which has no notion of log level since that's done via the go-kit/log/levels wrappers. So there isn't actually any Log(msg, level) that could take error and log to error level and add the correct error tag to the span. Think we should just stick with compatibility with the logger we use.

Yes, I'm referring to Log function. Log level is passed as just another key-value pair to that method (key being level.Key() and value being level.ErrorValue() for errors), and Log method is already processing all key-value pairs inside, so it can as well understand this special combination. I think it's worth exploring.

attempt to avoid use of the incorrect ctx in the future.

Signed-off-by: Callum Styan <[email protected]>
@cstyan
Copy link
Contributor Author

cstyan commented Aug 10, 2020

Yes, I'm referring to Log function. Log level is passed as just another key-value pair to that method (key being level.Key() and value being level.ErrorValue() for errors), and Log method is already processing all key-value pairs inside, so it can as well understand this special combination. I think it's worth exploring.

Oh interesting, TIL. Good find, yeah we could then check kvps for a level=error combination and set the tag on the span. The question would be whether we want to assign all kvps as fields on the span as tags or log messages, or just look for a second field err or error.

Are you okay if I just open a new issue to look into this, instead of doing it as part of this PR? Having the error tags in general is going to help with debugging the ruler already, which is what prompted me to make this change.

Copy link
Contributor

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh interesting, TIL. Good find, yeah we could then check kvps for a level=error combination and set the tag on the span. The question would be whether we want to assign all kvps as fields on the span as tags or log messages, or just look for a second field err or error.

Log method already adds all kvps as fields to the span, but doesn't mark span with error flag, and doesn't use correct key for error value.

Are you okay if I just open a new issue to look into this, instead of doing it as part of this PR? Having the error tags in general is going to help with debugging the ruler already, which is what prompted me to make this change.

👍


// Memcached returns partial results even on error.
if err != nil {
sp.LogFields(otlog.Error(err))
level.Error(c.logger).Log("msg", "Failed to get keys from memcached", "err", err)
log.Error(err) //nolint:errcheck
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think (*github.com/cortexproject/cortex/pkg/util/spanlogger.SpanLogger).Error should work (it does with standalone errcheck utility). When I update golangci-lint in the build image to 1.30.0, I no longer see this warning, even without changing .errcheck-exclude. 😕

We could also remove error return value from this method, since no production code uses it.

@pracucci
Copy link
Contributor

@cstyan I've pushed a commit to try to globally exclude SpanLogger.Error() from errcheck (it works on my local machine, let's see what CI says...).

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! @pstibrany do you have any other comment?

@pracucci pracucci merged commit 03165e8 into cortexproject:master Aug 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants