Skip to content

Handle the error in SpanLogger.Log if log level is Error (#3016) #3995

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

ChinYing-Li
Copy link
Contributor

@ChinYing-Li ChinYing-Li commented Mar 22, 2021

Signed-off-by: ChinYing-Li [email protected]

What this PR does:
Detect error in SpanLogger.Log and triggers SpanLogger.Error if there's any.
#3016 wants to trigger SpanLogger.Error by detecting the level=Error keyvalue pair added by level.Error(logger). Upon investigation, I didn't find a way to retrieve keyvalue pairs in a logger. This PR, as a workaround, isn't ideal: if user only passes in strings to level.Error(logger).Log, then SpanLogger.Error won't be called.

Not sure whether this falls under CHANGE or ENHANCEMENT in the CHANGELOG.md.
Any suggestion is appreciated!

Which issue(s) this PR fixes:
Fixes #3016

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@ChinYing-Li ChinYing-Li force-pushed the issue_3016 branch 2 times, most recently from 8aab361 to 358183f Compare March 23, 2021 09:38
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! I think we should also look for all places where we call SpanLogger.Error() and remove that call if the error is also logged via SpanLogger.Log(), otherwise this change would introduce double calls to SpanLogger.Error().

@@ -94,3 +95,12 @@ func (s *SpanLogger) Error(err error) error {
s.Span.LogFields(otlog.Error(err))
return err
}

// triggerError is used to detect error in kvps, and calls s.Error if there's any.
func (s *SpanLogger) triggerError(kvps ...interface{}) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would set the error flag for any error logged, regardless of the log level. I think the idea in #3016 was to detect the log level and set the error flag only if the level was error. We have may have cases where we log an error even if it's not critical and way may not want to "red flag" it in the span, so filtering out any non error level may help.

@pstibrany
Copy link
Contributor

I didn't find a way to retrieve keyvalue pairs in a logger.

But you did! :) Key-value pairs are passed to the Log method, and your triggerError is checking them already (kvps slice is consisting of [key1, value1, key2, value2, ...]). I think the suggestion was to check for special key level.Key() and value level.ErrorValue(), rather than errors, because errors can be logged at debug or warn level too.

@ChinYing-Li
Copy link
Contributor Author

ChinYing-Li commented Mar 26, 2021

... check for special key level.Key() and value level.ErrorValue(), rather than errors, because errors can be logged at debug or warn level too.

^ This is exactly what I meant by "I didn't find a way to retrieve keyvalue pairs in a logger". The keyvalue pair level=Error is added to SpanLogger by level.Error(logger) before SpanLogger.Log is called. The ideal solution to this issue would be detecting level=Error, but there's no way to find out what's already logged in a logger.

@pstibrany
Copy link
Contributor

pstibrany commented Mar 26, 2021

^ This is exactly what I meant by "I didn't find a way to retrieve keyvalue pairs in a logger". The keyvalue pair level=Error is added to SpanLogger by level.Error(logger) before SpanLogger.Log is called. The ideal solution to this issue would be detecting level=Error, but there's no way to find out what's already logged in a logger.

I'm confused... If I add following lines to (*SpanLogger).Log method, I can see log levels just fine:

func (s *SpanLogger) Log(kvps ...interface{}) error {
	for ix := 0; ix+1 < len(kvps); ix += 2 {
		if kvps[ix] == level.Key() {
			if kvps[ix+1] == level.ErrorValue() {
				fmt.Println("Error!", kvps)
			} else if kvps[ix+1] == level.InfoValue() {
				fmt.Println("Info!", kvps)
			} else if kvps[ix+1] == level.DebugValue() {
				fmt.Println("Debug!", kvps)
			}
		}
	}

... // rest of Log method here

The way level.Error(logger) works is by returning new logger, by using log.WithPrefix with level.Key() and level.ErrorValue(): (

func Error(logger log.Logger) log.Logger {
return log.WithPrefix(logger, Key(), ErrorValue())
}
).

It's this new logger's Log method that is really called in expression like level.Error(noSpan).Log("hello", "world"):

func (l *context) Log(keyvals ...interface{}) error {
kvs := append(l.keyvals, keyvals...)
if len(kvs)%2 != 0 {
kvs = append(kvs, ErrMissingValue)
}
if l.hasValuer {
// If no keyvals were appended above then we must copy l.keyvals so
// that future log events will reevaluate the stored Valuers.
if len(keyvals) == 0 {
kvs = append([]interface{}{}, l.keyvals...)

What it does is that it appends supplied parameters to l.keyvals (which contains level.Key() and level.ErrorValue() already), and passes all of that to underlying logger, in our case *SpanLogger. So (*SpanLogger).Log method can see those values just fine. There is no other way how Logger can obtain key-value pairs, other than via Log method... Logger interface doesn't have other methods:

// Logger is the fundamental interface for all log operations. Log creates a
// log event from keyvals, a variadic sequence of alternating keys and values.
// Implementations must be safe for concurrent use by multiple goroutines. In
// particular, any implementation of Logger that appends to keyvals or
// modifies or retains any of its elements must make a copy first.
type Logger interface {
Log(keyvals ...interface{}) error
}

@ChinYing-Li
Copy link
Contributor Author

ChinYing-Li commented Mar 27, 2021

Thanks for working on this! I think we should also look for all places where we call SpanLogger.Error() and remove that call if the error is also logged via SpanLogger.Log(), otherwise this change would introduce double calls to SpanLogger.Error().

I've looked through the code base but didn't find any SpanLogger.Error(). All I see is level.Error() and http.Error() (some require.Error() in the test files). I wonder if I missed something?

Edit: HI @pracucci, just a ping, to see if you have any comments regarding the question above.

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked through the code base but didn't find any SpanLogger.Error(). All I see is level.Error() and http.Error() (some require.Error() in the test files). I wonder if I missed something?

Uhm, there are many (eg. if you use Goland you can just use the "Find usage" feature). I checked them for you and places where we both call Error() and Log() with error level are:

  • pkg/chunk/cache/memcached.go line 161
  • pkg/querier/queryrange/results_cache.go line 560
  • pkg/testexporter/correctness/simple.go within the verifySamples() function

Comment on lines 104 to 102
if kvps[i] == level.Key() && kvps[i+1] == level.ErrorValue() {
logAsError = true
} else if kvps[i] == "err" {
if err, ok := kvps[i+1].(error); ok {
loggedError = err
} else if errMsg, ok := kvps[i+1].(string); ok {
loggedError = errors.New(errMsg)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would simplify this, supporting only error even if it's not preceeded by the field name err (eg. may also be called differently, not sure, too many places to check);

Suggested change
if kvps[i] == level.Key() && kvps[i+1] == level.ErrorValue() {
logAsError = true
} else if kvps[i] == "err" {
if err, ok := kvps[i+1].(error); ok {
loggedError = err
} else if errMsg, ok := kvps[i+1].(string); ok {
loggedError = errors.New(errMsg)
}
}
if kvps[i] == level.Key() && kvps[i+1] == level.ErrorValue() {
logAsError = true
continue
}
if err, ok := kvps[i+1].(error); ok {
loggedError = err
}

// triggerError calls s.Error if SpanLogger's log level is set to level.ErrorValue().
func (s *SpanLogger) triggerError(kvps ...interface{}) {
var logAsError = false
loggedError := errors.New("unknown error")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to allocate it each time. You can initialize it to nil, and then set it to an "unknown error" inside:

	if logAsError {
		_ = s.Error(loggedError)
	}

var logAsError = false
loggedError := errors.New("unknown error")
for i := 0; i < len(kvps) - 1; i += 2 {
if kvps[i] == level.Key() && kvps[i+1] == level.ErrorValue() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to continue the iteration if the level is not error. I think for the purpose of this triggerError() function you could just return if level != error.

@@ -94,3 +95,22 @@ func (s *SpanLogger) Error(err error) error {
s.Span.LogFields(otlog.Error(err))
return err
}

// triggerError calls s.Error if SpanLogger's log level is set to level.ErrorValue().
func (s *SpanLogger) triggerError(kvps ...interface{}) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function is not quite correct. Behaviour I would expect to see is:

  1. if log level is error, set span to error (ext.Error.Set(s.Span, true)). That is all that triggerError should do imho.

  2. if there is also error object logged, it should be converted to otlog.Error(err) field. But I would expect to see this happen in Log method, specifically in InterleavedKVToFields mapping, which returns fields from values. Note that presence of error object should not mark span itself as errorneous, as errors an be logged at debug level too.

WDYT?

Copy link
Contributor Author

@ChinYing-Li ChinYing-Li Apr 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pstibrany I see what you mean:
Right now triggerError calls SpanLogger.Error, which calls ext.Error.Set(s.Span, true) and s.Span.LogFields(otlog.Error(err)). err is being logged twice, once in the triggered SpanLogger.Error and once in Log itself.

#3016 wants to save users one line of code (calling SpanLogger.Error themselves) if level.Error is called before SpanLogger.Log - triggerError simply allows user to achieve what the two lines of code below do in one line.

level.Error(log).Log("msg", "wrong number of samples", "expected", expectedNumSamples, "actual", len(pairs))
log.Error(fmt.Errorf("wrong number of samples"))

Please let me know if anything above is unclear.

Copy link
Contributor Author

@ChinYing-Li ChinYing-Li Apr 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pstibrany Just a friendly ping: I wonder if you have any further thoughts or comment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now triggerError calls SpanLogger.Error, which calls ext.Error.Set(s.Span, true) and s.Span.LogFields(otlog.Error(err)). err is being logged twice, once in the triggered SpanLogger.Error and once in Log itself.

You are right, that's not good :)

I see also that we cannot easily modify InterleavedKVToFields. What about following approach:

  • Log will first call s.Logger.Log(kvps...) as it does now
  • it would then have logic from triggerError, specifically
    • lookup of level.Key()=level.ErrorValue() pair (this causes ext.Error.Set(s.Span, true) to be called)
    • if we have found level.Key()=level.ErrorValue(), then we would also search for non-nil objects implementing error interface. If found with err or error key, we can remove it from kvps slice, and keep as extra field. (Only first found error is handled this way).
  • Log would then call otlog.InterleavedKVToFields to convert remaining fields, and if we kept extra error field in previous step, we add it to these fields
  • Log then calls s.Span.LogFields with all fields (incl. error one, if any).

We would not reuse (*SpanLogger).Error method, to avoid logging of error twice.

WDYT?

Copy link
Contributor Author

@ChinYing-Li ChinYing-Li Apr 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I ask the motivation to take out the first errorat step 2, and putting it back at step 3? Do you mean that the first error should be kept as an extra field, and the rest of the error should be removed from the kvps?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I ask the motivation to take out the first errorat step 2, and putting it back at step 3?

Motivation is to avoid double-logging of the error, and to store it wrapped into otlog.Error(err) instead. Calling otlog.InterleavedKVToFields with error instance would convert it to string.

Do you mean that the first error should be kept as an extra field,

Yes.

and the rest of the error should be removed from the kvps?

No, I would keep the rest as-is. If we removed them, they would disappear from the span-log. That would be bad.

@gouthamve
Copy link
Contributor

Can you please rebase against master to pull in #4137 and move the changelog entry to the top?

@ChinYing-Li ChinYing-Li changed the title Call SpanLogger.Error if an error is passed to SpanLogger.Log (#3016) Handle the error in SpanLogger.Log if log level is Error (#3016) Apr 28, 2021
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ChinYing-Li for your patience! What did look like an easy change is turning out not that easy. I think we're on the good track tho!

}
}
if logAsError && errorIndex != -1 {
s.Span.LogFields(otlog.Error(kvps[i+1].(error)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to use errorIndex instead of i, otherwise you're assuming that the level is always before the error in kvps ordering.

@@ -77,6 +77,31 @@ func FromContextWithFallback(ctx context.Context, fallback log.Logger) *SpanLogg
// also puts the on the spans.
func (s *SpanLogger) Log(kvps ...interface{}) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it doable adding some valuable unit tests on the new logic you've added? Otherwise we'll need to manually test it before merging, cause the logic is a bit convoluted and there may be subtle bugs slipping it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I could not see a way to retrieve the field of SpanLogger, it's unlikely to test by comparing the value of the fields. If we refactor the new logic into a private function, I can write unit tests for it. However unit testing a private function doesn't sound like a good idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pracucci Do you have any further suggestion based on my previous comment? Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using https://github.com/opentracing/opentracing-go/blob/master/mocktracer/mocktracer.go ?

(We unit-test private functions all the time, but I agree it's better if you can avoid)

@bboreham
Copy link
Contributor

What is the performance impact of doing these checks on every field of every log?

@ChinYing-Li
Copy link
Contributor Author

Sorry for getting back late!
I just wrote a small test, attached below, that times the Log function.

func TestSpanLogger_Timing(t *testing.T) {
	_, ctx := New(context.Background(), "test", "bar")
	newSpan := FromContext(ctx)
	startTime := time.Now()

	for i := 0; i < 10000; i++ {
		_ = newSpan.Log("metric1", 1, "err", errors.New("err"), "metric2", 2)
	}

	totalTime := time.Since(startTime)
	print(totalTime, "\n ")
}

Certainly there's an impact on the run time for doing the O(n) check on logged values.
Please let me know in case you have any advice on testing/profiling methodologies! @bboreham

Result on master

/usr/local/go/bin/go tool test2json -t /tmp/___TestSpanLogger_Timing_in_github_com_cortexproject_cortex_pkg_util_spanlogger.test -test.v -test.paniconexit0 -test.run ^\QTestSpanLogger_Timing\E$
=== RUN   TestSpanLogger_Timing
11043275
 --- PASS: TestSpanLogger_Timing (0.01s)
PASS

Process finished with the exit code 0

Result on issue_3016

/usr/local/go/bin/go tool test2json -t /tmp/___TestSpanLogger_Timing_in_github_com_cortexproject_cortex_pkg_util_spanlogger.test -test.v -test.paniconexit0 -test.run ^\QTestSpanLogger_Timing\E$
=== RUN   TestSpanLogger_Timing
11784684
 --- PASS: TestSpanLogger_Timing (0.01s)
PASS

Process finished with the exit code 0

@bboreham
Copy link
Contributor

bboreham commented Aug 2, 2021

Could you re-do that using the Go benchmark framework?
That way it will run the loop as many times as it takes to get a reasonable measurement (0.01s is too short to draw conclusions about)
Don't call errors.New("err") as part of the timed section, because that distorts the result.

You had another question about how to do testing - in pkg/util/spanlogger/spanlogger_test.go it uses a mock tracer to check what the code does; would that help?

@ChinYing-Li
Copy link
Contributor Author

ChinYing-Li commented Aug 2, 2021

Thanks for the pointer. I have yet to incorporate mocktracer into the benchmarking code, and here's the benchmark functions (Fixed the bug and updated the benchmarking results on 3 Aug):

func BenchmarkSpanLogger100(b *testing.B) {
	benchmarkSpanLogger(b, 100)
}

func BenchmarkSpanLogger1000(b *testing.B) {
	benchmarkSpanLogger(b, 1000)
}

func benchmarkSpanLogger(b *testing.B, numPair int) {
	_, ctx := New(context.Background(), "test", "bar")
	newSpan := FromContext(ctx)
	err := "err"
	arr := make([]interface{}, numPair*2)
	for i := 0; i < numPair; i++ {
		arr[2*i] = time.Now().String()
		arr[2*i + 1] = err
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		newSpan.Log(arr...)
	}
}

No actual Error objects are passed as value, as we only want to check the latency caused by "checking for Error".

Results on branch issue_3016

/tmp/___BenchmarkSpanLogger100_in_github_com_cortexproject_cortex_pkg_util_spanlogger.test -test.v -test.paniconexit0 -test.bench ^\QBenchmarkSpanLogger100\E$ -test.run ^$
goos: linux
goarch: amd64
pkg: github.com/cortexproject/cortex/pkg/util/spanlogger
cpu: Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
BenchmarkSpanLogger100
BenchmarkSpanLogger100-12    	  342025	      3030 ns/op
PASS

Process finished with the exit code 0
/tmp/___BenchmarkSpanLogger1000_in_github_com_cortexproject_cortex_pkg_util_spanlogger.test -test.v -test.paniconexit0 -test.bench ^\QBenchmarkSpanLogger1000\E$ -test.run ^$
goos: linux
goarch: amd64
pkg: github.com/cortexproject/cortex/pkg/util/spanlogger
cpu: Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
BenchmarkSpanLogger1000
BenchmarkSpanLogger1000-12    	   39116	     28074 ns/op
PASS

Process finished with the exit code 0

Results on branch master

/tmp/___BenchmarkSpanLogger100_in_github_com_cortexproject_cortex_pkg_util_spanlogger.test -test.v -test.paniconexit0 -test.bench ^\QBenchmarkSpanLogger100\E$ -test.run ^$
goos: linux
goarch: amd64
pkg: github.com/cortexproject/cortex/pkg/util/spanlogger
cpu: Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
BenchmarkSpanLogger100
BenchmarkSpanLogger100-12    	  614250	      1779 ns/op
PASS

Process finished with the exit code 0
/tmp/___BenchmarkSpanLogger1000_in_github_com_cortexproject_cortex_pkg_util_spanlogger.test -test.v -test.paniconexit0 -test.bench ^\QBenchmarkSpanLogger1000\E$ -test.run ^$
goos: linux
goarch: amd64
pkg: github.com/cortexproject/cortex/pkg/util/spanlogger
cpu: Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
BenchmarkSpanLogger1000
BenchmarkSpanLogger1000-12    	   71996	     16386 ns/op
PASS

Process finished with the exit code 0

@bboreham
Copy link
Contributor

bboreham commented Aug 3, 2021

190.7 ns/op
192.3 ns/op

This should make you suspicious: 2ns is enough to execute maybe 10 instructions, yet you expect it to process 1000 values.

  newSpan.Log(arr)

This passes a single value, which fails immediately because Log is expecting pairs of values.
I suspect you meant:

		newSpan.Log(arr...)

(and you have to make arr of type []interface{})

@ChinYing-Li
Copy link
Contributor Author

I was also a bit confused about the time/op but not sure what went wrong. Thank you for the pointers, and I've updated the code and the results in my previous comment!

@stale
Copy link

stale bot commented Nov 16, 2021

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Nov 16, 2021
@stale stale bot closed this Dec 2, 2021
@ChinYing-Li
Copy link
Contributor Author

Hi there,
Sorry for coming back late; was tied down by the new grad recruiting season. I refactored the logic into a private function and wrote several test cases for it. I wonder if this PR can be re-opened? Thank you! @bboreham

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

spanlogger Log should call Error if log level is error
5 participants