-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Batch Metrics Handler for Mutable State Metrics #6603
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| eventNotifier events.Notifier | ||
| tokenSerializer common.TaskTokenSerializer | ||
| metricsHandler metrics.Handler | ||
| batchMetricsHandler metrics.BatchMetricsHandler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's already pretty annoying to pass around a bunch of these "context objects" like metrics.Handler and log.Logger (sometimes two loggers), so I hate to add more. Is there some way we could do this in the existing metrics.Handler? I guess we don't want to break existing implementations, so maybe the interface extension pattern? (though I know that has problems)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the quick look @dnr!
Sure, I agree plumbing this around the place is a pain. As you said, I don't want to break the existing interface so extending could work. Alternatively we could wrap the various observability handlers within some unified wrapper so we just have the one thing to pass around and we keep the batch handler distinct from the standard handler.
For the former I'd have BatchMetricsHandler extend metricsHandler and remove the basic metricsHandler from the context objects that were touched in this PR. Over time if and when other batch metrics are added in the codebase the batch handler can replace the old handler.
For the latter we'd have some kind of ObservabilityHandler containing the 3 types which replaces the Log / Metrics / BatchMetrics handlers in the relevant contexts. This would ultimately mean less things to pass around, the downside being the coupling of Logging with Metrics (though in practice, I'm not sure how much of an issue this would be).
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I discussed the approach with @njo originally, we had a different approach. metrics.Handler had a StartBatch() method that returns a BatchHandler, the BatchHandler is just a metrics.Handler and an io.Closer.
With that approach, you'd have:
batch := metricsHandler.StartBatch()
defer batch.Close()
// Record metrics on the batch
metrics.MutableStateSize.With(batch).Record(int64(stats.TotalSize))
metrics.ExecutionInfoSize.With(batch).Record(int64(stats.ExecutionInfoSize))
metrics.ExecutionStateSize.With(batch).Record(int64(stats.ExecutionStateSize))
The default implementation for a metrics handler would return itself with a nop closer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds reasonable @bergundy. I misunderstood when we talked that we'd be using the same metrics handler interface. To clarify, I'd be extending the existing interface here rather than creating a new Handler type that wraps the old one? So going through and updating our existing handler implementations and leaving folks with a custom handler to add the new batch method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have broken source-compatibility for users who use the server as a library before, but we try pretty hard not to. I feel like this case might be worth it, though, since it's very easy to fix. We should get a team consensus on that.
I like the ObservabilityHandler idea! We might need/want to do some unification of tags
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're worried about breaking the metrics interface (which I'm not especially worried about) we can make the addition method optional and try to cast to that interface at runtime.
I'll bring this up for discussion with the team.
Also like the idea of an ObservabilityHandler that encapsulates metrics and logging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a big fan of the ObservabilityHandler approach, though at a previous job we combined it with per-request contexts so you'd get something like:
type OperationContext struct {
context.Context
logger
metrics
batchMetrics
// etc
}
var _ context.Context = (*OperationContext)(nil)
so we could combine the ctx.Context into this and drop yet another parameter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the logger you get from fx (static scope) would be tagged differently from the logger you get from a context (dynamic scope). They're good for two different purposes, ideally you might want to combine tags from both. We don't really do dynamically scoped loggers, we just add a few tags at the call site for request-level info.
|
Closed in favour of #6655 |
TODOS:
What changed?
Added batch metrics which can be used to emit "wide events" if a suitable backend is available (otherwise fall back to sending individual events for each field). Modified the mutable state metrics to emit a wide event.
Why?
Wide events can reduce the total number of network events (mutable state metrics can now send 1 event rather than 22). The metrics / events can also be correlated and provide more context.
How did you test it?
Ran unit tests.
Potential risks
I'd expect the server wouldn't start at all or throw a NPE when a mutable state event is to be emitted in the worst case.
Documentation
Is hotfix candidate?
No