Batch Metrics Handler for Mutable State Metrics #6603

njo · 2024-10-04T18:22:20Z

TODOS:

add unit test ensuring mutable state metrics are emitted with the fallback metrics handler
document change
add tags to a cloned batch handler instance, as we do with the metrics handler throughout the codebase

What changed?

Added a batch metrics handler that can be used to emit "wide events" if a suitable backend is available; otherwise it falls back to sending an individual event for each field. Modified the mutable state metrics to emit a wide event.

Why?

Wide events can reduce the total number of network events: mutable state metrics can now be sent as 1 event rather than 22. Because the fields arrive together, the metrics/events can also be correlated, which provides more context.
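To make the wide-event idea concrete, here is a minimal sketch of a batch that buffers fields and emits either one wide event or one event per field. It is not the code in this PR: `Handler`, `WideEventSink`, `Batch`, `NewBatch`, and the counting types are hypothetical names chosen for illustration, and the real `metrics.Handler` interface is larger.

```go
package main

import (
	"fmt"
	"sort"
)

// Handler is a simplified, hypothetical stand-in for a per-metric handler.
type Handler interface {
	Record(name string, value int64)
}

// WideEventSink is a hypothetical backend that accepts one event with many fields.
type WideEventSink interface {
	EmitWide(event string, fields map[string]int64)
}

// Batch buffers fields and emits them together when closed.
type Batch struct {
	event    string
	fields   map[string]int64
	sink     WideEventSink // nil when no wide-event backend is configured
	fallback Handler
}

func NewBatch(event string, sink WideEventSink, fallback Handler) *Batch {
	return &Batch{event: event, fields: map[string]int64{}, sink: sink, fallback: fallback}
}

// Record stores a field; nothing is sent until Close.
func (b *Batch) Record(name string, value int64) { b.fields[name] = value }

// Close sends one wide event if a sink exists, otherwise one event per field.
// It returns the number of events sent.
func (b *Batch) Close() int {
	if b.sink != nil {
		b.sink.EmitWide(b.event, b.fields)
		return 1
	}
	names := make([]string, 0, len(b.fields))
	for name := range b.fields {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic order for the fallback path
	for _, name := range names {
		b.fallback.Record(name, b.fields[name])
	}
	return len(names)
}

// countingHandler and countingSink only count calls, for demonstration.
type countingHandler struct{ calls int }

func (h *countingHandler) Record(string, int64) { h.calls++ }

type countingSink struct{ events int }

func (s *countingSink) EmitWide(string, map[string]int64) { s.events++ }

func main() {
	record := func(b *Batch) int {
		b.Record("mutable_state_size", 1024)
		b.Record("execution_info_size", 256)
		b.Record("execution_state_size", 64)
		return b.Close()
	}
	fmt.Println("with wide-event backend:", record(NewBatch("mutable_state", &countingSink{}, &countingHandler{})))
	fmt.Println("fallback only:", record(NewBatch("mutable_state", nil, &countingHandler{})))
}
```

With three fields, the wide-event path sends a single event while the fallback path sends three, which is the same trade-off as the 1-versus-22 figure above.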

How did you test it?

Ran unit tests.

Potential risks

In the worst case, I'd expect the server to fail to start, or to panic on a nil pointer dereference when a mutable state event is emitted.

Documentation

Is hotfix candidate?

No

CLAassistant · 2024-10-04T18:22:25Z

All committers have signed the CLA.

dnr · 2024-10-04T18:41:03Z

service/history/history_engine.go

 		eventNotifier              events.Notifier
 		tokenSerializer            common.TaskTokenSerializer
 		metricsHandler             metrics.Handler
+		batchMetricsHandler        metrics.BatchMetricsHandler


It's already pretty annoying to pass around a bunch of these "context objects" like metrics.Handler and log.Logger (sometimes two loggers), so I hate to add more. Is there some way we could do this in the existing metrics.Handler? I guess we don't want to break existing implementations, so maybe the interface extension pattern? (though I know that has problems)

Thanks for the quick look @dnr!

Sure, I agree plumbing this around the place is a pain. As you said, I don't want to break the existing interface so extending could work. Alternatively we could wrap the various observability handlers within some unified wrapper so we just have the one thing to pass around and we keep the batch handler distinct from the standard handler.

For the former I'd have BatchMetricsHandler extend metricsHandler and remove the basic metricsHandler from the context objects that were touched in this PR. Over time if and when other batch metrics are added in the codebase the batch handler can replace the old handler.

For the latter we'd have some kind of ObservabilityHandler containing the 3 types, which replaces the Log / Metrics / BatchMetrics handlers in the relevant contexts. This would ultimately mean fewer things to pass around, the downside being the coupling of Logging with Metrics (though in practice, I'm not sure how much of an issue this would be).

What do you think?
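As a rough illustration of the second option, a unified wrapper might look like the sketch below. Everything here is an assumption made for the example: the three interfaces are minimal stand-ins rather than the server's real logger and metrics types, and `recorder` and `persistState` are hypothetical.

```go
package main

import "fmt"

// Hypothetical, minimal interfaces standing in for the real logger and metrics types.
type Logger interface{ Info(msg string) }
type MetricsHandler interface{ Record(name string, value int64) }
type BatchMetricsHandler interface {
	RecordBatch(event string, fields map[string]int64)
}

// ObservabilityHandler bundles the three handlers so callers pass one value.
type ObservabilityHandler struct {
	Logger       Logger
	Metrics      MetricsHandler
	BatchMetrics BatchMetricsHandler
}

// recorder implements all three interfaces by appending to a slice.
type recorder struct{ lines []string }

func (r *recorder) Info(msg string) { r.lines = append(r.lines, "log: "+msg) }
func (r *recorder) Record(name string, value int64) {
	r.lines = append(r.lines, fmt.Sprintf("metric: %s=%d", name, value))
}
func (r *recorder) RecordBatch(event string, fields map[string]int64) {
	r.lines = append(r.lines, fmt.Sprintf("batch: %s (%d fields)", event, len(fields)))
}

// persistState is a hypothetical call site that needs only one observability parameter.
func persistState(obs ObservabilityHandler, totalSize int64) {
	obs.Logger.Info("persisting mutable state")
	obs.Metrics.Record("persist_count", 1)
	obs.BatchMetrics.RecordBatch("mutable_state", map[string]int64{"total_size": totalSize})
}

func main() {
	r := &recorder{}
	persistState(ObservabilityHandler{Logger: r, Metrics: r, BatchMetrics: r}, 1024)
	for _, line := range r.lines {
		fmt.Println(line)
	}
}
```

The call site takes one parameter instead of three, at the cost of the logging/metrics coupling mentioned above.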

When I originally discussed this with @njo, we had a different approach in mind: metrics.Handler would have a StartBatch() method that returns a BatchHandler, where the BatchHandler is just a metrics.Handler and an io.Closer.

With that approach, you'd have:

    batch := metricsHandler.StartBatch()
    defer batch.Close()
    // Record metrics on the batch
    metrics.MutableStateSize.With(batch).Record(int64(stats.TotalSize))
    metrics.ExecutionInfoSize.With(batch).Record(int64(stats.ExecutionInfoSize))
    metrics.ExecutionStateSize.With(batch).Record(int64(stats.ExecutionStateSize))

The default implementation for a metrics handler would return itself with a nop closer.
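A minimal sketch of what that default could look like, assuming a much smaller `Handler` interface than the real metrics.Handler. The `simpleHandler` and `nopBatch` types are hypothetical and exist only to show the "return itself with a nop closer" behaviour.

```go
package main

import (
	"fmt"
	"io"
)

// Handler is a simplified stand-in for metrics.Handler (an assumption for this sketch).
type Handler interface {
	Record(name string, value int64)
	StartBatch() BatchHandler
}

// BatchHandler is just a Handler plus an io.Closer, as described above.
type BatchHandler interface {
	Handler
	io.Closer
}

// simpleHandler has no wide-event backend, so it records each metric immediately.
type simpleHandler struct{ recorded []string }

func (h *simpleHandler) Record(name string, value int64) {
	h.recorded = append(h.recorded, fmt.Sprintf("%s=%d", name, value))
}

// StartBatch is the default behaviour: hand back the same handler with a no-op Close.
func (h *simpleHandler) StartBatch() BatchHandler { return nopBatch{h} }

// nopBatch forwards every Handler call to the wrapped handler and ignores Close.
type nopBatch struct{ Handler }

func (nopBatch) Close() error { return nil }

func main() {
	h := &simpleHandler{}
	batch := h.StartBatch()
	defer batch.Close()
	batch.Record("mutable_state_size", 1024)
	batch.Record("execution_info_size", 256)
	fmt.Println(h.recorded)
}
```

Because the batch is still a Handler, call sites look the same whether or not the backend supports wide events.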

That sounds reasonable @bergundy. When we talked, I hadn't understood that we'd be using the same metrics handler interface. To clarify, I'd be extending the existing interface here rather than creating a new Handler type that wraps the old one? So going through and updating our existing handler implementations, and leaving folks with a custom handler to add the new batch method?

We have broken source-compatibility for users who use the server as a library before, but we try pretty hard not to. I feel like this case might be worth it, though, since it's very easy to fix. We should get a team consensus on that.

I like the ObservabilityHandler idea! We might need/want to do some unification of tags.

If we're worried about breaking the metrics interface (which I'm not especially worried about), we can make the additional method optional and try to cast to that interface at runtime.

I'll bring this up for discussion with the team.

Also like the idea of an ObservabilityHandler that encapsulates metrics and logging.
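The optional-method idea mentioned above could be sketched roughly as follows, using a runtime type assertion so that existing handler implementations keep compiling. All names here (`BatchStarter`, the package-level `StartBatch` helper, `legacyHandler`, `wideHandler`, `wideBatch`) are hypothetical, and `Handler` is a simplified stand-in for the real interface.

```go
package main

import (
	"fmt"
	"io"
)

// Handler is a simplified stand-in for the existing metrics interface.
type Handler interface {
	Record(name string, value int64)
}

type BatchHandler interface {
	Handler
	io.Closer
}

// BatchStarter is the optional extension; handlers are not required to implement it.
type BatchStarter interface {
	StartBatch() BatchHandler
}

// nopBatch wraps a plain Handler and ignores Close.
type nopBatch struct{ Handler }

func (nopBatch) Close() error { return nil }

// StartBatch uses the handler's own batching when available and otherwise
// falls back to recording each metric individually.
func StartBatch(h Handler) BatchHandler {
	if s, ok := h.(BatchStarter); ok {
		return s.StartBatch()
	}
	return nopBatch{h}
}

// legacyHandler is an existing implementation that knows nothing about batches.
type legacyHandler struct{ events int }

func (h *legacyHandler) Record(string, int64) { h.events++ }

// wideHandler opts in to batching by implementing BatchStarter.
type wideHandler struct{ events int }

func (h *wideHandler) Record(string, int64)     { h.events++ }
func (h *wideHandler) StartBatch() BatchHandler { return &wideBatch{parent: h} }

// wideBatch buffers fields and counts one event for the whole batch on Close.
type wideBatch struct {
	parent *wideHandler
	fields map[string]int64
}

func (b *wideBatch) Record(name string, value int64) {
	if b.fields == nil {
		b.fields = map[string]int64{}
	}
	b.fields[name] = value
}

func (b *wideBatch) Close() error {
	b.parent.events++ // the whole batch goes out as a single wide event
	b.fields = nil
	return nil
}

func main() {
	legacy, wide := &legacyHandler{}, &wideHandler{}
	for _, h := range []Handler{legacy, wide} {
		batch := StartBatch(h)
		batch.Record("mutable_state_size", 1024)
		batch.Record("execution_info_size", 256)
		batch.Close()
	}
	fmt.Println("legacy events:", legacy.events, "wide events:", wide.events)
}
```

Custom handlers that never implement `BatchStarter` keep working unchanged, which is the source-compatibility benefit; the cost is that the capability is invisible in the type system and only discovered at runtime.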

I'm a big fan of the ObservabilityHandler approach, though at a previous job we combined it with per-request contexts so you'd get something like:

    type OperationContext struct {
        context.Context
        logger
        metrics
        batchMetrics
        // etc
    }

    var _ context.Context = (*OperationContext)(nil)

so we could fold the context.Context into this and drop yet another parameter

But the logger you get from fx (static scope) would be tagged differently from the logger you get from a context (dynamic scope). They're good for two different purposes; ideally you might want to combine tags from both. We don't really do dynamically scoped loggers; we just add a few tags at the call site for request-level info.

njo · 2024-10-11T22:38:46Z

Closed in favour of #6655

njo added 2 commits October 3, 2024 19:31

basic batch metrics structure

2c412d9

Fix implementation, plumb through server

5f9d1f1

njo requested a review from a team as a code owner October 4, 2024 18:22

dnr reviewed Oct 4, 2024

njo changed the title ~~Mutable spans~~ Batch Metrics Handler for Mutable Span Metrics Oct 4, 2024

njo changed the title ~~Batch Metrics Handler for Mutable Span Metrics~~ Batch Metrics Handler for Mutable State Metrics Oct 4, 2024

njo closed this Oct 11, 2024

yiminc deleted the mutable_spans branch November 4, 2024 17:55

dnr Oct 4, 2024

njo Oct 4, 2024

bergundy Oct 4, 2024

njo Oct 4, 2024

dnr Oct 4, 2024

bergundy Oct 7, 2024

tdeebswihart Oct 7, 2024

dnr Oct 7, 2024

6 participants
