Contributor

@njo njo commented Oct 4, 2024

TODOS:

  • add unit test ensuring mutable state metrics are emitted with the fallback metrics handler
  • document change
  • add tags to a cloned batch handler instance, as we do with the metrics handler throughout the codebase

What changed?

Added batch metrics, which emit a single "wide event" when a suitable backend is available and otherwise fall back to sending an individual event for each field. Modified the mutable state metrics to emit a wide event.
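
A minimal sketch of the wide-event-with-fallback idea described above. It is not the PR's implementation: `Sink`, `WideSink`, `Batch`, and `emitStats` are hypothetical names invented for illustration, and a counter stands in for the network.

```go
package main

import "fmt"

// Sink receives one event per metric. Hypothetical stand-in for a metrics backend.
type Sink interface {
	Send(name string, value int64)
}

// WideSink receives many fields in a single event. Also hypothetical.
type WideSink interface {
	SendWide(name string, fields map[string]int64)
}

type field struct {
	name  string
	value int64
}

// Batch buffers fields until Flush is called.
type Batch struct {
	name     string
	wide     WideSink // nil when no wide-event backend is configured
	fallback Sink
	fields   []field
}

// Record buffers one field; nothing is sent yet.
func (b *Batch) Record(name string, value int64) {
	b.fields = append(b.fields, field{name, value})
}

// Flush emits one wide event if possible, otherwise one event per field.
func (b *Batch) Flush() {
	if b.wide != nil {
		wideFields := make(map[string]int64, len(b.fields))
		for _, f := range b.fields {
			wideFields[f.name] = f.value
		}
		b.wide.SendWide(b.name, wideFields)
	} else {
		for _, f := range b.fields {
			b.fallback.Send(f.name, f.value)
		}
	}
	b.fields = nil
}

// counter counts how many events reach it, whichever interface is used.
type counter struct{ events int }

func (c *counter) Send(string, int64)                { c.events++ }
func (c *counter) SendWide(string, map[string]int64) { c.events++ }

// emitStats records three fields and returns how many events were sent.
func emitStats(useWide bool) int {
	c := &counter{}
	b := &Batch{name: "mutable_state", fallback: c}
	if useWide {
		b.wide = c
	}
	b.Record("total_size", 1024)
	b.Record("execution_info_size", 256)
	b.Record("execution_state_size", 64)
	b.Flush()
	return c.events
}

func main() {
	fmt.Println("fallback events:", emitStats(false))
	fmt.Println("wide events:", emitStats(true))
}
```

With three fields the fallback path sends three events and the wide path sends one; the same shape is what takes the mutable state metrics from 22 events to 1.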

Why?

Wide events can reduce the total number of network events (mutable state metrics can now send 1 event rather than 22). The metrics / events can also be correlated and provide more context.

How did you test it?

Ran unit tests.

Potential risks

In the worst case, I'd expect the server either to fail to start or to panic with a nil pointer dereference when a mutable state event is emitted.

Documentation

Is hotfix candidate?

No

@njo njo requested a review from a team as a code owner October 4, 2024 18:22
CLAassistant commented Oct 4, 2024

All committers have signed the CLA.

eventNotifier events.Notifier
tokenSerializer common.TaskTokenSerializer
metricsHandler metrics.Handler
batchMetricsHandler metrics.BatchMetricsHandler
Contributor

It's already pretty annoying to pass around a bunch of these "context objects" like metrics.Handler and log.Logger (sometimes two loggers), so I hate to add more. Is there some way we could do this in the existing metrics.Handler? I guess we don't want to break existing implementations, so maybe the interface extension pattern? (though I know that has problems)

Contributor Author

Thanks for the quick look @dnr!

Sure, I agree plumbing this around the place is a pain. As you said, I don't want to break the existing interface so extending could work. Alternatively we could wrap the various observability handlers within some unified wrapper so we just have the one thing to pass around and we keep the batch handler distinct from the standard handler.

For the former, I'd have BatchMetricsHandler extend metrics.Handler and remove the basic metricsHandler field from the context objects touched in this PR. Over time, as other batch metrics are added to the codebase, the batch handler can replace the old handler.
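
A rough sketch of that first option, using Go interface embedding. `Handler` here is a one-method stand-in rather than the real metrics.Handler, and `RecordBatch` is a hypothetical method name.

```go
package main

import "fmt"

// Handler is a simplified stand-in for metrics.Handler (assumed shape).
type Handler interface {
	Record(name string, value int64)
}

// BatchMetricsHandler extends Handler by embedding it.
// RecordBatch is a hypothetical method, not taken from the PR.
type BatchMetricsHandler interface {
	Handler
	RecordBatch(name string, fields map[string]int64)
}

// printHandler implements both interfaces and keeps what it was given.
type printHandler struct{ lines []string }

func (p *printHandler) Record(name string, value int64) {
	p.lines = append(p.lines, fmt.Sprintf("%s=%d", name, value))
}

func (p *printHandler) RecordBatch(name string, fields map[string]int64) {
	p.lines = append(p.lines, fmt.Sprintf("%s with %d fields", name, len(fields)))
}

// recordSize only needs the narrow interface.
func recordSize(h Handler, size int64) {
	h.Record("mutable_state_size", size)
}

func main() {
	ph := &printHandler{}
	var bh BatchMetricsHandler = ph
	recordSize(bh, 1024) // a BatchMetricsHandler is accepted wherever a Handler is expected
	bh.RecordBatch("mutable_state", map[string]int64{"total_size": 1024})
	fmt.Println(ph.lines)
}
```

Because the extended interface satisfies the base one, a struct that holds only the batch handler can still call code written against the plain handler.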

For the latter, we'd have some kind of ObservabilityHandler containing the 3 types, which would replace the Log / Metrics / BatchMetrics handlers in the relevant contexts. This would ultimately mean fewer things to pass around, the downside being the coupling of Logging with Metrics (though in practice, I'm not sure how much of an issue this would be).
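
A sketch of what the second option might look like. Only the name ObservabilityHandler comes from the discussion; the field names, the three small interfaces, and `closeTransaction` are assumptions made for the example.

```go
package main

import "fmt"

// Simplified stand-ins for the real logger and metrics interfaces (assumed shapes).
type Logger interface{ Info(msg string) }
type MetricsHandler interface{ Record(name string, value int64) }
type BatchMetricsHandler interface {
	RecordBatch(name string, fields map[string]int64)
}

// ObservabilityHandler bundles the three handlers so that call sites
// take one parameter instead of three.
type ObservabilityHandler struct {
	Logger       Logger
	Metrics      MetricsHandler
	BatchMetrics BatchMetricsHandler
}

// recorder implements all three interfaces and remembers each call.
type recorder struct{ calls []string }

func (r *recorder) Info(msg string) { r.calls = append(r.calls, "log:"+msg) }

func (r *recorder) Record(name string, _ int64) {
	r.calls = append(r.calls, "metric:"+name)
}

func (r *recorder) RecordBatch(name string, _ map[string]int64) {
	r.calls = append(r.calls, "batch:"+name)
}

// closeTransaction is a hypothetical call site that uses all three handlers.
func closeTransaction(obs ObservabilityHandler) {
	obs.Logger.Info("closing transaction")
	obs.Metrics.Record("transaction_count", 1)
	obs.BatchMetrics.RecordBatch("mutable_state", map[string]int64{"total_size": 1024})
}

func main() {
	r := &recorder{}
	closeTransaction(ObservabilityHandler{Logger: r, Metrics: r, BatchMetrics: r})
	fmt.Println(r.calls)
}
```

The three fields stay independently typed, so the coupling is at the struct level only: a caller can still swap in a different logger without touching the metrics handlers.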

What do you think?

Member

When I originally discussed this with @njo, we had a different approach in mind: metrics.Handler would have a StartBatch() method that returns a BatchHandler, where a BatchHandler is just a metrics.Handler plus an io.Closer.

With that approach, you'd have:

batch := metricsHandler.StartBatch()
defer batch.Close()

// Record metrics on the batch
metrics.MutableStateSize.With(batch).Record(int64(stats.TotalSize))
metrics.ExecutionInfoSize.With(batch).Record(int64(stats.ExecutionInfoSize))
metrics.ExecutionStateSize.With(batch).Record(int64(stats.ExecutionStateSize))

The default implementation for a metrics handler would return itself with a nop closer.
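
A self-contained sketch of that default, assuming a cut-down `Handler` with a single `Record` method instead of the real metrics.Handler API. `simpleHandler`, `nopCloserBatch`, and `emitStats` are hypothetical names.

```go
package main

import (
	"fmt"
	"io"
)

// Handler is a simplified stand-in for metrics.Handler with the proposed StartBatch method.
type Handler interface {
	Record(name string, value int64)
	StartBatch() BatchHandler
}

// BatchHandler is a Handler plus an io.Closer, as described above.
type BatchHandler interface {
	Handler
	io.Closer
}

// simpleHandler has no real batching support.
type simpleHandler struct{ recorded []string }

func (h *simpleHandler) Record(name string, value int64) {
	h.recorded = append(h.recorded, fmt.Sprintf("%s=%d", name, value))
}

// StartBatch is the default implementation: the handler returns itself
// wrapped with a Close method that does nothing.
func (h *simpleHandler) StartBatch() BatchHandler {
	return nopCloserBatch{h}
}

// nopCloserBatch forwards every Handler method to the wrapped handler.
type nopCloserBatch struct{ Handler }

func (nopCloserBatch) Close() error { return nil }

// emitStats mirrors the call pattern from the snippet above.
func emitStats(h Handler) {
	batch := h.StartBatch()
	defer batch.Close()

	batch.Record("mutable_state_size", 1024)
	batch.Record("execution_info_size", 256)
	batch.Record("execution_state_size", 64)
}

func main() {
	h := &simpleHandler{}
	emitStats(h)
	fmt.Println(h.recorded)
}
```

A handler with a wide-event backend would instead return a BatchHandler that buffers the records and emits them together in Close; call sites like `emitStats` would not change.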

Contributor Author

That sounds reasonable, @bergundy. When we talked, I hadn't understood that we'd be using the same metrics handler interface. To clarify: I'd be extending the existing interface here rather than creating a new Handler type that wraps the old one? That would mean updating our existing handler implementations and leaving anyone with a custom handler to add the new batch method themselves?

Contributor

We have broken source-compatibility for users who use the server as a library before, but we try pretty hard not to. I feel like this case might be worth it, though, since it's very easy to fix. We should get a team consensus on that.

I like the ObservabilityHandler idea! We might need/want to do some unification of tags.

Member

If we're worried about breaking the metrics interface (which I'm not especially worried about), we can make the additional method optional and try to cast to that interface at runtime.
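
A sketch of the optional-interface variant, under the same simplifying assumptions as before: `Handler` is a one-method stand-in, and `BatchStarter`, `StartBatch`, `legacyHandler`, and `batchingHandler` are hypothetical names. The existing interface is left untouched and the batch capability is discovered with a type assertion.

```go
package main

import (
	"fmt"
	"io"
)

// Handler is the existing interface, left unchanged (simplified stand-in).
type Handler interface {
	Record(name string, value int64)
}

// BatchHandler is a Handler that must be closed to flush.
type BatchHandler interface {
	Handler
	io.Closer
}

// BatchStarter is the optional extension that handlers may implement.
type BatchStarter interface {
	StartBatch() BatchHandler
}

// StartBatch uses the optional method when present, otherwise wraps h with a no-op Close.
func StartBatch(h Handler) BatchHandler {
	if bs, ok := h.(BatchStarter); ok {
		return bs.StartBatch()
	}
	return nopBatch{h}
}

type nopBatch struct{ Handler }

func (nopBatch) Close() error { return nil }

// legacyHandler is a custom handler that was never updated: one event per Record.
type legacyHandler struct{ events int }

func (l *legacyHandler) Record(string, int64) { l.events++ }

// batchingHandler opts in to batching by implementing BatchStarter.
type batchingHandler struct{ events int }

func (b *batchingHandler) Record(string, int64) { b.events++ }

func (b *batchingHandler) StartBatch() BatchHandler {
	return &bufferedBatch{parent: b, fields: map[string]int64{}}
}

// bufferedBatch holds fields in memory and counts one wide event on Close.
type bufferedBatch struct {
	parent *batchingHandler
	fields map[string]int64
}

func (bb *bufferedBatch) Record(name string, value int64) { bb.fields[name] = value }

func (bb *bufferedBatch) Close() error {
	bb.parent.events++ // the whole batch goes out as a single event
	return nil
}

// emitStats is identical for both kinds of handler.
func emitStats(h Handler) {
	batch := StartBatch(h)
	defer batch.Close()
	batch.Record("mutable_state_size", 1024)
	batch.Record("execution_info_size", 256)
	batch.Record("execution_state_size", 64)
}

func main() {
	legacy, batching := &legacyHandler{}, &batchingHandler{}
	emitStats(legacy)
	emitStats(batching)
	fmt.Println("legacy events:", legacy.events, "batching events:", batching.events)
}
```

The cost of this approach is that the capability is invisible in the type system: a handler wrapped by middleware that does not forward StartBatch silently loses batching.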

I'll bring this up for discussion with the team.

Also like the idea of an ObservabilityHandler that encapsulates metrics and logging.

Contributor

I'm a big fan of the ObservabilityHandler approach, though at a previous job we combined it with per-request contexts so you'd get something like:

type OperationContext struct {
    context.Context
    logger
    metrics
    batchMetrics
    // etc
}

var _ context.Context = (*OperationContext)(nil)

That way we could fold the context.Context into this struct and drop yet another parameter.
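
A compilable version of that idea, as a sketch only. The comment above embeds every member; named fields are used here for the handlers so that only context.Context's methods are promoted. `Logger`, `MetricsHandler`, `recorder`, `handle`, and `demo` are stand-ins invented for the example.

```go
package main

import (
	"context"
	"fmt"
)

// Simplified stand-ins for the real logger and metrics interfaces (assumed shapes).
type Logger interface{ Info(msg string) }
type MetricsHandler interface{ Record(name string, value int64) }

// OperationContext embeds context.Context, so it can be passed anywhere a context is expected.
type OperationContext struct {
	context.Context
	Logger  Logger
	Metrics MetricsHandler
}

// Compile-time check that *OperationContext satisfies context.Context.
var _ context.Context = (*OperationContext)(nil)

// recorder remembers log and metric calls.
type recorder struct{ calls []string }

func (r *recorder) Info(msg string)             { r.calls = append(r.calls, "log:"+msg) }
func (r *recorder) Record(name string, _ int64) { r.calls = append(r.calls, "metric:"+name) }

// checkAlive accepts any context.Context, including an *OperationContext.
func checkAlive(ctx context.Context) error { return ctx.Err() }

// handle takes one parameter instead of a context, a logger, and a metrics handler.
func handle(op *OperationContext) error {
	if err := checkAlive(op); err != nil {
		op.Logger.Info("operation cancelled")
		return err
	}
	op.Metrics.Record("operation_count", 1)
	return nil
}

// demo runs handle once before and once after cancelling the underlying context.
func demo() (calls []string, before, after error) {
	ctx, cancel := context.WithCancel(context.Background())
	r := &recorder{}
	op := &OperationContext{Context: ctx, Logger: r, Metrics: r}
	before = handle(op)
	cancel()
	after = handle(op)
	return r.calls, before, after
}

func main() {
	calls, before, after := demo()
	fmt.Println(calls, before, after)
}
```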

Contributor

But the logger you get from fx (static scope) would be tagged differently from the logger you get from a context (dynamic scope). They're good for two different purposes; ideally you might want to combine tags from both. We don't really do dynamically scoped loggers; we just add a few tags at the call site for request-level info.

@njo njo changed the title from "Mutable spans" to "Batch Metrics Handler for Mutable Span Metrics" Oct 4, 2024
@njo njo changed the title from "Batch Metrics Handler for Mutable Span Metrics" to "Batch Metrics Handler for Mutable State Metrics" Oct 4, 2024
@njo njo closed this Oct 11, 2024
Contributor Author

njo commented Oct 11, 2024

Closed in favour of #6655

@yiminc yiminc deleted the mutable_spans branch November 4, 2024 17:55