Skip to content

Conversation

@bergundy
Copy link
Member

What changed?

See title, also added a frontend.reachabilityQuerySetDurationSinceDefault dynamic config with a default of 5 minutes.

Why?

Docstring has an explanation:

	// ReachabilityQuerySetDurationSinceDefault is the minimum period since a version set was demoted from being the
	// queue default before it is considered unreachable by new workflows.
	// This setting allows some propogation delay of versioning data for the reachability queries, which may happen for
	// the following reasons:
	// 1. There are no workflows currently marked as open in the visibility store but a worker for the demoted version
	// is currently processing a task.
	// 2. There are delays in the asynchrnous visiblity task processor.
	// 3. There's propagation delay of the versioning data between matching nodes.

How did you test it?

Added a couple of tests

@bergundy bergundy requested a review from dnr June 26, 2023 21:15
@bergundy bergundy requested a review from a team as a code owner June 26, 2023 21:15
// ReachabilityQueryBuildIdLimit limits the number of build ids that can be requested in a single call to the
// GetWorkerTaskReachability API.
ReachabilityQueryBuildIdLimit = "limit.reachabilityQueryBuildIds"
// ReachabilityQuerySetDurationSinceDefault is the minimum period since a version set was demoted from being the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the two configs represent the same concept, right? why not collapse them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing a build id is more severe, I think we want different configs for them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, that's fair

}

// ReduceSliceInitial reduces a slice using given reducer function and initial value.
func ReduceSliceInitial[T any, A any](in []T, initializer A, reducer func(A, T) A) A {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func ReduceSliceInitial[T any, A any](in []T, initializer A, reducer func(A, T) A) A {
func ReduceSlice[T any, A any](in []T, initializer A, reducer func(A, T) A) A {

Reduce always needs an initial value, no need to include it in the name

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also reduce from T[] to T that can use the first element as the initializer. I didn't add it to our "godash" lib though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather force callers to do reduce(slice[1:], slice[0], f) so then it's clearly the caller's responsibility to do something for the empty slice

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay

Namespace: ns.Name().String(),
TaskQueue: taskQueue,
},
value, err := f.matchingClient.GetTaskQueueUserData(ctx, &matchingservice.GetTaskQueueUserDataRequest{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think it makes sense to use the more general GetTaskQueueUserData in frontend's GetWorkerBuildIdCompatibility also and remove that specific rpc from matching? it's not really doing anything besides ToBuildIdOrderingResponse, which frontend could call.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that makes sense to me.

TBH, I would just send all of the user facing user data info in DescribeTaskQueue and not have a special API for GetWorkerBuildIdCompatibility.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, I wouldn't change in this PR.
We'd need to leave the matching RPC for this minor version because 1.21 was already released.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean not have GetWorkerBuildIdCompatibility in workflowservice (frontend)? I like that but we released it already, can we remove it? for matchingservice we need a separate GetTaskQueueUserData for long polls anyway so might as well use that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We marked this whole feature as experimental (API-wise), we can still change but we'll likely need to deprecate APIs and keep them around for a while since users will start relying on them fairly soon.

},
value, err := f.matchingClient.GetTaskQueueUserData(ctx, &matchingservice.GetTaskQueueUserDataRequest{
NamespaceId: ns.ID().String(),
TaskQueue: taskQueue,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this always goes to the root (it did before too). probably okay but we could consider load-balancing this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm more concerned with hammering visibility than getting this info from matching but I do see your point.
Do you think it's worth it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nah, you're right this should be cheap compared to visibility


// Finds the version in the version sets, returning (set index, index within that set)
// Returns -1, -1 if not found.
func findVersion(data *persistencespb.VersioningData, buildID string) (setIndex, indexInSet int) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

// ReachabilityQueryBuildIdLimit limits the number of build ids that can be requested in a single call to the
// GetWorkerTaskReachability API.
ReachabilityQueryBuildIdLimit = "limit.reachabilityQueryBuildIds"
// ReachabilityQuerySetDurationSinceDefault is the minimum period since a version set was demoted from being the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, that's fair

}

// ReduceSliceInitial reduces a slice using given reducer function and initial value.
func ReduceSliceInitial[T any, A any](in []T, initializer A, reducer func(A, T) A) A {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather force callers to do reduce(slice[1:], slice[0], f) so then it's clearly the caller's responsibility to do something for the empty slice

Namespace: ns.Name().String(),
TaskQueue: taskQueue,
},
value, err := f.matchingClient.GetTaskQueueUserData(ctx, &matchingservice.GetTaskQueueUserDataRequest{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean not have GetWorkerBuildIdCompatibility in workflowservice (frontend)? I like that but we released it already, can we remove it? for matchingservice we need a separate GetTaskQueueUserData for long polls anyway so might as well use that

},
value, err := f.matchingClient.GetTaskQueueUserData(ctx, &matchingservice.GetTaskQueueUserDataRequest{
NamespaceId: ns.ID().String(),
TaskQueue: taskQueue,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nah, you're right this should be cheap compared to visibility

// 1. There are no workflows currently marked as open in the visibility store but a worker for the demoted version
// is currently processing a task.
// 2. There are delays in the asynchrnous visiblity task processor.
// 2. There are delays in the visibility task processor (which is asynchronous).
Copy link
Contributor

@dnr dnr Jun 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, can you make the same change on the copied line (214)?

@bergundy bergundy force-pushed the reachability-demotion-period branch from 31a4aaa to 4a826ce Compare June 29, 2023 04:04
@bergundy bergundy enabled auto-merge (squash) June 29, 2023 04:05
@bergundy bergundy merged commit c0fb121 into temporalio:master Jun 29, 2023
mindaugasrukas pushed a commit that referenced this pull request Jun 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants