Skip to content

Conversation

@ShahabT
Copy link
Contributor

@ShahabT ShahabT commented Nov 24, 2025

What changed?

DeploymentVersion workflow stops sending sync summary signal to the Deployment workflow after finishing async propagation.

Why?

The SyncVersionState already returns the new summary and this extra signal can create unnecessary race conditions.

How did you test it?

  • built
  • run locally and tested manually
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

Potential risks

None

@ShahabT ShahabT requested review from a team as code owners November 24, 2025 18:55
@ShahabT ShahabT requested a review from Shivs11 November 24, 2025 18:56
logger := activity.GetLogger(ctx)
// Get all the task queues in the current version and put them into SyncUserData format
currVersionInfo, _, err := a.deploymentClient.DescribeVersion(ctx, a.namespace, input.CurrentVersion, false)
currVersionInfo, _, err := a.WorkerDeploymentClient.DescribeVersion(ctx, a.namespace, input.CurrentVersion, false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is changing from deploymentClient to WorkerDeploymentClient a design change only? I looked around and I think we were repeating the declaration of this earlier on but I just wanna be sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just the name that was there in activityDeps. now that we embed that we don't need to have deploymentClient.

scheduledTime := activity.GetInfo(ctx).ScheduledTime

if scheduledTime.Add(SlowPropagationDelay).Before(time.Now()) {
a.Logger.Warn("Slow propagation detected, attempting to sync user data",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be using logger, which is defined above, here? Wonder if these two are different fundamentally

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the logger in SDK adds activity context stuff like wf and activity names which is a good thing, changing to that.

Comment on lines +1284 to +1285
VersioningDataPropagationLatency = NewTimerDef("versioning_data_propagation_latency")
SlowVersioningDataPropagationCounter = NewCounterDef("slow_versioning_data_propagation")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious: do we wanna also have a counter to track the data propagation for those users using sync workflows?

The reason why I say this is because they could make things easier to debug/understand when they have an operator's hat on

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you agree to the above point I made, we need to rename the variables just fyi

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but we're not gonna support sync wfs for long. once we verify async then we'd make that the default and eventually cleanup async path.

Comment on lines 56 to 59
scheduledTime := activity.GetInfo(ctx).ScheduledTime

if scheduledTime.Add(SlowPropagationDelay).Before(time.Now()) {
a.Logger.Warn("Slow propagation detected, attempting to sync user data",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I see what you were trying to find here, but I took a minute/two to understand that this was actually finding out how long it took for the already scheduled activity to start.

Not sure if the name of SlowVersioningDataPropagationCounter makes sense if my above understanding is right. Something like ActivitySchedulingDelayCounter could make sense but shall leave the naming up to ya!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this scheduledTime is the first attempt schedule time, not the current attempt scheduling time. so basically if the activity has to retry for long it should fire.

// Not counting the possible wait for previous propagations in this propagation latency.
startTime := workflow.Now(ctx)
defer func() {
d.metrics.Timer(metrics.VersioningDataPropagationLatency.Name()).Record(time.Since(startTime))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is using something like time.Since NDE prone?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For metrics purposes yes, but the linter is not happy about it so I need to change it.

d.syncSummary(ctx)
if workflow.GetVersion(ctx, "no-propagation-sync-summary", workflow.DefaultVersion, 0) == workflow.DefaultVersion {
// TODO: clean this unnecessary sync up.
// No summary changes happen in async propagation that the deployment workflow
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unfinished comment i believe

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants