-
Notifications
You must be signed in to change notification settings - Fork 1.2k
No need send sync summary signal after async propagation #8689
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| logger := activity.GetLogger(ctx) | ||
| // Get all the task queues in the current version and put them into SyncUserData format | ||
| currVersionInfo, _, err := a.deploymentClient.DescribeVersion(ctx, a.namespace, input.CurrentVersion, false) | ||
| currVersionInfo, _, err := a.WorkerDeploymentClient.DescribeVersion(ctx, a.namespace, input.CurrentVersion, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: is changing from deploymentClient to WorkerDeploymentClient a design change only? I looked around and I think we were repeating the declaration of this earlier on but I just wanna be sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just the name that was there in activityDeps. now that we embed that we don't need to have deploymentClient.
| scheduledTime := activity.GetInfo(ctx).ScheduledTime | ||
|
|
||
| if scheduledTime.Add(SlowPropagationDelay).Before(time.Now()) { | ||
| a.Logger.Warn("Slow propagation detected, attempting to sync user data", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be using logger, which is defined above, here? Wonder if these two are different fundamentally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the logger in SDK adds activity context stuff like wf and activity names which is a good thing, changing to that.
| VersioningDataPropagationLatency = NewTimerDef("versioning_data_propagation_latency") | ||
| SlowVersioningDataPropagationCounter = NewCounterDef("slow_versioning_data_propagation") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious: do we wanna also have a counter to track the data propagation for those users using sync workflows?
The reason why I say this is because they could make things easier to debug/understand when they have an operator's hat on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you agree to the above point I made, we need to rename the variables just fyi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but we're not gonna support sync wfs for long. once we verify async then we'd make that the default and eventually cleanup async path.
| scheduledTime := activity.GetInfo(ctx).ScheduledTime | ||
|
|
||
| if scheduledTime.Add(SlowPropagationDelay).Before(time.Now()) { | ||
| a.Logger.Warn("Slow propagation detected, attempting to sync user data", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I see what you were trying to find here, but I took a minute/two to understand that this was actually finding out how long it took for the already scheduled activity to start.
Not sure if the name of SlowVersioningDataPropagationCounter makes sense if my above understanding is right. Something like ActivitySchedulingDelayCounter could make sense but shall leave the naming up to ya!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this scheduledTime is the first attempt schedule time, not the current attempt scheduling time. so basically if the activity has to retry for long it should fire.
| // Not counting the possible wait for previous propagations in this propagation latency. | ||
| startTime := workflow.Now(ctx) | ||
| defer func() { | ||
| d.metrics.Timer(metrics.VersioningDataPropagationLatency.Name()).Record(time.Since(startTime)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is using something like time.Since NDE prone?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For metrics purposes yes, but the linter is not happy about it so I need to change it.
| d.syncSummary(ctx) | ||
| if workflow.GetVersion(ctx, "no-propagation-sync-summary", workflow.DefaultVersion, 0) == workflow.DefaultVersion { | ||
| // TODO: clean this unnecessary sync up. | ||
| // No summary changes happen in async propagation that the deployment workflow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: unfinished comment i believe
What changed?
DeploymentVersion workflow stops sending sync summary signal to the Deployment workflow after finishing async propagation.
Why?
The SyncVersionState already returns the new summary and this extra signal can create unnecessary race conditions.
How did you test it?
Potential risks
None