clear drainage status of a rollbacked version workflow #8119

Shivs11 · 2025-07-30T16:37:10Z

What changed?

When a deactivated (drained/draining) version gets rollbacked to becoming active again, the drainage information that was associated with it will get cleared.
This is required because when this version were to get deactivated again, we need the code to start the drainage process on it again. Right now, it was not doing that since we were not clearing the drainage information which did not start a fresh drainage tracker for this version!

NOTE: This change requires a patch to be present. This is because when this scenario is replicated in production, it will result in two new history calls (syncSummary as well as a CAN of the version workflow). This will specifically happen when the rolled-back version gets deactivated eventually, over here - note how previously, since the drainage information was not cleared, this was never executed!)

How did I verify that this requires a patch:

Ran the replay test script before the fix introduced in this PR and noticed:

Ran the replay test script after the fix introduced in this PR and noticed:

In other words, the total number of our entity workflows went up after introducing this change and thus requires a patch gate!

Why?

we found a bug in the versioning entity workflows and this is an effort to fix this!

How did you test it?

Potential risks

yes, there is a risk of NDE's happening in production but I mitigated that risk by placing the fix against a version gate.

carlydf

Nice job with the repro and the fix!
I know it's annoying, but I really think that we should have solid replay tests for this patch, particularly if it's going into the OSS patch release.

I think it's worth doing the patch test ticket before merging this, so that this can be replay tested

tests/worker_deployment_test.go

service/worker/workerdeployment/version_workflow.go

…nage-sync-bug

- When a deactivated (drained/draining) version gets rollbacked to becoming active again, the drainage information that was associated with it will get cleared. - This is required because when this version were to get deactivated again, we need the code to start the drainage process on it again. Right now, it was not doing that since we were not clearing the drainage information which did not start a fresh drainage tracker for this version! *NOTE*: This change requires a patch to be present. This is because when this scenario is replicated in production, it will result in two new history calls (syncSummary as well as a CAN of the version workflow). This will specifically happen when the rolled-back version gets deactivated eventually, over [here](https://github.com/temporalio/temporal/blob/ada5d0bbee45150888b63a132be8239fd5a87212/service/worker/workerdeployment/version_workflow.go#L546) - note how previously, since the drainage information was not cleared, [this](https://github.com/temporalio/temporal/blob/ada5d0bbee45150888b63a132be8239fd5a87212/service/worker/workerdeployment/version_workflow.go#L261) was never executed!) *How did I verify that this requires a patch*: - Ran the replay test script before the fix introduced in this PR and noticed: <img width="793" height="79" alt="image" src="https://github.com/user-attachments/assets/bb975d26-4ad1-4847-b477-b74c7d2f55e4" /> - Ran the replay test script after the fix introduced in this PR and noticed: <img width="911" height="99" alt="image" src="https://github.com/user-attachments/assets/ad3ff3e1-666a-49b4-bda4-e05f3412d984" /> In other words, the total number of our entity workflows went up after introducing this change and thus requires a patch gate! - we found a bug in the versioning entity workflows and this is an effort to fix this! - [ ] built - [ ] run locally and tested manually - [ ] covered by existing tests - [ ] added new unit test(s) - [x] added new functional test(s) - yes, there is a risk of NDE's happening in production but I mitigated that risk by placing the fix against a version gate.

Shivs11 added 2 commits July 30, 2025 12:28

fix bug

5bfbf3f

remove prints

01b3a7a

Shivs11 requested a review from a team as a code owner July 30, 2025 16:37

Shivs11 requested a review from carlydf July 30, 2025 16:39

carlydf reviewed Jul 30, 2025

View reviewed changes

tests/worker_deployment_test.go Show resolved Hide resolved

service/worker/workerdeployment/version_workflow.go Outdated Show resolved Hide resolved

Shivs11 and others added 14 commits July 30, 2025 18:14

a better replay tester

d80eb42

restore worker code

bff490a

restore old helpful comment

bff6f0f

restore worker.go

e02ff95

fresh run of the workflow

13b2018

updated run directory

be67d92

cursor bugbot comments

975e3a6

replay tester also accounts for the number of workflow executions

fef2542

comment

032a93a

don't save json files

fe1a4e5

Merge branch 'main' into ss/better-replay-testing

5b7a08a

Merge branch 'ss/better-replay-testing' into ss/version-workflow-drai…

1a55c28

…nage-sync-bug

add replay test and address comments

48d9f95

Merge branch 'main' into ss/version-workflow-drainage-sync-bug

bb5b10f

carlydf approved these changes Jul 31, 2025

View reviewed changes

Shivs11 enabled auto-merge (squash) July 31, 2025 19:50

Shivs11 merged commit 9c48ca4 into main Jul 31, 2025
89 of 91 checks passed

Shivs11 deleted the ss/version-workflow-drainage-sync-bug branch July 31, 2025 20:29

Shivs11 added the release/1.28.1 label Aug 1, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

clear drainage status of a rollbacked version workflow #8119

clear drainage status of a rollbacked version workflow #8119

Uh oh!

Shivs11 commented Jul 30, 2025 •

edited

Loading

Uh oh!

carlydf left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

clear drainage status of a rollbacked version workflow #8119

clear drainage status of a rollbacked version workflow #8119

Uh oh!

Conversation

Shivs11 commented Jul 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changed?

Why?

How did you test it?

Potential risks

Uh oh!

carlydf left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Shivs11 commented Jul 30, 2025 •

edited

Loading