Fix archival activities error handling #3227
if !common.IsPersistenceTransientError(err) {
if _, ok := err.(*serviceerror.WorkflowNotReady); !ok {
	logger := tagLoggerWithHistoryRequest(tagLoggerWithActivityInfo(container.Logger, activity.GetInfo(ctx)), &request)
Suppress logs for the WorkflowNotReady error, which is expected for the DeleteWorkflowExecution call. The error will still be logged if all retries fail.
err = workflow.ExecuteLocalActivity(localActCtx, deleteHistoryActivity, *request).Get(localActCtx, nil)
if err != nil {
	logger.Error("deleting history failed, this means zombie histories are left", tag.Error(err))
	logger.Error("deleting workflow execution failed all retries, skip workflow deletion", tag.Error(err))
Will the workflow data stay in the DB if it just skips the deletion?
Yes. The archival workflow retries both uploading and deletion for at most 5 minutes and then gives up. This is a limitation of the existing archival design; the issue will go away once we have a separate archival queue.
For now, users should monitor the archival delete non-retryable error metric and use the admin workflow delete command (`admin wf del`) to manually delete those workflows from the DB.
} else {
	err = temporal.NewApplicationError(err.Error(), "", nil)
This is the default error-to-ApplicationError conversion. I would just return err here.
} else {
	err = temporal.NewApplicationError(err.Error(), "", nil)
And everywhere below.
Makes sense. Then no change to err is needed, and the defer now looks like this:
defer func() {
	sw.Stop()
	if err == errUploadNonRetryable {
		scope.IncCounter(metrics.ArchiverNonRetryableErrorCount)
	}
}()
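The defer can inspect the final error because `err` is presumably a named return value: a deferred closure reads whatever value the function is about to return. A self-contained sketch, where `stopwatch`, `nonRetryableErrors`, and `upload` are hypothetical stand-ins for `sw`, the `scope.IncCounter(metrics.ArchiverNonRetryableErrorCount)` metric, and the upload activity.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errUploadNonRetryable = errors.New("upload non-retryable error")

// stopwatch is a hypothetical stand-in for the metrics stopwatch (sw).
type stopwatch struct {
	start   time.Time
	elapsed time.Duration
}

func (s *stopwatch) Stop() { s.elapsed = time.Since(s.start) }

// nonRetryableErrors stands in for the ArchiverNonRetryableErrorCount counter.
var nonRetryableErrors int

// upload uses a named return value so the deferred closure sees the final err.
func upload(fail bool) (err error) {
	sw := &stopwatch{start: time.Now()}
	defer func() {
		sw.Stop()
		if err == errUploadNonRetryable {
			nonRetryableErrors++
		}
	}()
	if fail {
		return errUploadNonRetryable // assigned to err before the deferred closure runs
	}
	return nil
}

func main() {
	_ = upload(false)
	_ = upload(true)
	fmt.Println(nonRetryableErrors) // 1
}
```

Because the sentinel is returned as-is rather than rewrapped, the equality check in the defer matches exactly the non-retryable case.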
Branch updated from 97bbac9 to 706181f.
What changed?
Why?
How did you test it?
Potential risks
Is hotfix candidate?