Fix retry for wrapped failures #490

alexshtin · 2020-06-30T19:47:34Z

What changed?
getCauseFailure stops at first ApplicationFailureInfo.

Why?
Non-retryable behaviour doesn't work correctly if a non-retryable error has a retryable error as its cause.

How did you test it?
Added a bunch of unit tests to cover all possible cases.

Potential risks
No risks.

mfateev · 2020-06-30T20:54:49Z

service/history/retry.go


 func getCauseFailure(failure *failurepb.Failure) *failurepb.Failure {
-	for ; failure.GetCause() != nil; failure = failure.GetCause() {
+	// Unwrap failures till the first ApplicationFailure because only first ApplicationFailure controls retryable.


I don't think we want to unwrap up to the first application failure. We want to get the immediate cause and, if it is not an application failure, stop.

I rethought this logic myself: we will unwrap only ChildWorkflowExecutionFailure and ActivityFailure.

mastermanu · 2020-06-30T21:41:01Z

service/history/retry.go

+	}
+
 	if failure.GetApplicationFailureInfo() != nil {
+		if failure.GetApplicationFailureInfo().GetNonRetryable() {


[Nit] why the different style between line 106 (returning !failure.GetServerFailureInfo()) vs here (if true then return false)?

ServerFailure doesn't have a type field, so retryability is controlled only by the NonRetryable flag (if it is true, IsRetryable is false, and vice versa). ApplicationFailure also has a type field, which needs to be checked against nonRetryableTypes. So if NonRetryable is set, we can return right away. If not, we need to check the type.
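
A minimal sketch of the two styles described above, for readers following the thread. The `Failure`, `ServerFailureInfo`, and `ApplicationFailureInfo` types here are simplified, hypothetical stand-ins for the generated protobuf types, and treating every other failure as retryable is an assumption of this sketch, not something taken from `retry.go`.

```go
package main

// ServerFailureInfo is a hypothetical stand-in: it carries only the flag.
type ServerFailureInfo struct {
	NonRetryable bool
}

// ApplicationFailureInfo is a hypothetical stand-in: it carries a type and the flag.
type ApplicationFailureInfo struct {
	Type         string
	NonRetryable bool
}

// Failure is a simplified stand-in for failurepb.Failure.
type Failure struct {
	Server      *ServerFailureInfo
	Application *ApplicationFailureInfo
}

// isRetryable sketches the decision discussed in the thread.
func isRetryable(f *Failure, nonRetryableTypes []string) bool {
	if f.Server != nil {
		// No type field, so the flag alone decides.
		return !f.Server.NonRetryable
	}
	if f.Application != nil {
		// The flag short-circuits; otherwise the type must be checked too.
		if f.Application.NonRetryable {
			return false
		}
		for _, t := range nonRetryableTypes {
			if t == f.Application.Type {
				return false
			}
		}
	}
	// Assumption of this sketch: anything else is retryable.
	return true
}
```

The server branch can be a single negated return, while the application branch needs the early `return false` because a cleared flag is not enough to decide.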

mastermanu · 2020-06-30T21:44:27Z

service/history/retry.go

-	}
-
 	if failure.GetTimeoutFailureInfo() != nil {
 		if failure.GetTimeoutFailureInfo().GetTimeoutType() != enumspb.TIMEOUT_TYPE_START_TO_CLOSE &&


[Nit] can we just do return TimeoutType() == START_TO_CLOSE || TimeoutType() == TIMEOUT_TYPE_HEARTBEAT

I am so used to Go not having a ?: operator that I forgot such returns are actually possible.
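
For illustration, the suggested single-return form could look roughly like this. `TimeoutType` and its constants are a local, hypothetical enum that mirrors the names used in the diff; the function name `isTimeoutRetryable` is also an assumption of this sketch.

```go
package main

// TimeoutType is a hypothetical local enum standing in for enumspb.TimeoutType.
type TimeoutType int

const (
	TimeoutTypeUnspecified TimeoutType = iota
	TimeoutTypeStartToClose
	TimeoutTypeScheduleToStart
	TimeoutTypeScheduleToClose
	TimeoutTypeHeartbeat
)

// isTimeoutRetryable returns the boolean expression directly instead of
// branching with "if ... { return false }; return true".
func isTimeoutRetryable(t TimeoutType) bool {
	return t == TimeoutTypeStartToClose || t == TimeoutTypeHeartbeat
}
```

By De Morgan's law this is equivalent to negating the original `!= ... && != ...` condition, so behaviour is unchanged.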

mastermanu · 2020-06-30T21:53:23Z

service/history/retry.go


 func getCauseFailure(failure *failurepb.Failure) *failurepb.Failure {
-	for ; failure.GetCause() != nil; failure = failure.GetCause() {
+	// Extract cause for ChildWorkflowExecutionFailure and ActivityFailure.


Just to make sure I understand this - the intent here is to keep going until we hit the innermost childworkflowexecutionfailure or activityfailureinfo, right?

No, it is the other way around. We move "down" the chain while the failure type is child workflow execution or activity and there is an inner cause. As soon as we reach something other than a child workflow execution or activity failure, we stop and return it.

This is the actual fix for the problem you reported. Before, it went down to the last failure in the chain, which in your case was a generic Go error converted to a retryable ApplicationFailure.
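
The unwrapping rule described above can be sketched as follows. This is not the code from `retry.go`: `Failure` and the `Kind` constants are simplified, hypothetical stand-ins for `failurepb.Failure` and its info fields, used only to show where the walk stops.

```go
package main

// Kind is a hypothetical tag replacing the protobuf failure-info oneof.
type Kind int

const (
	KindApplication Kind = iota
	KindActivity
	KindChildWorkflow
)

// Failure is a simplified stand-in for failurepb.Failure.
type Failure struct {
	Kind         Kind
	NonRetryable bool
	Cause        *Failure
}

// getCauseFailure moves down the cause chain only while the current failure
// is an activity or child workflow wrapper and still has an inner cause.
// The first failure of any other kind is returned, even if it has a cause.
func getCauseFailure(f *Failure) *Failure {
	for f != nil && f.Cause != nil &&
		(f.Kind == KindActivity || f.Kind == KindChildWorkflow) {
		f = f.Cause
	}
	return f
}
```

With a chain of activity failure, then non-retryable application failure, then retryable application failure, this returns the non-retryable one, whereas walking to the end of the chain would return the retryable one.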

service/history/retry_test.go

Fix retry for wrapped failures.

50b46a8

alexshtin requested a review from mfateev June 30, 2020 19:47

mfateev requested changes Jun 30, 2020

Change unwrap logic.

4c8c2ab

mastermanu reviewed Jun 30, 2020

alexshtin added 3 commits June 30, 2020 15:04

Minor fix.

f7f6621

Minor fix.

9bcec7d

Minor fix.

419ceeb

mfateev approved these changes Jul 1, 2020

Address feedback.

5e31ac9

alexshtin merged commit a756661 into temporalio:master Jul 1, 2020

alexshtin deleted the fix/retry-wrapped-failures branch July 1, 2020 04:04

3 participants
