Skip to content

Conversation

@alexshtin
Copy link
Contributor

@alexshtin alexshtin commented Jun 30, 2020

What changed?
getCauseFailure stops at first ApplicationFailureInfo.

Closes temporalio/sdk-go#175.

Why?
Non-retryable behaviour doesn't work correctly if non-retryable error has retryable error as a cause.

How did you test it?
Added bunch of unit tests to cover all possible cases.

Potential risks
No risks.

@alexshtin alexshtin requested a review from mfateev June 30, 2020 19:47

func getCauseFailure(failure *failurepb.Failure) *failurepb.Failure {
for ; failure.GetCause() != nil; failure = failure.GetCause() {
// Unwrap failures till the first ApplicationFailure because only first ApplicationFailure controls retryable.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to unwrap up to the first application failure. We want to get immediate cause and if it is not an application failure then stop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rethink this logic myself: we will unwrap only ChildWorkflowExectutionFailure and ActivityFailure.

}

if failure.GetApplicationFailureInfo() != nil {
if failure.GetApplicationFailureInfo().GetNonRetryable() {
Copy link
Member

@mastermanu mastermanu Jun 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Nit] why the different style between line 106 (returning !failure.GetServerFailureInfo()) vs here (if true then return false)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ServerFailure doesn't have type field. Therefore retryable is controlled only by NonRetryable flag (if it is true, IsRetryable is false and vice versa). ApplicationFailure also has a type field which needs to be checked against nonRetryableTypes. So if NonRetryable is set we can return right away. If not, we need to check type.

}

if failure.GetTimeoutFailureInfo() != nil {
if failure.GetTimeoutFailureInfo().GetTimeoutType() != enumspb.TIMEOUT_TYPE_START_TO_CLOSE &&
Copy link
Member

@mastermanu mastermanu Jun 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Nit] can we just do return TimeoutType() == START_TO_CLOSE || TimeoutType() == TIMEOUT_TYPE_HEARTBEAT

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am so used to golang not having ?: operator, so I forgot that such returns are actually possible.


func getCauseFailure(failure *failurepb.Failure) *failurepb.Failure {
for ; failure.GetCause() != nil; failure = failure.GetCause() {
// Extract cause for ChildWorkflowExecutionFailure and ActivityFailure.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure I understand this - the intent here is to keep going until we hit the innermost childworkflowexecutionfailure or activityfailureinfo, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it is vice versa. We moving "down" the chain while failure type is child execution of activity and there is an inner cause. As soon as we got something different from child execution or activity, we stop and return it.

This is actual fix to the problem you reported. Before it goes down the last failure in the chain, which in your case, was generic go error converted to retryable ApplicationFailure.

@alexshtin alexshtin merged commit a756661 into temporalio:master Jul 1, 2020
@alexshtin alexshtin deleted the fix/retry-wrapped-failures branch July 1, 2020 04:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

NewNonRetryableApplicationError not respected by

3 participants