chasm.PollComponent and PollActivityExecution #8563
base: standalone-activity
cb5f326 to 21cdb37
ded2e1b to efc4722
3281080 to 9c65e47
ce95cbe to 29a0bd9
29a0bd9 to 33e981e
40e0b6b to 28665bb
This reverts commit f34f222.
e9a1223 to e5d900b
Bug: Start-to-close timeout failure information is lost
When an activity exhausts retries due to start-to-close timeout, recordStartToCloseTimedOut sets outcome.Variant to an empty ActivityOutcome_Failed_{} struct with a nil Failed field. This prevents the fallback logic in buildPollActivityExecutionResponse (lines 433-449) from executing, since it only runs when activityOutcome is nil. The failure information stored in attempt.GetLastFailureDetails() is never retrieved, causing timeout failure details to be lost in API responses. The outcome should either remain nil to allow the fallback, or the Failed field should be populated with the actual failure from attempt state.
temporal/chasm/lib/activity/activity.go, lines 285 to 286 in e5d900b:
    // If the activity has exhausted retries, mark the outcome failure as well but don't store duplicate failure info.
temporal/chasm/lib/activity/activity.go, lines 421 to 424 in e5d900b:
    if activityOutcome != nil {
        switch v := activityOutcome.GetVariant().(type) {
        case *activitypb.ActivityOutcome_Failed_:
            response.Outcome = &workflowservice.PollActivityExecutionResponse_Failure{
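The nil-versus-empty distinction this comment relies on can be sketched in isolation. The types below are hypothetical stand-ins, not the real activitypb messages, and responseFailure is an assumed simplification of the response builder: it falls back to the attempt's last failure only when no outcome exists at all.

```go
package main

import "fmt"

// Failure, FailedOutcome, and Outcome are hypothetical stand-ins for the
// generated protobuf types; they are not the real activitypb definitions.
type Failure struct{ Message string }

type FailedOutcome struct{ Failure *Failure }

type Outcome struct{ Failed *FailedOutcome }

// GetFailure is nil-safe, mirroring how generated protobuf getters behave.
func (f *FailedOutcome) GetFailure() *Failure {
	if f == nil {
		return nil
	}
	return f.Failure
}

// responseFailure models a builder that consults the attempt's last failure
// only when no outcome was recorded.
func responseFailure(outcome *Outcome, lastAttemptFailure *Failure) *Failure {
	if outcome != nil {
		// An outcome marked failed but carrying no details ends up here
		// and yields nil; the fallback below is never reached.
		return outcome.Failed.GetFailure()
	}
	return lastAttemptFailure
}

func main() {
	timeout := &Failure{Message: "start-to-close timeout"}

	// Non-nil outcome without details: the failure is lost.
	fmt.Println(responseFailure(&Outcome{}, timeout) == nil)

	// Nil outcome: the fallback supplies the attempt failure.
	fmt.Println(responseFailure(nil, timeout).Message)
}
```

Under these assumptions, either remedy named in the comment works: leave the outcome nil so the fallback runs, or populate the failure before the builder reads it.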
            EntityID: newWorkflowSnapshot.ExecutionState.RunId,
        }, nil)
    }
}
Bug: Duplicate workflow notification calls in UpdateWorkflowExecution
The UpdateWorkflowExecution method contains nested duplicate conditional blocks checking persistence.OperationPossiblySucceeded(err). Inside the outer check, NotifyWorkflowMutationTasks and NotifyWorkflowSnapshotTasks are called twice unnecessarily. This causes workflow task notifications to be sent twice when the operation succeeds, potentially leading to duplicate task processing or other unintended side effects.
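A minimal sketch of the reported shape, using a hypothetical counting notifier and a simplified stand-in for persistence.OperationPossiblySucceeded (the real predicate and the exact layout of UpdateWorkflowExecution are not shown in this review), illustrates why a nested identical check doubles the notifications:

```go
package main

import "fmt"

// notifier is a hypothetical stand-in that counts notifications.
type notifier struct{ calls int }

func (n *notifier) Notify() { n.calls++ }

// possiblySucceeded is a simplified stand-in for
// persistence.OperationPossiblySucceeded; the real rules are assumed away.
func possiblySucceeded(err error) bool { return err == nil }

// updateDuplicated mirrors the reported structure: the same check is nested
// inside itself, so the notification fires twice when it passes.
func updateDuplicated(n *notifier, err error) {
	if possiblySucceeded(err) {
		n.Notify()
		if possiblySucceeded(err) {
			n.Notify()
		}
	}
}

// updateOnce keeps a single check and a single notification.
func updateOnce(n *notifier, err error) {
	if possiblySucceeded(err) {
		n.Notify()
	}
}

func main() {
	dup, once := &notifier{}, &notifier{}
	updateDuplicated(dup, nil)
	updateOnce(once, nil)
	fmt.Println(dup.calls, once.calls) // prints "2 1"
}
```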
Bug: Empty outcome failure for start-to-close timeout
When recording a start-to-close timeout with no retries left, outcome.Variant is set to an empty ActivityOutcome_Failed_{} without populating the actual failure details. This causes buildPollActivityExecutionResponse to return a nil failure when accessed via v.Failed.GetFailure(), preventing clients from receiving timeout failure information. The failure should be populated with the timeout failure created earlier in the function, similar to how recordFromScheduledTimeOut populates it.
temporal/chasm/lib/activity/activity.go, lines 288 to 289 in 95371e9:
    if noRetriesLeft {
        outcome.Variant = &activitypb.ActivityOutcome_Failed_{}
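One of the remedies suggested here, populating the outcome with the timeout failure when no retries remain, might look roughly like the sketch below. Attempt, Outcome, and recordStartToCloseTimeout are hypothetical simplifications, not the real state machine code. Sharing the failure between attempt and outcome also runs against the source comment's stated intent of not storing duplicate failure info, so this is one option rather than the fix.

```go
package main

import "fmt"

// Failure, Attempt, and Outcome are hypothetical simplifications of the
// activity state; field names are assumptions for illustration only.
type Failure struct{ Message string }

type Attempt struct{ LastFailure *Failure }

type Outcome struct{ Failed *Failure }

// recordStartToCloseTimeout always records the failure on the attempt and,
// when retries are exhausted, also exposes it through the outcome.
func recordStartToCloseTimeout(a *Attempt, o *Outcome, noRetriesLeft bool) {
	failure := &Failure{Message: "start-to-close timeout"}
	a.LastFailure = failure
	if noRetriesLeft {
		// Populated rather than left empty, so readers of the outcome
		// see the timeout details.
		o.Failed = failure
	}
}

func main() {
	a, o := &Attempt{}, &Outcome{}
	recordStartToCloseTimeout(a, o, true)
	fmt.Println(o.Failed != nil, o.Failed.Message)
}
```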
            EntityID: newWorkflowSnapshot.ExecutionState.RunId,
        }, nil)
    }
}
Bug: Duplicate notification code in UpdateWorkflowExecution
The UpdateWorkflowExecution method contains duplicate nested if persistence.OperationPossiblySucceeded(err) blocks starting at line 191. This causes NotifyWorkflowMutationTasks, NotifyWorkflowSnapshotTasks, and NotifyChasmExecution to be called twice for the same execution updates, potentially leading to duplicate task processing, redundant notifications to long-poll subscribers, and wasted resources.
8f24888 to 957b571
Bug: Start-to-close timeout outcome missing failure details
When recordStartToCloseTimedOut is called with noRetriesLeft=true, it sets outcome.Variant = &activitypb.ActivityOutcome_Failed_{} creating an empty Failed struct without populating the Failure field. This contradicts the test expectation at line 326-327 in statemachine_test.go which requires failure.Failed.GetFailure() to be non-nil. The outcome should include the actual failure details (similar to recordFromScheduledTimeOut at lines 244-248) to provide timeout information in API responses. This causes buildPollActivityExecutionResponse at lines 423-426 to return nil for the failure when it should return the timeout failure.
temporal/chasm/lib/activity/activity.go, lines 288 to 289 in 957b571:
    if noRetriesLeft {
        outcome.Variant = &activitypb.ActivityOutcome_Failed_{}
957b571 to e4dd35d
Bug: Missing failure details in activity timeout outcome
In recordStartToCloseTimedOut, when noRetriesLeft is true, the outcome variant is set to an empty ActivityOutcome_Failed_{} without the actual failure details. This prevents API responses from returning the timeout failure information to clients. The outcome should include the failure that was already created and stored in attempt.LastFailureDetails, similar to how recordFromScheduledTimeOut populates the failure field with the actual timeout information.
temporal/chasm/lib/activity/activity.go, lines 288 to 289 in e4dd35d:
    if noRetriesLeft {
        outcome.Variant = &activitypb.ActivityOutcome_Failed_{}
What changed?
- history.ChasmNotifier based on the events.Notifier implementation used for workflows
- chasm.PollComponent
- PollActivityExecution API handler
Why?
How did you test it?
Note
Adds CHASM long-polling via PollComponent and a new PollActivityExecution API, powered by a new execution notifier, with protobufs, frontend/handler wiring, and comprehensive tests.
- Engine.PollComponent to a monotonic predicate func(Context, Component) (bool, error) and return updated []byte ref; update wrappers/mocks.
- ChasmEngine.PollComponent with internal long-poll timeout, notifier subscription, stale-state handling, and predicateSatisfied helper.
- ExecutionStateChanged and ErrInvalidComponentRefBytes in transition_history.go.
- ChasmNotifier (subscribe/notify per execution); wire via FX and expose through ChasmEngine and historyEngine (GetChasmEngine, NotifyChasmExecution).
- PollActivityExecution request/response messages and service method; generate client and gRPC stubs.
- PollActivityExecution using ReadComponent or PollComponent per wait policy; build responses including info/input/outcome.
- buildActivityExecutionInfo and response builder; adjust outcome handling; validator for PollActivityExecution.
- PollActivityExecution (no-wait, wait-any-state-change, deadline behaviors, invalid args, not found) and start-to-close timeout behavior.
Written by Cursor Bugbot for commit 1ad72b5. This will update automatically on new commits.
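The combination of a per-execution notifier and a predicate-driven long poll can be sketched as below. This is a sketch under stated assumptions, not the PR's implementation: Notifier, Poll, and demo are hypothetical names, and the predicate signature is simplified from the func(Context, Component) (bool, error) form mentioned in the summary. The sketch subscribes before evaluating the predicate so that a notification arriving between the check and the wait is not missed. It relies on the predicate being monotonic (once true, it stays true), which makes re-checking after each wakeup safe.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Notifier is a hypothetical per-execution subscribe/notify registry.
type Notifier struct {
	mu   sync.Mutex
	subs map[string][]chan struct{}
}

func NewNotifier() *Notifier {
	return &Notifier{subs: make(map[string][]chan struct{})}
}

// Subscribe returns a channel that is closed on the next Notify for key.
// This sketch omits unsubscribe, so unused channels linger until Notify.
func (n *Notifier) Subscribe(key string) <-chan struct{} {
	ch := make(chan struct{})
	n.mu.Lock()
	n.subs[key] = append(n.subs[key], ch)
	n.mu.Unlock()
	return ch
}

// Notify wakes every current subscriber of key.
func (n *Notifier) Notify(key string) {
	n.mu.Lock()
	chans := n.subs[key]
	delete(n.subs, key)
	n.mu.Unlock()
	for _, ch := range chans {
		close(ch)
	}
}

// Poll blocks until pred reports true, pred fails, or the timeout elapses.
func Poll(ctx context.Context, n *Notifier, key string, timeout time.Duration, pred func() (bool, error)) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	for {
		ch := n.Subscribe(key) // subscribe first, then check
		ok, err := pred()
		if err != nil || ok {
			return err
		}
		select {
		case <-ch: // state changed; re-evaluate the predicate
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// demo polls one execution; when complete is true a background goroutine
// flips the state and notifies, otherwise the poll runs into its timeout.
func demo(complete bool) error {
	n := NewNotifier()
	var mu sync.Mutex
	done := false
	timeout := 20 * time.Millisecond
	if complete {
		timeout = time.Second
		go func() {
			time.Sleep(10 * time.Millisecond)
			mu.Lock()
			done = true
			mu.Unlock()
			n.Notify("run-1")
		}()
	}
	return Poll(context.Background(), n, "run-1", timeout, func() (bool, error) {
		mu.Lock()
		defer mu.Unlock()
		return done, nil
	})
}

func main() {
	fmt.Println("completed poll:", demo(true))
	fmt.Println("timed-out poll:", demo(false))
}
```

A caller that does not want to wait would skip Poll and read the state once, which corresponds to the ReadComponent versus PollComponent split described above.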