chasm.PollComponent and PollActivityExecution #8563

dandavison · 2025-10-29T01:13:42Z

What changed?

- Add history.ChasmNotifier, based on the events.Notifier implementation used for workflows.
- Implement chasm.PollComponent.
- Implement the PollActivityExecution API handler.

Why?

- Needed for standalone activity.
- Needed for other situations where we long-poll a CHASM execution/entity.

How did you test it?

- built
- run locally and tested manually (TODO: we are set up to do this with the Python prototype)
- covered by existing tests
- added new unit test(s)
- added new functional test(s)

Note

Adds CHASM long-polling via PollComponent and a new PollActivityExecution API, powered by a new execution notifier, with protobufs, frontend/handler wiring, and comprehensive tests.

CHASM Core:
- Engine API: Simplify Engine.PollComponent to a monotonic predicate func(Context, Component) (bool, error) and return updated []byte ref; update wrappers/mocks.
- Long-poll Implementation: Add ChasmEngine.PollComponent with internal long-poll timeout, notifier subscription, stale-state handling, and predicateSatisfied helper.
- Utilities: Add ExecutionStateChanged and ErrInvalidComponentRefBytes in transition_history.go.
History Service:
- Notifier: Introduce ChasmNotifier (subscribe/notify per execution); wire via FX and expose through ChasmEngine and historyEngine (GetChasmEngine, NotifyChasmExecution).
- Notifications on Persistence: Trigger CHASM notifications from workflow transaction updates/creations when CHASM nodes change.
Activity API:
- New RPCs/Protos: Add PollActivityExecution request/response messages and service method; generate client and gRPC stubs.
- Frontend/Handler: Validate requests; implement PollActivityExecution using ReadComponent or PollComponent per wait policy; build responses including info/input/outcome.
- Activity Model: Add buildActivityExecutionInfo and response builder; adjust outcome handling; validator for PollActivityExecution.
Tests:
- Add unit tests for notifier and CHASM engine (poll success, wait, stale, not-found).
- Add functional tests for PollActivityExecution (no-wait, wait-any-state-change, deadline behaviors, invalid args, not found) and start-to-close timeout behavior.

Written by Cursor Bugbot for commit 1ad72b5. This will update automatically on new commits.

service/history/chasm_engine.go

cursor

Bug: Start-to-close timeout failure information is lost

When an activity exhausts retries due to start-to-close timeout, recordStartToCloseTimedOut sets outcome.Variant to an empty ActivityOutcome_Failed_{} struct with a nil Failed field. This prevents the fallback logic in buildPollActivityExecutionResponse (lines 433-449) from executing, since it only runs when activityOutcome is nil. The failure information stored in attempt.GetLastFailureDetails() is never retrieved, causing timeout failure details to be lost in API responses. The outcome should either remain nil to allow the fallback, or the Failed field should be populated with the actual failure from attempt state.

chasm/lib/activity/activity.go#L285-L286

temporal/chasm/lib/activity/activity.go

Lines 285 to 286 in e5d900b

    // If the activity has exhausted retries, mark the outcome failure as well but don't store duplicate failure info.

chasm/lib/activity/activity.go#L421-L424

temporal/chasm/lib/activity/activity.go

Lines 421 to 424 in e5d900b

    if activityOutcome != nil {
    	switch v := activityOutcome.GetVariant().(type) {
    	case *activitypb.ActivityOutcome_Failed_:
    		response.Outcome = &workflowservice.PollActivityExecutionResponse_Failure{
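
The failure mode described in this comment can be reproduced with simplified types. The sketch below uses hypothetical stand-ins (`Outcome`, `OutcomeFailed`, `Failed`, `Failure`, `resolveFailure`) that only imitate the shape of generated protobuf oneof wrappers; they are not the real `activitypb` types. It also shows one possible mitigation, a fallback applied when the stored failure is nil, which is an illustration and not necessarily the fix adopted in the PR.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the generated protobuf messages.
type Failure struct{ Message string }

type Failed struct{ Failure *Failure }

// GetFailure is nil-safe, like the getters protoc-gen-go generates.
func (f *Failed) GetFailure() *Failure {
	if f == nil {
		return nil
	}
	return f.Failure
}

type isOutcomeVariant interface{ isOutcomeVariant() }

// OutcomeFailed imitates a generated oneof wrapper around Failed.
type OutcomeFailed struct{ Failed *Failed }

func (*OutcomeFailed) isOutcomeVariant() {}

type Outcome struct{ Variant isOutcomeVariant }

// resolveFailure prefers the failure stored on the outcome and falls back to
// the last attempt failure when the outcome carries none.
func resolveFailure(o *Outcome, lastAttemptFailure *Failure) *Failure {
	if o != nil {
		if v, ok := o.Variant.(*OutcomeFailed); ok {
			if f := v.Failed.GetFailure(); f != nil {
				return f
			}
		}
	}
	return lastAttemptFailure
}

func main() {
	timeout := &Failure{Message: "start-to-close timeout"}
	// Variant is set, but the wrapped Failed message is nil.
	empty := &Outcome{Variant: &OutcomeFailed{}}

	// The variant matches, yet the nested getter yields nil.
	v := empty.Variant.(*OutcomeFailed)
	fmt.Println(v.Failed.GetFailure() == nil) // prints "true"

	// Falling back to the attempt's failure keeps the details.
	fmt.Println(resolveFailure(empty, timeout).Message) // prints "start-to-close timeout"
}
```

The key point is that a non-nil outcome with an empty variant is indistinguishable from "has a failure" to any check of the form `outcome != nil`, so a fallback keyed on nil never runs.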

cursor · 2025-11-25T01:56:41Z

service/history/workflow/transaction_impl.go

+					EntityID:    newWorkflowSnapshot.ExecutionState.RunId,
+				}, nil)
+			}
+		}


Bug: Duplicate workflow notification calls in UpdateWorkflowExecution

The UpdateWorkflowExecution method contains nested duplicate conditional blocks checking persistence.OperationPossiblySucceeded(err). Inside the outer check, NotifyWorkflowMutationTasks and NotifyWorkflowSnapshotTasks are called twice unnecessarily. This causes workflow task notifications to be sent twice when the operation succeeds, potentially leading to duplicate task processing or other unintended side effects.

cursor

Bug: Empty outcome failure for start-to-close timeout

When recording a start-to-close timeout with no retries left, outcome.Variant is set to an empty ActivityOutcome_Failed_{} without populating the actual failure details. This causes buildPollActivityExecutionResponse to return a nil failure when accessed via v.Failed.GetFailure(), preventing clients from receiving timeout failure information. The failure should be populated with the timeout failure created earlier in the function, similar to how recordFromScheduledTimeOut populates it.

chasm/lib/activity/activity.go#L288-L289

temporal/chasm/lib/activity/activity.go

Lines 288 to 289 in 95371e9

    if noRetriesLeft {
    	outcome.Variant = &activitypb.ActivityOutcome_Failed_{}

cursor · 2025-11-25T02:26:30Z

service/history/workflow/transaction_impl.go

+					EntityID:    newWorkflowSnapshot.ExecutionState.RunId,
+				}, nil)
+			}
+		}


Bug: Duplicate notification code in UpdateWorkflowExecution

The UpdateWorkflowExecution method contains duplicate nested if persistence.OperationPossiblySucceeded(err) blocks starting at line 191. This causes NotifyWorkflowMutationTasks, NotifyWorkflowSnapshotTasks, and NotifyChasmExecution to be called twice for the same execution updates, potentially leading to duplicate task processing, redundant notifications to long-poll subscribers, and wasted resources.

cursor

Bug: Start-to-close timeout outcome missing failure details

When recordStartToCloseTimedOut is called with noRetriesLeft=true, it sets outcome.Variant = &activitypb.ActivityOutcome_Failed_{}, creating an empty Failed struct without populating the Failure field. This contradicts the test expectation at lines 326-327 in statemachine_test.go, which requires failure.Failed.GetFailure() to be non-nil. The outcome should include the actual failure details (similar to recordFromScheduledTimeOut at lines 244-248) to provide timeout information in API responses. As written, buildPollActivityExecutionResponse at lines 423-426 returns nil for the failure when it should return the timeout failure.

chasm/lib/activity/activity.go#L288-L289

temporal/chasm/lib/activity/activity.go

Lines 288 to 289 in 957b571

    if noRetriesLeft {
    	outcome.Variant = &activitypb.ActivityOutcome_Failed_{}

cursor

Bug: Missing failure details in activity timeout outcome

In recordStartToCloseTimedOut, when noRetriesLeft is true, the outcome variant is set to an empty ActivityOutcome_Failed_{} without the actual failure details. This prevents API responses from returning the timeout failure information to clients. The outcome should include the failure that was already created and stored in attempt.LastFailureDetails, similar to how recordFromScheduledTimeOut populates the failure field with the actual timeout information.

chasm/lib/activity/activity.go#L288-L289

temporal/chasm/lib/activity/activity.go

Lines 288 to 289 in e4dd35d

    if noRetriesLeft {
    	outcome.Variant = &activitypb.ActivityOutcome_Failed_{}

dandavison force-pushed the update-protos branch from cb5f326 to 21cdb37 on October 30, 2025 20:00

Base automatically changed from update-protos to standalone-activity on October 30, 2025 20:25

dandavison force-pushed the standalone-activity branch from ded2e1b to efc4722 on October 30, 2025 20:49

dandavison force-pushed the poll-component branch 10 times, most recently from 3281080 to 9c65e47 on November 11, 2025 03:18

dandavison force-pushed the poll-component branch 11 times, most recently from ce95cbe to 29a0bd9 on November 12, 2025 18:32

dandavison marked this pull request as ready for review on November 12, 2025 18:51

dandavison requested review from a team as code owners on November 12, 2025 18:51

dandavison force-pushed the poll-component branch from 29a0bd9 to 33e981e on November 12, 2025 19:01

cursor bot reviewed Nov 12, 2025

service/history/chasm_engine.go (resolved)

dandavison force-pushed the poll-component branch from 40e0b6b to 28665bb on November 12, 2025 20:06

dandavison added 14 commits November 24, 2025 20:53

Cleanup test

ddff1b5

Delete cross-component polling tests

dc248f4

Use testcore.RandomizeStr(t.Name())

1b46b23

Check RunID

c7f4ab0

code golf

270c13b

Revert "- PS"

fbc77f3

This reverts commit f34f222.

monotonicPredicateFn

e3d74a7

Tests

1f169fb

New deadline logic

2f10ee2

Validation & error messages fixes

0dda6cc

Fix CHASM not found errors

d3c709c

Test Outcome

1e5ca51

Evove tests

36fb610

Move StaleState test to unit tests

e5d900b

dandavison force-pushed the poll-component branch from e9a1223 to e5d900b on November 25, 2025 01:54

cursor bot reviewed Nov 25, 2025

Rename

07cc8cf

cursor bot reviewed Nov 25, 2025

dandavison added 4 commits November 25, 2025 06:36

Cleanup

b7e5c98

Revert addition of new metrics

5cb1637

Subscribe does not return an error

561b090

Fix memory leak

8087fa5

dandavison force-pushed the poll-component branch 2 times, most recently from 8f24888 to 957b571 on November 25, 2025 19:35

cursor bot reviewed Nov 25, 2025

Clean up, fix validation

e4dd35d

dandavison force-pushed the poll-component branch from 957b571 to e4dd35d on November 25, 2025 22:58

cursor bot reviewed Nov 25, 2025

dandavison added 2 commits November 25, 2025 18:08

Test absent RunID

1e43970

Don't return unused ref

1ad72b5

5 participants

