@dandavison (Contributor) commented Oct 29, 2025

What changed?

  • add history.ChasmNotifier based on the events.Notifier implementation used for workflows
  • implement chasm.PollComponent
  • implement PollActivityExecution API handler

Why?

  • needed for standalone activity
  • needed for other situations where we long-poll a CHASM execution/entity

How did you test it?

  • built
  • run locally and tested manually (TODO: we are set up to do this with the Python prototype)
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

Note

Adds CHASM long-polling via PollComponent and a new PollActivityExecution API, powered by a new execution notifier, with protobufs, frontend/handler wiring, and comprehensive tests.

  • CHASM Core:
    • Engine API: Simplify Engine.PollComponent to a monotonic predicate func(Context, Component) (bool, error) and return updated []byte ref; update wrappers/mocks.
    • Long-poll Implementation: Add ChasmEngine.PollComponent with internal long-poll timeout, notifier subscription, stale-state handling, and predicateSatisfied helper.
    • Utilities: Add ExecutionStateChanged and ErrInvalidComponentRefBytes in transition_history.go.
  • History Service:
    • Notifier: Introduce ChasmNotifier (subscribe/notify per execution); wire via FX and expose through ChasmEngine and historyEngine (GetChasmEngine, NotifyChasmExecution).
    • Notifications on Persistence: Trigger CHASM notifications from workflow transaction updates/creations when CHASM nodes change.
  • Activity API:
    • New RPCs/Protos: Add PollActivityExecution request/response messages and service method; generate client and gRPC stubs.
    • Frontend/Handler: Validate requests; implement PollActivityExecution using ReadComponent or PollComponent per wait policy; build responses including info/input/outcome.
    • Activity Model: Add buildActivityExecutionInfo and response builder; adjust outcome handling; validator for PollActivityExecution.
  • Tests:
    • Add unit tests for notifier and CHASM engine (poll success, wait, stale, not-found).
    • Add functional tests for PollActivityExecution (no-wait, wait-any-state-change, deadline behaviors, invalid args, not found) and start-to-close timeout behavior.

Written by Cursor Bugbot for commit 1ad72b5.

Base automatically changed from update-protos to standalone-activity October 30, 2025 20:25
@dandavison force-pushed the poll-component branch 10 times, most recently from 3281080 to 9c65e47, on November 11, 2025 03:18
@dandavison force-pushed the poll-component branch 11 times, most recently from ce95cbe to 29a0bd9, on November 12, 2025 18:32
@dandavison marked this pull request as ready for review on November 12, 2025 18:51
@dandavison requested review from a team as code owners on November 12, 2025 18:51
@cursor cursor bot left a comment

Bug: Start-to-close timeout failure information is lost

When an activity exhausts retries due to start-to-close timeout, recordStartToCloseTimedOut sets outcome.Variant to an empty ActivityOutcome_Failed_{} struct with a nil Failed field. This prevents the fallback logic in buildPollActivityExecutionResponse (lines 433-449) from executing, since it only runs when activityOutcome is nil. The failure information stored in attempt.GetLastFailureDetails() is never retrieved, causing timeout failure details to be lost in API responses. The outcome should either remain nil to allow the fallback, or the Failed field should be populated with the actual failure from attempt state.

chasm/lib/activity/activity.go#L285-L286

// If the activity has exhausted retries, mark the outcome failure as well but don't store duplicate failure info.

chasm/lib/activity/activity.go#L421-L424

if activityOutcome != nil {
	switch v := activityOutcome.GetVariant().(type) {
	case *activitypb.ActivityOutcome_Failed_:
		response.Outcome = &workflowservice.PollActivityExecutionResponse_Failure{
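
To make the reported control flow concrete, here is a small sketch of how a non-nil but empty "failed" variant can bypass a nil-only fallback. All types and the `responseFailure` function are simplified, hypothetical stand-ins for the generated protobuf types and for `buildPollActivityExecutionResponse`; only the shape of the branch is taken from the comment above.

```go
package main

import "fmt"

// Failure is a hypothetical stand-in for the failure message.
type Failure struct{ Message string }

// OutcomeFailed mirrors a "failed" message with a nil-safe getter, in the
// style of protobuf-generated Go code.
type OutcomeFailed struct{ Failure *Failure }

func (f *OutcomeFailed) GetFailure() *Failure {
	if f == nil {
		return nil
	}
	return f.Failure
}

// OutcomeFailedVariant mirrors a oneof wrapper whose inner field may be nil.
type OutcomeFailedVariant struct{ Failed *OutcomeFailed }

// Outcome holds at most one variant.
type Outcome struct{ Variant interface{} }

// responseFailure picks the failure to return: the outcome's own failure when
// an outcome exists, otherwise the last attempt's failure as a fallback.
func responseFailure(outcome *Outcome, lastAttemptFailure *Failure) *Failure {
	if outcome != nil {
		if v, ok := outcome.Variant.(*OutcomeFailedVariant); ok {
			return v.Failed.GetFailure() // nil when the wrapper is empty
		}
		return nil
	}
	return lastAttemptFailure // reached only when there is no outcome at all
}

func main() {
	timeout := &Failure{Message: "start-to-close timeout"}

	// Empty wrapper: the outcome is non-nil, so the fallback never runs.
	empty := &Outcome{Variant: &OutcomeFailedVariant{}}
	fmt.Println(responseFailure(empty, timeout) == nil)

	// Nil outcome: the fallback supplies the attempt's failure.
	fmt.Println(responseFailure(nil, timeout).Message)
}
```

Under these assumptions, leaving the outcome nil is enough for the fallback to surface the timeout, which is the first of the two remedies the comment suggests.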


EntityID: newWorkflowSnapshot.ExecutionState.RunId,
}, nil)
}
}

Bug: Duplicate workflow notification calls in UpdateWorkflowExecution

The UpdateWorkflowExecution method contains nested duplicate conditional blocks checking persistence.OperationPossiblySucceeded(err). Inside the outer check, NotifyWorkflowMutationTasks and NotifyWorkflowSnapshotTasks are called twice unnecessarily. This causes workflow task notifications to be sent twice when the operation succeeds, potentially leading to duplicate task processing or other unintended side effects.

@cursor cursor bot left a comment

Bug: Empty outcome failure for start-to-close timeout

When recording a start-to-close timeout with no retries left, outcome.Variant is set to an empty ActivityOutcome_Failed_{} without populating the actual failure details. This causes buildPollActivityExecutionResponse to return a nil failure when accessed via v.Failed.GetFailure(), preventing clients from receiving timeout failure information. The failure should be populated with the timeout failure created earlier in the function, similar to how recordFromScheduledTimeOut populates it.

chasm/lib/activity/activity.go#L288-L289

if noRetriesLeft {
	outcome.Variant = &activitypb.ActivityOutcome_Failed_{}
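
One way to read the suggested fix is to populate the variant with the failure that the function has already built. The sketch below assumes simplified, hypothetical types (`Activity`, `OutcomeFailed`, `OutcomeFailedVariant`) and a hypothetical `recordTimeout` function; it is not the PR's `recordStartToCloseTimedOut`.

```go
package main

import "fmt"

// Failure is a hypothetical stand-in for the failure message.
type Failure struct{ Message string }

// OutcomeFailed and OutcomeFailedVariant mirror a "failed" message and its
// oneof wrapper.
type OutcomeFailed struct{ Failure *Failure }
type OutcomeFailedVariant struct{ Failed *OutcomeFailed }

// Activity keeps the last attempt's failure and, once terminal, an outcome.
type Activity struct {
	LastFailure *Failure
	Outcome     *OutcomeFailedVariant
}

// recordTimeout records a timeout on the attempt and, when no retries remain,
// also sets a populated failed outcome instead of an empty wrapper.
func recordTimeout(a *Activity, noRetriesLeft bool) {
	failure := &Failure{Message: "activity start-to-close timeout"}
	a.LastFailure = failure
	if noRetriesLeft {
		a.Outcome = &OutcomeFailedVariant{
			Failed: &OutcomeFailed{Failure: failure},
		}
	}
}

func main() {
	retrying := &Activity{}
	recordTimeout(retrying, false)
	fmt.Println(retrying.Outcome == nil, retrying.LastFailure.Message)

	exhausted := &Activity{}
	recordTimeout(exhausted, true)
	fmt.Println(exhausted.Outcome.Failed.Failure.Message)
}
```

The code comment quoted earlier in this thread says duplicate failure info is deliberately not stored in the outcome, so resolving the failure from attempt state when the response is built may fit that intent better than copying it here.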


EntityID: newWorkflowSnapshot.ExecutionState.RunId,
}, nil)
}
}

Bug: Duplicate notification code in UpdateWorkflowExecution

The UpdateWorkflowExecution method contains duplicate nested if persistence.OperationPossiblySucceeded(err) blocks starting at line 191. This causes NotifyWorkflowMutationTasks, NotifyWorkflowSnapshotTasks, and NotifyChasmExecution to be called twice for the same execution updates, potentially leading to duplicate task processing, redundant notifications to long-poll subscribers, and wasted resources.

@dandavison force-pushed the poll-component branch 2 times, most recently from 8f24888 to 957b571, on November 25, 2025 19:35
@cursor cursor bot left a comment

Bug: Start-to-close timeout outcome missing failure details

When recordStartToCloseTimedOut is called with noRetriesLeft=true, it sets outcome.Variant = &activitypb.ActivityOutcome_Failed_{}, creating an empty Failed struct without populating the Failure field. This contradicts the test expectation at lines 326-327 in statemachine_test.go, which requires failure.Failed.GetFailure() to be non-nil. The outcome should include the actual failure details (similar to recordFromScheduledTimeOut at lines 244-248) to provide timeout information in API responses. As written, buildPollActivityExecutionResponse at lines 423-426 returns nil for the failure when it should return the timeout failure.

chasm/lib/activity/activity.go#L288-L289

if noRetriesLeft {
	outcome.Variant = &activitypb.ActivityOutcome_Failed_{}

@cursor cursor bot left a comment

Bug: Missing failure details in activity timeout outcome

In recordStartToCloseTimedOut, when noRetriesLeft is true, the outcome variant is set to an empty ActivityOutcome_Failed_{} without the actual failure details. This prevents API responses from returning the timeout failure information to clients. The outcome should include the failure that was already created and stored in attempt.LastFailureDetails, similar to how recordFromScheduledTimeOut populates the failure field with the actual timeout information.

chasm/lib/activity/activity.go#L288-L289

if noRetriesLeft {
	outcome.Variant = &activitypb.ActivityOutcome_Failed_{}

5 participants