@simvlad simvlad commented Aug 14, 2025

What changed?

Adds new activity metrics and deprecates one existing activity latency metric.

Deprecated Metrics

  • activity_e2e_latency → Deprecated. Despite its name, it measures individual attempts rather than true end-to-end latency. Replaced by the more clearly named activity_start_to_close_latency.

New & Replacement Metrics

  • activity_start_to_close_latency: Measures the latency from activity start to close (per attempt).
  • activity_schedule_to_close_latency: Measures true end-to-end duration, including retries and backoff.
  • activity_success: Counts the number of succeeded activities. Aligned with the existing workflow_success counter.
  • activity_fail: Counts only final activity failures; failed attempts that are retried are not counted. Similar to workflow_failed.
  • activity_timeout: Incremented on the final activity timeout (including ScheduleToStartTimeout), tagged by timeout_type. Aligned with the existing workflow_timeout counter.
  • activity_task_fail: Counts failed activity attempts, including attempts that are retried. A separate retry counter is not needed, since this metric already reflects the number of retries.
  • activity_task_timeout: Incremented on each activity attempt timeout (including ScheduleToStartTimeout), tagged by timeout_type.
  • activity_cancel: Incremented when an activity is cancelled.
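
As a rough illustration of how the per-attempt and final counters relate, here is a minimal Go sketch. It is not the server implementation: the `Outcome` type and the `tally` function are hypothetical names made up for this example, and the sketch assumes the last attempt in the slice is the final one. Cancellation and the timeout_type tag are left out.

```go
package main

import "fmt"

// Outcome is a hypothetical label for how a single activity attempt ended.
type Outcome int

const (
	Succeeded Outcome = iota
	Failed
	TimedOut
)

// tally applies the counting rules from the list above to a sequence of
// attempts. The last element is assumed to be the final attempt.
func tally(attempts []Outcome) map[string]int {
	counters := map[string]int{}
	for i, outcome := range attempts {
		final := i == len(attempts)-1
		switch outcome {
		case Succeeded:
			counters["activity_success"]++
		case Failed:
			// Every failed attempt counts, whether or not it is retried.
			counters["activity_task_fail"]++
			if final {
				counters["activity_fail"]++
			}
		case TimedOut:
			// Every timed-out attempt counts, whether or not it is retried.
			counters["activity_task_timeout"]++
			if final {
				counters["activity_timeout"]++
			}
		}
	}
	return counters
}

func main() {
	// Five attempts that all time out.
	fmt.Println(tally([]Outcome{TimedOut, TimedOut, TimedOut, TimedOut, TimedOut}))
	// Four failed attempts followed by a success.
	fmt.Println(tally([]Outcome{Failed, Failed, Failed, Failed, Succeeded}))
}
```

The two calls in `main` mirror the scenarios in Test 1 and Test 3 of the testing notes.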

Why?

ActivityE2ELatency is misleading: despite its name, it measures only individual attempts. The other metrics listed above are not recorded yet.

How did you test it?

  • built
  • run locally and tested manually
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

Ran the server locally and checked the metrics in Prometheus:

Test 1: Activity with 5 attempts (1s each, 1s gap, all time out)

  • activity_task_timeout = 5
  • activity_timeout = 1
  • activity_success = 0
  • activity_fail = 0 (counted as timeout instead)
  • activity_task_fail = 0 (counted as timeout instead)
  • activity_schedule_to_close_latency_sum = 9s
  • activity_end_to_end_latency_sum = 1s (deprecated)
  • activity_start_to_close_latency_sum = 1s

Test 2: Basic helloworld activity

  • activity_success = 1

Test 3: Activity with 5 attempts (1s each, 1s gap, all fail except for the last one)

  • activity_task_fail = 4
  • activity_success = 1

@simvlad simvlad force-pushed the simvlad/activity-duration-metrics branch from cfd3762 to 734a46f Compare August 18, 2025 18:49
@simvlad simvlad marked this pull request as ready for review August 18, 2025 18:51
@simvlad simvlad requested a review from a team as a code owner August 18, 2025 18:51
@simvlad simvlad requested review from nishkrishnan and yycptt August 18, 2025 18:52
@simvlad simvlad force-pushed the simvlad/activity-duration-metrics branch from 734a46f to b44b703 Compare August 21, 2025 06:34
@simvlad simvlad marked this pull request as draft August 21, 2025 06:34
@simvlad simvlad changed the title Record activity duration and status counts Record various activity metrics Aug 21, 2025
@simvlad simvlad marked this pull request as ready for review August 21, 2025 17:47
@simvlad simvlad requested a review from bergundy August 21, 2025 17:53
@simvlad simvlad requested a review from nishkrishnan August 22, 2025 21:16
@simvlad simvlad force-pushed the simvlad/activity-duration-metrics branch from e65f2d8 to 2a7a280 Compare August 27, 2025 23:09
@simvlad simvlad requested a review from yycptt August 27, 2025 23:10
@simvlad simvlad force-pushed the simvlad/activity-duration-metrics branch from 2a7a280 to 1aab8e1 Compare August 28, 2025 21:24
@simvlad simvlad merged commit 5bd927c into main Aug 28, 2025
94 of 97 checks passed
@simvlad simvlad deleted the simvlad/activity-duration-metrics branch August 28, 2025 23:32