
Conversation

@THardy98 (Contributor) commented Nov 19, 2025

What was changed

Add flags to export scenario metrics, export worker metrics, and export failed histories. Exported artifacts are written in JSON format. The added options are:

  • --export-scenario-metrics
  • --export-failed-histories
  • --worker-export-metrics

Currently, the client-side options (scenario metrics and failed-history export) are only implemented for the throughput_stress scenario.
The worker option is only implemented for the Go worker.

These options are no-ops on other configurations; a sketch of the general export flow follows below.
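
As a rough illustration only (not this PR's actual implementation), a flag-gated JSON export on the client side could look something like the sketch below. The ScenarioMetrics fields, the --export-dir flag, and the output filename are hypothetical:

```go
// Hypothetical sketch: gate a JSON export of scenario metrics behind a flag.
// Struct fields, the --export-dir flag, and the output path are illustrative,
// not the code added by this PR.
package main

import (
	"encoding/json"
	"flag"
	"fmt"
	"os"
	"path/filepath"
)

// ScenarioMetrics stands in for whatever the scenario actually tracks.
type ScenarioMetrics struct {
	CompletedIterations int     `json:"completed_iterations"`
	FailedIterations    int     `json:"failed_iterations"`
	P95LatencyMS        float64 `json:"p95_latency_ms"`
}

func main() {
	exportScenarioMetrics := flag.Bool("export-scenario-metrics", false, "write scenario metrics to a JSON file")
	outDir := flag.String("export-dir", ".", "directory for exported artifacts (hypothetical flag)")
	flag.Parse()

	metrics := ScenarioMetrics{CompletedIterations: 100, FailedIterations: 2, P95LatencyMS: 830.5}

	// No-op unless the flag is set, mirroring the "no-op on other configurations" behavior.
	if !*exportScenarioMetrics {
		return
	}

	data, err := json.MarshalIndent(metrics, "", "  ")
	if err != nil {
		fmt.Fprintln(os.Stderr, "marshal metrics:", err)
		os.Exit(1)
	}
	path := filepath.Join(*outDir, "scenario-metrics.json")
	if err := os.WriteFile(path, data, 0o644); err != nil {
		fmt.Fprintln(os.Stderr, "write metrics file:", err)
		os.Exit(1)
	}
	fmt.Println("exported scenario metrics to", path)
}
```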

Why?

Provides the ability to persist outcomes and insights from Omes runs.

  1. Closes

  2. How was this tested:

go run ./cmd run-scenario-with-worker \
--language go \
--scenario throughput_stress \
--duration 10m \
--export-scenario-metrics \
--export-failed-histories \
--worker-export-metrics

and

go run ./cmd run-scenario-with-worker \
--language go \
--scenario throughput_stress \
--duration 10m \
--export-scenario-metrics \
--export-failed-histories \
--run-id test1

with

go run ./cmd run-worker \
--language go \
--run-id test1 \
--worker-export-metrics

  3. Any docs updates needed?
    No

@THardy98 THardy98 requested a review from a team as a code owner November 19, 2025 00:46
@THardy98 THardy98 requested review from Sushisource and stephanos and removed request for a team November 19, 2025 00:47
@THardy98 THardy98 force-pushed the export-metrics-and-failed-workflows branch from 0930588 to 4a4935e on November 19, 2025 17:26
@THardy98 THardy98 force-pushed the export-metrics-and-failed-workflows branch from b969524 to a082ddf on November 19, 2025 18:31
@THardy98 THardy98 changed the title Add options to export metrics and failed wf histories Add flags to export metrics and failed wf histories Nov 19, 2025
@@ -0,0 +1,192 @@
package resources
Member commented:

This whole thing feels like more work than necessary. Let's just use Prometheus to record all this and then we can snapshot the DB when it's done. Maybe another option would work too, but let's not write our own custom metrics format and collection.

@THardy98 (Contributor Author) commented Nov 21, 2025:

> Let's just use Prometheus to record all this and then we can snapshot the DB when it's done

Could you clarify a bit here?

AFAICT, we don't spin up a Prometheus server to scrape/store our metrics. We just expose an endpoint for Prometheus metrics to be scraped from. So there is no collection currently, except in-memory on the client.

These metrics are collected on the worker, and it isn't immediately obvious to me how we'd want to send worker metrics over to the client to expose on its /metrics endpoint (if we want to do that at all) in tandem with running a Prometheus server.

So I opted to export this as a file.

> Maybe another option would work too, but let's not write our own custom metrics format

Yeah, fair. I can replace the current resource tracking with Prometheus instrumentation.
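
For context only, "resource tracking with Prometheus instrumentation" could look roughly like the following sketch using prometheus/client_golang; the metric name and helper function are placeholders, not code from this PR:

```go
// Hypothetical sketch of tracking a process resource with Prometheus instrumentation
// rather than a custom format. The metric name and helper are placeholders.
package main

import (
	"runtime"

	"github.com/prometheus/client_golang/prometheus"
)

func registerResourceMetrics(reg prometheus.Registerer) {
	// GaugeFunc samples the value lazily at scrape time.
	heapAlloc := prometheus.NewGaugeFunc(prometheus.GaugeOpts{
		Name: "omes_example_heap_alloc_bytes",
		Help: "Current heap allocation of this process (illustrative).",
	}, func() float64 {
		var ms runtime.MemStats
		runtime.ReadMemStats(&ms)
		return float64(ms.HeapAlloc)
	})
	reg.MustRegister(heapAlloc)
}

func main() {
	registerResourceMetrics(prometheus.DefaultRegisterer)
	// The registry would then be exposed via an HTTP /metrics handler for scraping.
}
```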

Member commented:

What I mean is that we can just run a local Prom instance in lieu of running inside actual infra. If/when we run inside full-on infra-provisioned stuff we can just use a real Prom service, but if we don't have one it's very easy to just spin one up locally and export the stuff from that.

We don't need to send metrics from the worker to the client. Both sides can just expose prom metrics for scraping independently and that's fine.
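
For illustration, here is a minimal Go sketch of "both sides expose Prom metrics for scraping independently", assuming prometheus/client_golang; the counter name and listen address are placeholders:

```go
// Hedged sketch: each process (client or worker) exposes its own /metrics endpoint,
// and a locally run Prometheus instance scrapes both independently.
// Metric names and the listen address are placeholders, not Omes's actual instrumentation.
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	reg := prometheus.NewRegistry()

	// Example metric; a real worker/client would register whatever it tracks.
	iterations := prometheus.NewCounter(prometheus.CounterOpts{
		Name: "omes_example_iterations_total",
		Help: "Example counter for illustration.",
	})
	reg.MustRegister(iterations)
	iterations.Inc()

	// Serve /metrics; a local Prometheus server scrapes this endpoint
	// (and the other process's endpoint) on its own schedule.
	http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))
	log.Fatal(http.ListenAndServe(":9091", nil))
}
```

A local Prometheus instance would then list both endpoints as scrape targets; after the run, its data directory or a TSDB snapshot could serve as the persisted artifact (one reading of "snapshot the DB when it's done").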

@THardy98 (Contributor Author) commented Nov 24, 2025

Closing in favor of splitting the PRs, given that there will be a notable difference between the metrics in this PR and the Prometheus PR:

  • export history failures PR
  • will add the Prometheus PR when it's up

@THardy98 THardy98 closed this Nov 24, 2025
