feat: expose metrics Services and optional ServiceMonitors in the Helm chart #6405

Description

@cgarcia-l

Pre-submission checklist

  • I have searched existing issues and confirmed this is not a duplicate.
  • I understand that submitting a feature request does not guarantee it will be implemented.

Proposed Feature

The Kargo controller, management-controller and webhooks-server can already expose Prometheus metrics by setting METRICS_BIND_ADDRESS, but the Helm chart provides no way to make those metrics scrapeable. None of these components declare a metrics container port, and the chart ships no metrics Service and no ServiceMonitor. Operators currently have to add all of this out of band.

I would like the chart to adopt the same metrics pattern already used by the sibling Argo projects (argo-cd and argo-workflows in argoproj/argo-helm), namely:

  1. A per-component metrics Service gated by a values flag, for example:
controller:
  metrics:
    enabled: false
    service:
      type: ClusterIP
      clusterIP: ""        # "None" for a headless service
      annotations: {}
      labels: {}
      servicePort: 9090
      portName: metrics
    serviceMonitor:
      enabled: false
      interval: 30s
      relabelings: []
      metricRelabelings: []
      additionalLabels: {}
      scheme: ""
      tlsConfig: {}
      namespace: ""
  2. The matching metrics container port declared on the component's pod spec when metrics.enabled is true.

  3. An optional ServiceMonitor gated by both metrics.enabled and metrics.serviceMonitor.enabled, guarded by a .Capabilities.APIVersions.Has check so it is skipped on clusters without the Prometheus Operator CRDs.
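
To make the gating concrete, here is a minimal sketch of what the two new controller templates could look like. It is an illustration under assumptions, not the chart's actual layout: the resource names, the `app.kubernetes.io/*` labels and the selector are hypothetical and would need to match the labels the chart already puts on the controller pods. The values keys are the ones proposed in this issue.

```yaml
{{- /* Hypothetical metrics Service for the controller; rendered only when metrics are enabled. */ -}}
{{- if .Values.controller.metrics.enabled }}
---
apiVersion: v1
kind: Service
metadata:
  name: kargo-controller-metrics          # assumed name
  namespace: {{ .Release.Namespace }}
  labels:
    app.kubernetes.io/name: kargo                     # assumed label
    app.kubernetes.io/component: controller-metrics   # assumed label
    {{- with .Values.controller.metrics.service.labels }}
    {{- toYaml . | nindent 4 }}
    {{- end }}
  {{- with .Values.controller.metrics.service.annotations }}
  annotations:
    {{- toYaml . | nindent 4 }}
  {{- end }}
spec:
  type: {{ .Values.controller.metrics.service.type }}
  {{- with .Values.controller.metrics.service.clusterIP }}
  clusterIP: {{ . | quote }}
  {{- end }}
  ports:
    - name: {{ .Values.controller.metrics.service.portName }}
      port: {{ .Values.controller.metrics.service.servicePort }}
      # Refers to the named container port from item 2.
      targetPort: {{ .Values.controller.metrics.service.portName }}
      protocol: TCP
  selector:
    app.kubernetes.io/name: kargo               # assumed; must match the pod labels
    app.kubernetes.io/component: controller     # assumed; must match the pod labels
{{- end }}
{{- /* Hypothetical ServiceMonitor; skipped when the Prometheus Operator API group is absent. */ -}}
{{- if and .Values.controller.metrics.enabled .Values.controller.metrics.serviceMonitor.enabled (.Capabilities.APIVersions.Has "monitoring.coreos.com/v1") }}
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kargo-controller                  # assumed name
  namespace: {{ default .Release.Namespace .Values.controller.metrics.serviceMonitor.namespace }}
  {{- with .Values.controller.metrics.serviceMonitor.additionalLabels }}
  labels:
    {{- toYaml . | nindent 4 }}
  {{- end }}
spec:
  endpoints:
    - port: {{ .Values.controller.metrics.service.portName }}
      {{- with .Values.controller.metrics.serviceMonitor.interval }}
      interval: {{ . }}
      {{- end }}
      {{- with .Values.controller.metrics.serviceMonitor.scheme }}
      scheme: {{ . }}
      {{- end }}
      {{- with .Values.controller.metrics.serviceMonitor.tlsConfig }}
      tlsConfig:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.controller.metrics.serviceMonitor.relabelings }}
      relabelings:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.controller.metrics.serviceMonitor.metricRelabelings }}
      metricRelabelings:
        {{- toYaml . | nindent 8 }}
      {{- end }}
  namespaceSelector:
    matchNames:
      - {{ .Release.Namespace }}
  selector:
    matchLabels:
      app.kubernetes.io/name: kargo
      app.kubernetes.io/component: controller-metrics
{{- end }}
```

The container port from item 2 would be a similarly gated `ports:` entry on the component's container, named after `portName`. Its number would have to match whatever port METRICS_BIND_ADDRESS is set to.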

The same structure would apply to managementController and webhooksServer. All flags default to false, so the change is fully backward compatible.
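
As an illustration of the opt-in, a values override that turns everything on for all three components might look like the sketch below. These keys are the ones proposed in this issue; they do not exist in the chart today.

```yaml
# Hypothetical values override using the keys proposed above.
controller:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
managementController:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
webhooksServer:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
```

Users of annotation or Service based scrapers would set only `metrics.enabled: true` and leave `serviceMonitor.enabled` at its default.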

Motivation and Use Case

We run Kargo on EKS with Grafana Alloy for metrics collection. Because the chart declares no metrics container port and no metrics Service, our annotation-based pod discovery cannot scrape the controllers. To get any Kargo metrics into Grafana, we had to add three headless Services and a separate Alloy scrape config ourselves.
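
For reference, one of the out-of-band headless Services looks roughly like the sketch below. The name, namespace, selector labels and port are assumptions for illustration; the port has to match whatever METRICS_BIND_ADDRESS is set to on the controller.

```yaml
# Sketch of a hand-maintained headless metrics Service for the controller.
apiVersion: v1
kind: Service
metadata:
  name: kargo-controller-metrics   # assumed name
  namespace: kargo                 # assumed install namespace
spec:
  clusterIP: None                  # headless: DNS resolves directly to the pod IPs
  ports:
    - name: metrics
      port: 9090                   # assumed; must match the METRICS_BIND_ADDRESS port
      targetPort: 9090
      protocol: TCP
  selector:
    # Assumed labels; these must be kept in sync with the upstream pod labels by hand.
    app.kubernetes.io/name: kargo
    app.kubernetes.io/component: controller
```

The same manifest has to be repeated for the management-controller and the webhooks-server, each with its own selector.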

This is the standard pattern that other controller-runtime-based tools in the ecosystem ship, including Argo CD and Argo Workflows, which live in the same Helm repository and are maintained alongside Kargo. Adopting it would let operators monitor Kargo with a couple of values instead of maintaining bespoke manifests, and would work out of the box for both Prometheus Operator users (via ServiceMonitor) and annotation-based or Service-based scrapers.

Alternatives Considered

  • Adding the metrics Services and ServiceMonitors as raw manifests in a chart wrapper (what we do today). It works, but every operator has to reinvent it, and it drifts from the upstream pod labels.
  • Kustomize "last mile" patches over helm template output. This still requires each operator to hand-write the Service, the container port and the ServiceMonitor, and does not benefit the wider community.

We are happy to contribute the implementation once the chart restructuring referenced in #6250 settles and maintainers are open to chart PRs again.

Labels

  • needs/area: Issue or PR needs to be labeled to indicate what parts of the code base are affected
  • needs/kind: Issue or PR needs to be labeled to clarify its nature
  • needs/priority: Priority has not yet been determined; a good signal that maintainers aren't fully committed
