Merged
33 commits
9c8643e  Fix silent drop of ztunnel counter metrics in Istio ambient mode (AAraKKe, May 14, 2026)
f0282c7  Add changelog entry (AAraKKe, May 14, 2026)
ee5845f  Make ambient e2e teardown robust to missing kubectl.pid (AAraKKe, May 14, 2026)
8155772  Use a Service for ztunnel port-forward instead of a dynamic pod (AAraKKe, May 14, 2026)
3bbc85c  Create ztunnel Service via manifest, not kubectl expose (AAraKKe, May 14, 2026)
60c31a0  Restructure matrix orthogonally: version x mode (AAraKKe, May 14, 2026)
c253565  Use matrix-axis shorthand for ISTIO_MODE env var (AAraKKe, May 14, 2026)
60b091d  Add Istio 1.24 to the sidecar e2e matrix (AAraKKe, May 18, 2026)
16f9ab3  Mark Istio galley validation pass/fail metrics intermittent (AAraKKe, May 18, 2026)
a7fb28c  Mark gc_cpu_fraction intermittent for Istio 1.24 (AAraKKe, May 18, 2026)
14e4774  Mark memstats.lookups intermittent for Istio 1.24 (AAraKKe, May 18, 2026)
73dcd0d  Mark all pilot.conflict variants intermittent (AAraKKe, May 18, 2026)
dd14e32  Split legacy Go metrics from intermittent in Istio e2e (AAraKKe, May 18, 2026)
48d97d8  Parse Istio version numerically for legacy check (AAraKKe, May 18, 2026)
e4f3f09  Use packaging.version for Istio version comparison (AAraKKe, May 18, 2026)
f6d4550  Address PR review feedback (AAraKKe, May 18, 2026)
b646e32  Fix ztunnel poll target and add Istio 1.29 to the matrix (AAraKKe, May 18, 2026)
4ddccd4  Address round-2 review feedback (AAraKKe, May 18, 2026)
274de78  Address round-3 review feedback (AAraKKe, May 18, 2026)
e1a4436  Address round-4 review feedback (AAraKKe, May 18, 2026)
e0e6197  Rename mechanical ztunnel .total counters to .count (AAraKKe, May 18, 2026)
d60db05  Realign ztunnel metric registration with real exposition (AAraKKe, May 18, 2026)
522a9d7  Tighten prose hygiene and warning-silence regression (AAraKKe, May 18, 2026)
e812a1a  Collapse prose comments per AGENTS.md (AAraKKe, May 18, 2026)
5730755  Expand changelog to cover full ambient mode fix (AAraKKe, May 18, 2026)
dbe9ab0  Consolidate ambient changelog into a single fix entry (AAraKKe, May 18, 2026)
3f0d411  Tighten changelog wording (AAraKKe, May 18, 2026)
09e868a  Drop dead istio_connection_* registrations from ZTUNNEL_METRICS (AAraKKe, May 19, 2026)
899a174  Document ambient mode and mark its settings fleet-configurable (AAraKKe, May 19, 2026)
faa090d  Correct README: one ambient instance scrapes all three endpoints (AAraKKe, May 19, 2026)
0d773a9  Drop hard-coded ports from ambient README prose (AAraKKe, May 19, 2026)
25dce54  Address documentation review feedback on ambient README section (AAraKKe, May 19, 2026)
50f5109  Restore license headers on regenerated config_models (AAraKKe, May 20, 2026)
19 changes: 19 additions & 0 deletions istio/README.md
@@ -90,6 +90,25 @@ This annotation specifies the container `discovery` to match the default contain

The method for applying these annotations varies depending on the [Istio deployment strategy (Istioctl, Helm, Operator)][22] used. Consult the Istio documentation for the proper method to apply these pod annotations. See the [sample istio.d/conf.yaml][8] for all available configuration options.

##### Ambient mode configuration

Istio ambient mode, generally available in Istio v1.24, replaces sidecar injection with two shared components: the `ztunnel` DaemonSet (L4 zero-trust tunneling) and optional `waypoint` proxies (L7 HTTP/gRPC processing). Set `istio_mode: ambient` and configure one or more of `ztunnel_endpoint`, `waypoint_endpoint`, and `istiod_endpoint` on the same instance. The check scrapes each endpoint that is set. Adjust the URLs in the example below to match your cluster's hostnames and ports.

Example static configuration in `istio.d/conf.yaml` covering all three components:

```yaml
init_config:

instances:
  - istio_mode: ambient
    use_openmetrics: true
    ztunnel_endpoint: http://ztunnel.istio-system.svc:15020/stats/prometheus
    waypoint_endpoint: http://waypoint.<NAMESPACE>.svc:15020/stats/prometheus
    istiod_endpoint: http://istiod.istio-system.svc:15014/metrics
```

Replace `<NAMESPACE>` with the namespace where you ran `istioctl waypoint apply`. Omit `waypoint_endpoint` if you have not deployed a waypoint proxy. The same options can be set via the Autodiscovery annotation syntax shown in the [Control plane configuration](#control-plane-configuration) section above.
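
The endpoint rule in this section (ambient mode needs at least one of the three endpoints, and every endpoint that is set gets scraped) can be sketched in plain Python. This is a hypothetical stand-alone helper, not the check's code: the name `ambient_scrape_targets` and the use of `ValueError` are assumptions made for illustration.

```python
AMBIENT_ENDPOINT_KEYS = ("ztunnel_endpoint", "waypoint_endpoint", "istiod_endpoint")


def ambient_scrape_targets(instance):
    """Return (option_name, url) pairs for every ambient endpoint that is set.

    Hypothetical helper mirroring the documented rule; sidecar instances
    (the default mode) yield no ambient targets.
    """
    if instance.get("istio_mode", "sidecar") != "ambient":
        return []
    targets = [(key, instance[key]) for key in AMBIENT_ENDPOINT_KEYS if instance.get(key)]
    if not targets:
        raise ValueError("ambient mode requires at least one of: " + ", ".join(AMBIENT_ENDPOINT_KEYS))
    return targets


# No waypoint deployed, so waypoint_endpoint is simply omitted.
instance = {
    "istio_mode": "ambient",
    "ztunnel_endpoint": "http://ztunnel.istio-system.svc:15020/stats/prometheus",
    "istiod_endpoint": "http://istiod.istio-system.svc:15014/metrics",
}
for option, url in ambient_scrape_targets(instance):
    print(option, url)
```
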

#### Disable sidecar injection for Datadog Agent pods

If you are installing the [Datadog Agent in a container][10], Datadog recommends that you first disable Istio's sidecar injection.
3 changes: 3 additions & 0 deletions istio/assets/configuration/spec.yaml
@@ -24,6 +24,7 @@ files:
Specify the Istio data plane mode to monitor.
- `sidecar`: Monitor Istio sidecar proxies (traditional mode)
- `ambient`: Monitor Istio ambient mode components (ztunnel, waypoint proxies)
fleet_configurable: true
value:
example: sidecar
display_default: sidecar
@@ -35,6 +36,7 @@ files:
Ztunnel is the L4 proxy that provides zero-trust tunneling and mTLS for ambient mesh.
Only used when `istio_mode` is set to `ambient`.
Ztunnel metrics are exposed on port 15020.
fleet_configurable: true
value:
display_default: null
example: http://ztunnel.istio-system:15020/stats/prometheus
@@ -45,6 +47,7 @@ files:
Waypoint proxies provide optional L7 processing (HTTP/gRPC traffic management) in ambient mesh.
Only used when `istio_mode` is set to `ambient`.
Waypoint metrics are exposed on port 15020.
fleet_configurable: true
value:
display_default: null
example: http://waypoint.istio-system:15020/stats/prometheus
1 change: 1 addition & 0 deletions istio/changelog.d/23707.fixed
@@ -0,0 +1 @@
Restore Istio ambient mode metric collection broken in 9.4.0: ztunnel counters are no longer silently dropped, proxy management metrics use the `workload_manager_*` names ztunnel actually emits, and the missing xDS message counters are now registered.
30 changes: 20 additions & 10 deletions istio/datadog_checks/istio/check.py
@@ -3,7 +3,7 @@
 # Licensed under a 3-clause BSD style license (see LICENSE)
 from collections import ChainMap

-from datadog_checks.base import ConfigurationError, OpenMetricsBaseCheckV2
+from datadog_checks.base import ConfigurationError, OpenMetricsBaseCheckV2, is_affirmative
 from datadog_checks.base.checks.openmetrics.v2.scraper import OpenMetricsCompatibilityScraper

 from .constants import ISTIOD_NAMESPACE
@@ -72,10 +72,24 @@ def _parse_ambient_config(self, istiod_endpoint, istiod_namespace):
                 "`ztunnel_endpoint`, `waypoint_endpoint`, or `istiod_endpoint`."
             )

-        # Ztunnel provides L4 TCP metrics for ambient mesh
+        # Ztunnel uses the modern OpenMetrics counter convention; force the v2 parser so counters are not dropped.
         ztunnel_namespace = istiod_namespace + ".ztunnel"
         if ztunnel_endpoint:
-            self.scraper_configs.append(self._generate_config(ztunnel_endpoint, ZTUNNEL_METRICS, ztunnel_namespace))
+            if not is_affirmative(self.instance.get("use_latest_spec", True)):
+                self.log.warning(
+                    "`use_latest_spec: false` is set with `ztunnel_endpoint` configured. "
+                    "ztunnel emits the modern OpenMetrics counter convention which the "
+                    "legacy parser silently drops, so every ztunnel counter metric will be "
+                    "missed. Remove `use_latest_spec: false` to restore ztunnel metrics."
+                )
+            self.scraper_configs.append(
+                self._generate_config(
+                    ztunnel_endpoint,
+                    ZTUNNEL_METRICS,
+                    ztunnel_namespace,
+                    scraper_defaults={'use_latest_spec': True},
+                )
+            )

         # Waypoint provides L7 HTTP/gRPC metrics (optional in ambient mode)
         waypoint_namespace = istiod_namespace + ".waypoint"
@@ -86,16 +100,12 @@ def _parse_ambient_config(self, istiod_endpoint, istiod_namespace):
         if istiod_endpoint:
             self.scraper_configs.append(self._generate_config(istiod_endpoint, ISTIOD_METRICS, istiod_namespace))

-    def _generate_config(self, endpoint, metrics, namespace):
+    def _generate_config(self, endpoint, metrics, namespace, *, scraper_defaults=None):
         metrics = construct_metrics_config(metrics)
         metrics.append(ISTIOD_VERSION)
-        config = {
-            'openmetrics_endpoint': endpoint,
-            'metrics': metrics,
-            'namespace': namespace,
-        }
+        config = {**(scraper_defaults or {}), 'openmetrics_endpoint': endpoint, 'metrics': metrics}
+        # Instance keys override scraper_defaults; per-scraper namespace is restored on the next line.
         config.update(self.instance)
-        # Restore per-scraper namespace so custom ztunnel/waypoint/mesh namespaces are not overwritten by instance
         config['namespace'] = namespace
         return config
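
The merge order in `_generate_config` decides which setting wins, so it may help to see it in isolation. The following is a minimal sketch using plain dictionaries: `generate_config` is a hypothetical free function that takes the instance as an argument instead of reading `self.instance`, and the endpoint URL is a placeholder.

```python
def generate_config(instance, endpoint, metrics, namespace, *, scraper_defaults=None):
    """Merge in the same order as the diff: defaults, then instance, then namespace."""
    config = {**(scraper_defaults or {}), "openmetrics_endpoint": endpoint, "metrics": metrics}
    config.update(instance)          # user-supplied instance keys win over scraper defaults
    config["namespace"] = namespace  # the per-scraper namespace always wins
    return config


endpoint = "http://ztunnel.example:15020/stats/prometheus"  # placeholder URL
defaults = {"use_latest_spec": True}

# No explicit setting: the scraper default applies.
plain = generate_config({"istio_mode": "ambient"}, endpoint, [], "istio.ztunnel", scraper_defaults=defaults)
print(plain["use_latest_spec"])  # True

# Explicit opt-out in the instance overrides the default, and a custom
# instance-level namespace is replaced by the per-scraper one.
opted_out = generate_config(
    {"istio_mode": "ambient", "use_latest_spec": False, "namespace": "custom"},
    endpoint,
    [],
    "istio.ztunnel",
    scraper_defaults=defaults,
)
print(opted_out["use_latest_spec"])  # False
print(opted_out["namespace"])        # istio.ztunnel
```

Because an explicit `use_latest_spec: false` still wins over the default, the warning added in the previous hunk is the only signal a user gets in that case.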

17 changes: 8 additions & 9 deletions istio/datadog_checks/istio/metrics.py
@@ -276,17 +276,16 @@
     'istio_dns_upstream_failures_total': 'dns.upstream_failures.total',
     'istio_dns_upstream_request_duration_seconds': 'dns.upstream_request_duration_seconds',
     'istio_on_demand_dns_total': 'on_demand_dns.total',
-    # In-pod proxy management metrics (unstable)
-    'istio_active_proxy_count_total': 'active_proxy_count.total',
-    'istio_pending_proxy_count_total': 'pending_proxy_count.total',
-    'istio_proxies_started_total': 'proxies_started.total',
-    'istio_proxies_stopped_total': 'proxies_stopped.total',
+    # In-pod proxy management metrics (unstable). Ztunnel exposes these under the
+    # workload_manager_* family, not istio_*.
+    'workload_manager_active_proxy_count': 'active_proxy_count',
+    'workload_manager_pending_proxy_count': 'pending_proxy_count',
+    'workload_manager_proxies_started_total': 'proxies_started.total',
+    'workload_manager_proxies_stopped_total': 'proxies_stopped.total',
     # XDS metrics (unstable)
     'istio_xds_connection_terminations_total': 'xds.connection_terminations.total',
-    # Connection metrics (unstable)
-    'istio_connection_opens_total': 'connection.opens.total',
-    'istio_connection_closes_total': 'connection.closes.total',
-    'istio_connection_termination_total': 'connection.termination.total',
+    'istio_xds_message_total': 'xds.message.total',
+    'istio_xds_message_bytes_total': 'xds.message_bytes.total',
 }


22 changes: 21 additions & 1 deletion istio/hatch.toml
@@ -5,14 +5,34 @@ dependencies = [
     "requests-mock==1.4.0",
 ]

+# Istio supports two data plane modes: traditional sidecar injection, and ambient
+# (sidecar-less). Ambient mode graduated to GA in Istio 1.24. The matrix below is
+# split into version × mode blocks so each block declares only the (version, mode)
+# combinations that are actually supported by the version and have a working setup
+# function in conftest.py. To add a new Istio version, extend the relevant block's
+# `version` list; to add a new mode on an existing version, add the entry to the
+# right block (and extend conftest.setup_istio* accordingly).
+
+# Sidecar-mode envs. 1.13 stays for the legacy Go-runtime safety net; 1.24 is where
+# ambient GA'd; 1.29 is the current supported release.
+[[envs.default.matrix]]
+python = ["3.13"]
+version = ["1.13", "1.24", "1.29"]
+mode = ["sidecar"]
+
+# Ambient-mode envs. Requires Istio >= 1.24.
 [[envs.default.matrix]]
 python = ["3.13"]
-version = ["1.13"]
+version = ["1.24", "1.29"]
+mode = ["ambient"]

 [envs.default.overrides]
 matrix.version.env-vars = [
   { key = "ISTIO_VERSION", value = "1.13.3", if = ["1.13"] },
+  { key = "ISTIO_VERSION", value = "1.24.3", if = ["1.24"] },
+  { key = "ISTIO_VERSION", value = "1.29.2", if = ["1.29"] },
 ]
+matrix.mode.env-vars = "ISTIO_MODE"

 [envs.default.env-vars]
 DDEV_SKIP_GENERIC_TAGS_CHECK = "true"
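
To make the resulting test matrix concrete, here is a small sketch that expands the two blocks into the environment variables each combination would receive. It is an illustration only and does not reproduce how Hatch expands matrices: the helper names are hypothetical, and `parse_version` is a standard-library stand-in for the `packaging.version` comparison mentioned in the commit list.

```python
SIDECAR_VERSIONS = ["1.13", "1.24", "1.29"]
AMBIENT_VERSIONS = ["1.24", "1.29"]
PINNED = {"1.13": "1.13.3", "1.24": "1.24.3", "1.29": "1.29.2"}
AMBIENT_MIN_VERSION = (1, 24)


def parse_version(text):
    """Turn '1.24.3' into (1, 24, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def supports_ambient(version):
    """Ambient mode requires Istio >= 1.24 according to the matrix comment."""
    return parse_version(version)[:2] >= AMBIENT_MIN_VERSION


def expand_matrix():
    """List the env vars for every (version, mode) pair declared in the two blocks."""
    envs = []
    for mode, versions in (("sidecar", SIDECAR_VERSIONS), ("ambient", AMBIENT_VERSIONS)):
        for version in versions:
            envs.append({"ISTIO_VERSION": PINNED[version], "ISTIO_MODE": mode})
    return envs


for env in expand_matrix():
    print(env)
```

Comparing parsed tuples rather than raw strings matters once minor versions differ in digit count: as strings, `"1.9"` sorts after `"1.24"`.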
31 changes: 15 additions & 16 deletions istio/metadata.csv
@@ -447,24 +447,23 @@ istio.galley.istio.networking.virtualservices,gauge,,,,,0,istio,,
 istio.galley.istio.networking.destinationrules,gauge,,,,,0,istio,,
 istio.galley.istio.networking.gateways,gauge,,,,,0,istio,,
 istio.galley.istio.authentication.meshpolicies,gauge,,,,,0,istio,,
-istio.ztunnel.tcp.connections_opened.total,count,,connection,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total TCP connections opened through ztunnel",0,istio,ztunnel connections opened,
-istio.ztunnel.tcp.connections_closed.total,count,,connection,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total TCP connections closed through ztunnel",0,istio,ztunnel connections closed,
-istio.ztunnel.tcp.send_bytes.total,count,,byte,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total bytes sent through ztunnel TCP connections",0,istio,ztunnel bytes sent,
-istio.ztunnel.tcp.received_bytes.total,count,,byte,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total bytes received through ztunnel TCP connections",0,istio,ztunnel bytes received,
-istio.ztunnel.dns.requests.total,count,,request,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total DNS requests handled by ztunnel",0,istio,ztunnel dns requests,
-istio.ztunnel.dns.upstream_requests.total,count,,request,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total DNS requests forwarded to upstream by ztunnel",0,istio,ztunnel dns upstream requests,
-istio.ztunnel.dns.upstream_failures.total,count,,request,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total DNS upstream request failures in ztunnel",0,istio,ztunnel dns failures,
+istio.ztunnel.tcp.connections_opened.count,count,,connection,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total TCP connections opened through ztunnel",0,istio,ztunnel connections opened,
+istio.ztunnel.tcp.connections_closed.count,count,,connection,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total TCP connections closed through ztunnel",0,istio,ztunnel connections closed,
+istio.ztunnel.tcp.send_bytes.count,count,,byte,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total bytes sent through ztunnel TCP connections",0,istio,ztunnel bytes sent,
+istio.ztunnel.tcp.received_bytes.count,count,,byte,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total bytes received through ztunnel TCP connections",0,istio,ztunnel bytes received,
+istio.ztunnel.dns.requests.count,count,,request,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total DNS requests handled by ztunnel",0,istio,ztunnel dns requests,
+istio.ztunnel.dns.upstream_requests.count,count,,request,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total DNS requests forwarded to upstream by ztunnel",0,istio,ztunnel dns upstream requests,
+istio.ztunnel.dns.upstream_failures.count,count,,request,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total DNS upstream request failures in ztunnel",0,istio,ztunnel dns failures,
 istio.ztunnel.dns.upstream_request_duration_seconds.count,count,,second,,"[OpenMetrics V1 and V2 and Istio v1.24+] Count of DNS upstream request durations in ztunnel",0,istio,ztunnel dns duration count,
 istio.ztunnel.dns.upstream_request_duration_seconds.sum,count,,second,,"[OpenMetrics V1 and V2 and Istio v1.24+] Sum of DNS upstream request durations in ztunnel",0,istio,ztunnel dns duration sum,
-istio.ztunnel.on_demand_dns.total,count,,request,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total on-demand DNS requests in ztunnel",0,istio,ztunnel on-demand dns,
-istio.ztunnel.active_proxy_count.total,gauge,,,,"[OpenMetrics V1 and V2 and Istio v1.24+] Number of active in-pod proxies managed by ztunnel",0,istio,ztunnel active proxies,
-istio.ztunnel.pending_proxy_count.total,gauge,,,,"[OpenMetrics V1 and V2 and Istio v1.24+] Number of pending in-pod proxies in ztunnel",0,istio,ztunnel pending proxies,
-istio.ztunnel.proxies_started.total,count,,,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total number of in-pod proxies started by ztunnel",0,istio,ztunnel proxies started,
-istio.ztunnel.proxies_stopped.total,count,,,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total number of in-pod proxies stopped by ztunnel",0,istio,ztunnel proxies stopped,
-istio.ztunnel.xds.connection_terminations.total,count,,connection,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total XDS connection terminations in ztunnel",0,istio,ztunnel xds terminations,
-istio.ztunnel.connection.opens.total,count,,connection,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total connections opened in ztunnel",0,istio,ztunnel connection opens,
-istio.ztunnel.connection.closes.total,count,,connection,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total connections closed in ztunnel",0,istio,ztunnel connection closes,
-istio.ztunnel.connection.termination.total,count,,connection,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total connection terminations in ztunnel",0,istio,ztunnel connection terminations,
+istio.ztunnel.on_demand_dns.count,count,,request,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total on-demand DNS requests in ztunnel",0,istio,ztunnel on-demand dns,
+istio.ztunnel.active_proxy_count,gauge,,,,"[OpenMetrics V1 and V2 and Istio v1.24+] Number of active in-pod proxies managed by ztunnel",0,istio,ztunnel active proxies,
+istio.ztunnel.pending_proxy_count,gauge,,,,"[OpenMetrics V1 and V2 and Istio v1.24+] Number of pending in-pod proxies in ztunnel",0,istio,ztunnel pending proxies,
+istio.ztunnel.proxies_started.count,count,,,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total number of in-pod proxies started by ztunnel",0,istio,ztunnel proxies started,
+istio.ztunnel.proxies_stopped.count,count,,,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total number of in-pod proxies stopped by ztunnel",0,istio,ztunnel proxies stopped,
+istio.ztunnel.xds.connection_terminations.count,count,,connection,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total XDS connection terminations in ztunnel",0,istio,ztunnel xds terminations,
+istio.ztunnel.xds.message.count,count,,message,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total XDS messages exchanged between ztunnel and istiod",0,istio,ztunnel xds messages,
+istio.ztunnel.xds.message_bytes.count,count,,byte,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total bytes of XDS messages exchanged between ztunnel and istiod",0,istio,ztunnel xds message bytes,
 istio.waypoint.request.count,count,,request,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total HTTP requests through waypoint proxy",0,istio,waypoint requests,
 istio.waypoint.request.duration.milliseconds.count,count,,request,,"[OpenMetrics V1 and V2 and Istio v1.24+] Count of HTTP request durations through waypoint proxy",0,istio,waypoint request duration count,
 istio.waypoint.request.duration.milliseconds.sum,count,,millisecond,,"[OpenMetrics V1 and V2 and Istio v1.24+] Sum of HTTP request durations through waypoint proxy",0,istio,waypoint request duration sum,
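
A rename like this is easy to leave half-finished, so a quick scan for leftover `.total` names can be useful. The sketch below parses a few rows with Python's `csv` module. The column names in `FIELDS` are an assumption, since the header row is not part of this hunk, and `stale_total_names` is a hypothetical helper rather than part of the repository's tooling.

```python
import csv
import io

# Assumed column order; the header row is outside the hunk shown in this diff.
FIELDS = [
    "metric_name", "metric_type", "interval", "unit_name", "per_unit_name",
    "description", "orientation", "integration", "short_name", "curated_metric",
]

# Two rows from the new side of the hunk and one from the old side.
SAMPLE = '''\
istio.ztunnel.xds.message.count,count,,message,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total XDS messages exchanged between ztunnel and istiod",0,istio,ztunnel xds messages,
istio.ztunnel.active_proxy_count,gauge,,,,"[OpenMetrics V1 and V2 and Istio v1.24+] Number of active in-pod proxies managed by ztunnel",0,istio,ztunnel active proxies,
istio.ztunnel.on_demand_dns.total,count,,request,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total on-demand DNS requests in ztunnel",0,istio,ztunnel on-demand dns,
'''


def stale_total_names(csv_text, prefix="istio.ztunnel."):
    """Return metric names under `prefix` that still end in `.total`."""
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=FIELDS)
    return [
        row["metric_name"]
        for row in reader
        if row["metric_name"].startswith(prefix) and row["metric_name"].endswith(".total")
    ]


print(stale_total_names(SAMPLE))
```
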
21 changes: 14 additions & 7 deletions istio/tests/common.py
@@ -409,19 +409,26 @@
     'istio.galley.istio.authentication.meshpolicies',
 ]

-# Ambient mode (ztunnel) - default namespace istio.ztunnel (OpenMetrics submits counters as .count)
-V2_ZTUNNEL_METRICS = [
+# Ambient mode (ztunnel) - default namespace istio.ztunnel.
+# Ztunnel counters use `# TYPE foo counter` + `foo_total{} N`, which the legacy parser drops; require the v2 parser.
+V2_ZTUNNEL_COUNTER_METRICS = [
     'istio.ztunnel.tcp.connections_opened.count',
     'istio.ztunnel.tcp.connections_closed.count',
     'istio.ztunnel.tcp.send_bytes.count',
     'istio.ztunnel.tcp.received_bytes.count',
-    'istio.ztunnel.dns.requests.count',
-    'istio.ztunnel.dns.upstream_requests.count',
-    'istio.ztunnel.dns.upstream_failures.count',
-    'istio.ztunnel.connection.opens.count',
-    'istio.ztunnel.connection.closes.count',
+    'istio.ztunnel.xds.message.count',
+    'istio.ztunnel.xds.message_bytes.count',
+    'istio.ztunnel.proxies_started.count',
 ]

+# Gauges, unaffected by the legacy-parser counter bug; split out so the regression test pins only counters.
+V2_ZTUNNEL_GAUGE_METRICS = [
+    'istio.ztunnel.active_proxy_count',
+    'istio.ztunnel.pending_proxy_count',
+]
+
+V2_ZTUNNEL_METRICS = V2_ZTUNNEL_COUNTER_METRICS + V2_ZTUNNEL_GAUGE_METRICS
+
 # Ambient mode (waypoint) - default namespace istio.waypoint
 V2_WAYPOINT_METRICS = [
     'istio.waypoint.request.count',
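
The comment on `V2_ZTUNNEL_COUNTER_METRICS` describes a naming mismatch: the `# TYPE` line declares `foo`, while the sample is named `foo_total`. The toy model below illustrates why a matcher that expects the sample name to equal the declared family name loses such counters while gauges are unaffected. It is a deliberately simplified sketch, not the Agent's parser: `attach_samples` is a hypothetical function and the sample values are made up.

```python
def attach_samples(exposition, strip_total_suffix):
    """Group samples under the family declared by a `# TYPE` line.

    Returns (families, orphans); orphans are samples that matched no
    declared family and would therefore lose their counter type.
    """
    families = {}
    orphans = []
    for raw in exposition.strip().splitlines():
        line = raw.strip()
        if line.startswith("# TYPE "):
            _, _, name, kind = line.split()
            families[name] = {"type": kind, "samples": []}
        elif line and not line.startswith("#"):
            series, value = line.rsplit(" ", 1)
            sample_name = series.split("{", 1)[0]
            family = sample_name
            # OpenMetrics-style lookup: `foo_total` belongs to counter family `foo`.
            if strip_total_suffix and family not in families and family.endswith("_total"):
                family = family[: -len("_total")]
            if family in families:
                families[family]["samples"].append((sample_name, float(value)))
            else:
                orphans.append(sample_name)
    return families, orphans


# Illustrative exposition in the style the comment describes; values are invented.
ZTUNNEL_STYLE = """
# TYPE istio_xds_message counter
istio_xds_message_total 12
# TYPE workload_manager_active_proxy_count gauge
workload_manager_active_proxy_count 3
"""

strict_families, strict_orphans = attach_samples(ZTUNNEL_STYLE, strip_total_suffix=False)
lenient_families, lenient_orphans = attach_samples(ZTUNNEL_STYLE, strip_total_suffix=True)
print(strict_orphans)   # ['istio_xds_message_total']
print(lenient_orphans)  # []
```

In both runs the gauge keeps its sample, which matches the reason the gauges were split into their own list for the regression test.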