All notable changes to the EKS Node Monitoring Agent will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
v1.6.5 - 2026-05-14
- Add fabric manager detection for NVIDIA GPUs (d6a3279)
- Add well-known XID codes from ECS agent and document resolution buckets (47ac6d4)
- Allowlist kube-proxy IPVS-mode reject chains (b266023)
- Fix race condition in NodeDiagnostic CRD controller (84e2371)
- Bump github.com/moby/spdystream from 0.5.0 to 0.5.1 (81d2a72)
- Bump Dockerfile Go version to 1.26.2 to fix CI (fc51d2e)
- Scope down bot permissions and disable caching in GitHub Actions (42ee911)
- Refine NVIDIA e2e test execution (ce19306)
- Bump wait time for instance termination in e2e tests (682a60f)
- Add CODEOWNERS file (f9a071c)
v1.6.4 - 2026-04-10
- Bubble up S3 upload errors to the NodeDiagnostic failure message (1eeb36f)
- Enforce VPC CNI pod name as a prefix (165bf5e)
- Ensure VPC CNI pod by init container (993719d)
- Add CI to auto-update GPU list (e85832c)
- Add minimal permissions block to restrict GITHUB_TOKEN to read-only access (e70f80e)
- CI runs only in parent repo (11bacd6)
v1.6.3 - 2026-04-03
- Add tcpdump packet capture support (aea0cec)
- Reduce noisy logs on clusters with alternative CNIs (a3d468b)
- Upgrade containerd from 1.7.8 to 2.2.1 (d04fc2c)
- Make probe and affinities configurable (c54c7c8)
- Tolerate IPAMD pod teardown (45f85de)
- Short circuit in IPAMD proc lookup (48e563c)
- Tolerate IPAMD startup up to ipamd monitor interval (93c547f)
- Fix inconsistency between probe ports args in helm charts and addon configuration (26869aa)
- Add CI to update Go deps (9a4d17e)
- Automatically bump dcgm-exporter image version (bed7e05)
- Add kubetest2 sweeper to handle clean up of stale leaked resources (025c96c)
- Optimize CI actions for e2e testing (cdb2482)
- Run unit test on PR creation (cbbf06e)
v1.6.2 - 2026-03-23
- Add kubectl ekslogs plugin for NodeDiagnostic log collection (a2a9660)
- Add ZRAM usage monitoring to kernel monitor (b7d3ed3)
- Change NvidiaDeviceCountMismatch severity from Warning to Fatal (8379e15)
- Add g7e instances to NVIDIA DCGM affinity list (c46738a)
- Add -o short-iso-precise to all journalctl invocations for consistent ISO 8601 timestamps with timezone offset (442308a)
v1.6.1 - 2026-03-17
- Add resizePolicy to chart for in-place pod vertical scaling (61eb3fb)
- Collect automode component logs in dedicated folder (65aa2c7)
- Remove helm.sh/chart from DaemonSet selector labels to fix immutable selector upgrade failures from v1.5.x (a7ab4ee)
- Allowlist Calico iptables chains in UnexpectedRejectRule check to prevent false-positive warnings (14d813e)
- Add example for overriding ports in configuration (ae33d75)
v1.6.0 - 2026-03-09
- Add per-monitor configuration to selectively disable monitors (019a715)
- Add "node" destination for NodeDiagnostic log collection (e4d85ac)
- Add global.podLabels to Helm chart (8479714)
- Update NodeDiagnostic CRD for node destination (05a4038)
- Fix NodeDiagnosticController using wrong kubeclient (611fa46)
- Stabilize node condition transition time for multiple errors (3f28f13)
- Ignore DCGM health code 122 (IMEX unhealthy) in soak tests (ebfcaa5)
- Fix e2e agent manifest to only replace agent image, preserving DCGM image (c3fa12e)
- Make nvidia monitor e2e tests more resilient (b2521ca)
- Add containerRegistry override to chart for addon platform compatibility
- Merge e2e-ci into e2e test suite (cbb92a8)
- Add Makefile support for GOBIN env var for CI/CD build systems (4a6b5dd)
- Include e2e test binary and charts in release target (5cb1fbc, c0c9c30)
- Install helm via go install for build environments without helm (fb25103)
- Pass instance-type override to kubetest2 in CI (29d170f)
- Move accelerated hardware monitors to separate parallel e2e block (eadfaa8)
- Reduce CI flakiness and optimize resources (631078f)
- Update base DCGM image to 4.5.2-4.8.1-ubuntu22.04 to resolve CVEs (1a2cda4)