Changelog

All notable changes to the EKS Node Monitoring Agent will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

v1.6.5 - 2026-05-14

What's Changed

Features

  • Add fabric manager detection for NVIDIA GPUs (d6a3279)
  • Add well-known XID codes from ECS agent and document resolution buckets (47ac6d4)

Bug Fixes

  • Allowlist kube-proxy IPVS-mode reject chains (b266023)
  • Fix race condition in NodeDiagnostic CRD controller (84e2371)

Dependencies

  • Bump github.com/moby/spdystream from 0.5.0 to 0.5.1 (81d2a72)

CI & Build

  • Bump Dockerfile Go version to 1.26.2 to fix CI (fc51d2e)
  • Scope down bot permissions and disable caching in GitHub Actions (42ee911)
  • Refine NVIDIA e2e test execution (ce19306)
  • Bump wait time for instance termination in e2e tests (682a60f)
  • Add CODEOWNERS file (f9a071c)

v1.6.4 - 2026-04-10

What's Changed

Features

  • Allow users to allowlist custom iptables rules (601a439)
  • Update well-known XID codes (4e89770)

Bug Fixes

  • Bubble up S3 Upload errors to ND failure message (1eeb36f)
  • Enforce VPC CNI pod name as a prefix (165bf5e)
  • Ensure VPC CNI pod by init container (993719d)

Dependencies

  • Bump Go version to 1.26.2 (d0cb78a)
  • Update Go dependencies (4cd1272)

CI & Build

  • Add CI to auto-update GPU list (e85832c)
  • Add minimal permissions block to restrict GITHUB_TOKEN to read-only access (e70f80e)
  • Run CI only in parent repo (11bacd6)

v1.6.3 - 2026-04-03

What's Changed

Features

  • Add tcpdump packet capture support (aea0cec)
  • Reduce noisy logs on clusters with alternative CNIs (a3d468b)
  • Upgrade containerd from 1.7.8 to 2.2.1 (d04fc2c)
  • Make probe and affinities configurable (c54c7c8)

Bug Fixes

  • Tolerate IPAMD pod teardown (45f85de)
  • Short circuit in IPAMD proc lookup (48e563c)
  • Tolerate IPAMD startup delays of up to the IPAMD monitor interval (93c547f)
  • Fix inconsistency between probe ports args in helm charts and addon configuration (26869aa)

CI & Build

  • Add CI to update Go deps (9a4d17e)
  • Automatically bump dcgm-exporter image version (bed7e05)
  • Add kubetest2 sweeper to clean up stale leaked resources (025c96c)
  • Optimize CI actions for e2e testing (cdb2482)
  • Run unit tests on PR creation (cbbf06e)

v1.6.2 - 2026-03-23

What's Changed

Features

  • Add kubectl ekslogs plugin for NodeDiagnostic log collection (a2a9660)
  • Add ZRAM usage monitoring to kernel monitor (b7d3ed3)
  • Change NvidiaDeviceCountMismatch severity from Warning to Fatal (8379e15)
  • Add g7e instances to NVIDIA DCGM affinity list (c46738a)

Bug Fixes

  • Add -o short-iso-precise to all journalctl invocations for consistent ISO 8601 timestamps with timezone offset (442308a)

Dependencies

  • Bump google.golang.org/grpc from 1.79.2 to 1.79.3 (1fe8681)
  • Update Go dependencies (929dce6)

v1.6.1 - 2026-03-17

What's Changed

Features

  • Add resizePolicy to chart for in-place pod vertical scaling (61eb3fb)
  • Collect automode component logs in dedicated folder (65aa2c7)

Bug Fixes

  • Remove helm.sh/chart from DaemonSet selector labels to fix immutable selector upgrade failures from v1.5.x (a7ab4ee)
  • Allowlist Calico iptables chains in UnexpectedRejectRule check to prevent false-positive warnings (14d813e)

Documentation

  • Add example for overriding ports in configuration (ae33d75)

v1.6.0 - 2026-03-09

What's Changed

Features

  • Add per-monitor configuration to selectively disable monitors (019a715)
  • Add "node" destination for NodeDiagnostic log collection (e4d85ac)
  • Add global.podLabels to Helm chart (8479714)
  • Update NodeDiagnostic CRD for node destination (05a4038)

Bug Fixes

  • Fix NodeDiagnosticController using wrong kubeclient (611fa46)
  • Stabilize node condition transition time for multiple errors (3f28f13)
  • Ignore DCGM health code 122 (IMEX unhealthy) in soak tests (ebfcaa5)
  • Fix e2e agent manifest to only replace agent image, preserving DCGM image (c3fa12e)
  • Make nvidia monitor e2e tests more resilient (b2521ca)
  • Add containerRegistry override to chart for addon platform compatibility

CI & Build

  • Merge e2e-ci into e2e test suite (cbb92a8)
  • Add Makefile support for GOBIN env var for CI/CD build systems (4a6b5dd)
  • Include e2e test binary and charts in release target (5cb1fbc, c0c9c30)
  • Install helm via go install for build environments without helm (fb25103)
  • Pass instance-type override to kubetest2 in CI (29d170f)
  • Move accelerated hardware monitors to separate parallel e2e block (eadfaa8)
  • Reduce CI flakiness and optimize resources (631078f)

What's Changed

  • Update base DCGM image to 4.5.2-4.8.1-ubuntu22.04 to resolve CVEs (1a2cda4)