[tmpnet] Enable monitoring of nodes running in kube #3794

Merged
merged 7 commits into master from tmpnet-monitor-kube
May 30, 2025

Conversation

maru-ava
Contributor

@maru-ava maru-ava commented Mar 13, 2025

PR Chain: tmpnet+kube

This PR chain enables tmpnet to deploy temporary networks to Kubernetes. Early PRs refactor tmpnet to support the addition in #3615 of a new tmpnet node runtime for kube.

Why this should be merged

  • Ensures nodes deployed to kube have labels and annotations that enable the same labeling as nodes running as processes.
  • Enables deployment of prometheus and promtail to kube to ensure collection of logs and metrics from nodes deployed on the cluster.

How this works

  • adds labels and annotations to node pods
  • enables deployment of the collectors via tmpnetctl start-kind-cluster --start-metrics-collector --start-logs-collector
  • updates the kube e2e ci job to check that logs and metrics were collected
  • defines collector deployment in yaml for compatibility with argocd usage
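
The first step above, adding labels and annotations to node pods, can be sketched roughly as follows. This is only an illustration under stated assumptions: `podMeta` and `applyMonitoringMeta` are hypothetical names that do not appear in this PR, the real change works with Kubernetes API types, and the `app` label and `prometheus.io/scrape` annotation are assumed placeholders. Only the `network_uuid` label key is taken from the queries shown in the log output in this description.

```go
package main

import (
	"fmt"
	"sort"
)

// podMeta is a hypothetical stand-in for the metadata of a node pod.
// The actual change operates on Kubernetes API objects instead.
type podMeta struct {
	Labels      map[string]string
	Annotations map[string]string
}

// applyMonitoringMeta returns a copy of meta with the given labels and
// annotations merged in. Incoming values win on key conflicts, and the
// input maps are not modified.
func applyMonitoringMeta(meta podMeta, labels, annotations map[string]string) podMeta {
	out := podMeta{
		Labels:      make(map[string]string, len(meta.Labels)+len(labels)),
		Annotations: make(map[string]string, len(meta.Annotations)+len(annotations)),
	}
	for k, v := range meta.Labels {
		out.Labels[k] = v
	}
	for k, v := range labels {
		out.Labels[k] = v
	}
	for k, v := range meta.Annotations {
		out.Annotations[k] = v
	}
	for k, v := range annotations {
		out.Annotations[k] = v
	}
	return out
}

// sortedKeys returns the keys of m in sorted order for stable printing.
func sortedKeys(m map[string]string) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	// The "app" label is an assumed placeholder, not taken from the PR.
	base := podMeta{Labels: map[string]string{"app": "avalanchego-node"}}
	meta := applyMonitoringMeta(
		base,
		map[string]string{"network_uuid": "27080d29-7152-4a5c-a883-af4b6a47700b"},
		// Assumed annotation key, used here only for illustration.
		map[string]string{"prometheus.io/scrape": "true"},
	)
	for _, k := range sortedKeys(meta.Labels) {
		fmt.Printf("label %s=%s\n", k, meta.Labels[k])
	}
	for _, k := range sortedKeys(meta.Annotations) {
		fmt.Printf("annotation %s=%s\n", k, meta.Annotations[k])
	}
}
```

Returning a copy rather than mutating the input keeps the sketch safe to call with shared maps. Whether the actual implementation copies or mutates is not stated here.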

How this was tested

The kube e2e CI job runs a check that verifies that logs and metrics were collected.
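
As a rough sketch of what such a check looks like, the snippet below builds the two queries that appear in the log output in this description and treats a zero count as a failure. The function names `logsQuery`, `metricsQuery`, and `checkCount` are hypothetical and are not the identifiers used in `tmpnet/check_monitoring.go`. Issuing the query against Loki or Prometheus is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// logsQuery builds a LogQL query that counts log lines carrying the given
// network_uuid label over the past hour.
func logsQuery(networkUUID string) string {
	return fmt.Sprintf("sum(count_over_time({network_uuid=%q}[1h]))", networkUUID)
}

// metricsQuery builds a PromQL query that counts the series carrying the
// given network_uuid label.
func metricsQuery(networkUUID string) string {
	return fmt.Sprintf("count({network_uuid=%q})", networkUUID)
}

// errNothingCollected is returned when a query reports a zero count.
var errNothingCollected = errors.New("collected count is zero")

// checkCount returns an error unless count is non-zero. kind is "logs" or
// "metrics" and is only used to label the error.
func checkCount(kind string, count int) error {
	if count == 0 {
		return fmt.Errorf("%s: %w", kind, errNothingCollected)
	}
	return nil
}

func main() {
	uuid := "27080d29-7152-4a5c-a883-af4b6a47700b"
	fmt.Println(logsQuery(uuid))
	fmt.Println(metricsQuery(uuid))

	// Counts taken from the log output in this description.
	fmt.Println(checkCount("logs", 381224) == nil)
	fmt.Println(checkCount("metrics", 7497) == nil)
}
```

Filtering on `network_uuid` scopes each count to a single temporary network, so a passing check shows that this run's nodes were scraped rather than some other network's.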

Need to be documented in RELEASES.md?

N/A

TODO

[DeferCleanup (Suite)] 
/home/runner/work/avalanchego/avalanchego/tests/fixture/e2e/ginkgo_test_context.go:96

  Timeline >>
   [05-28|19:02:40.088] INFO tmpnet/check_monitoring.go:79 checking if logs exist {"url": "https://loki-poc.avax-dev.network/", "query": "sum(count_over_time({network_uuid=\"27080d29-7152-4a5c-a883-af4b6a47700b\"}[1h]))"}
   [05-28|19:02:40.507] INFO tmpnet/check_monitoring.go:45 collected count is non-zero {"type": "logs", "count": 381224}
  << Timeline
------------------------------
[DeferCleanup (Suite)] PASSED [0.291 seconds]
[DeferCleanup (Suite)] 
/home/runner/work/avalanchego/avalanchego/tests/fixture/e2e/ginkgo_test_context.go:96

  Timeline >>
   [05-28|19:02:40.509] INFO tmpnet/check_monitoring.go:186 checking if metrics exist {"url": "https://prometheus-poc.avax-dev.network/", "query": "count({network_uuid=\"27080d29-7152-4a5c-a883-af4b6a47700b\"})"}
   [05-28|19:02:40.800] INFO tmpnet/check_monitoring.go:45 collected count is non-zero {"type": "metrics", "count": 7497}
  << Timeline
------------------------------
[DeferCleanup (Suite)] PASSED [0.000 seconds]

@maru-ava maru-ava added the testing (This primarily focuses on testing) and ci (This focuses on changes to the CI process) labels Mar 13, 2025
@maru-ava maru-ava self-assigned this Mar 13, 2025
@github-project-automation github-project-automation bot moved this to Backlog 🗄️ in avalanchego Mar 13, 2025
@maru-ava maru-ava moved this from Backlog 🗄️ to In Progress 🏗 in avalanchego Mar 13, 2025
@maru-ava maru-ava force-pushed the tmpnet-monitor-kube branch 3 times, most recently from ac5339c to b742fdd Compare March 13, 2025 05:05
@maru-ava maru-ava force-pushed the tmpnet-monitor-kube branch from b742fdd to fc2cc0c Compare March 13, 2025 22:40
@maru-ava maru-ava force-pushed the tmpnet-monitor-kube branch 2 times, most recently from d9a5f75 to ae72771 Compare March 24, 2025 02:26
@maru-ava maru-ava force-pushed the tmpnet-monitor-kube branch 3 times, most recently from 5875349 to 1e699cf Compare March 27, 2025 22:00
@maru-ava maru-ava force-pushed the tmpnet-monitor-kube branch from 1e699cf to 5ee084c Compare March 27, 2025 22:05
@maru-ava maru-ava force-pushed the tmpnet-monitor-kube branch 2 times, most recently from 18ce850 to 42e7450 Compare March 27, 2025 22:44
@maru-ava maru-ava force-pushed the tmpnet-kube branch 2 times, most recently from d87b8bf to 9c6ef9e Compare March 27, 2025 22:48
@maru-ava maru-ava force-pushed the tmpnet-monitor-kube branch from 42e7450 to 053bfc5 Compare March 27, 2025 22:48
@maru-ava maru-ava force-pushed the tmpnet-monitor-kube branch from 053bfc5 to 2f49e50 Compare March 27, 2025 23:01
@maru-ava maru-ava force-pushed the tmpnet-monitor-kube branch from 81303d2 to ae652fa Compare May 23, 2025 17:57
@maru-ava maru-ava moved this from Backlog 🧊 to In Progress 🏗️ in avalanchego May 23, 2025
@maru-ava maru-ava force-pushed the tmpnet-monitor-kube branch from ae652fa to 717cad1 Compare May 23, 2025 18:08
@maru-ava maru-ava force-pushed the tmpnet-monitor-kube branch from 717cad1 to 8932fa3 Compare May 28, 2025 18:20
Base automatically changed from tmpnet-kube to master May 28, 2025 18:41
@maru-ava maru-ava force-pushed the tmpnet-monitor-kube branch from 8932fa3 to f710299 Compare May 28, 2025 18:43
@maru-ava maru-ava moved this from In Progress 🏗️ to Ready 🚦 in avalanchego May 28, 2025
@maru-ava maru-ava marked this pull request as ready for review May 28, 2025 18:43
@Copilot Copilot AI review requested due to automatic review settings May 28, 2025 18:43
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

Enables deployment and testing of Prometheus and Promtail collectors in a local Kind cluster via tmpnetctl start-kind-cluster, wiring new flags through CLI, tests, and Kubernetes manifests.

  • Introduce CollectorVars to register --start-metrics-collector and --start-logs-collector flags and corresponding test checks
  • Add YAML manifests for Promtail DaemonSet and Prometheus Agent StatefulSet
  • Integrate DeployKubeCollectors into StartKindCluster and propagate monitoring labels in NewNodeStatefulSet
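
For illustration, grouping the two flags into a single struct could look like the sketch below. It uses Go's standard `flag` package. The lowercase `collectorVars` type, its `register` method, `parseCollectorVars`, and the help strings are assumptions made for this example and may not match the definition in `tests/fixture/tmpnet/flags/collector.go`. Only the flag names come from this PR.

```go
package main

import (
	"flag"
	"fmt"
)

// collectorVars is a hypothetical grouping of the two collector flags.
type collectorVars struct {
	StartMetricsCollector bool
	StartLogsCollector    bool
}

// register binds both flags to the given flag set. The help text is
// illustrative only.
func (v *collectorVars) register(fs *flag.FlagSet) {
	fs.BoolVar(&v.StartMetricsCollector, "start-metrics-collector", false,
		"start a metrics collector (illustrative help text)")
	fs.BoolVar(&v.StartLogsCollector, "start-logs-collector", false,
		"start a logs collector (illustrative help text)")
}

// parseCollectorVars parses args into a collectorVars value.
func parseCollectorVars(args []string) (collectorVars, error) {
	fs := flag.NewFlagSet("start-kind-cluster", flag.ContinueOnError)
	var v collectorVars
	v.register(fs)
	if err := fs.Parse(args); err != nil {
		return collectorVars{}, err
	}
	return v, nil
}

func main() {
	v, err := parseCollectorVars([]string{
		"--start-metrics-collector",
		"--start-logs-collector",
	})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("metrics=%t logs=%t\n", v.StartMetricsCollector, v.StartLogsCollector)
}
```

Keeping both flags in one struct lets each command and test register them with a single call, which fits the per-file summary's note about swapping individual bool flags for a unified type.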

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/upgrade/upgrade_test.go | Swap individual bool flags for unified CollectorVars |
| tests/fixture/tmpnet/yaml/promtail-daemonset.yaml | Add Promtail DaemonSet manifest |
| tests/fixture/tmpnet/yaml/prometheus-agent.yaml | Add Prometheus Agent StatefulSet manifest |
| tests/fixture/tmpnet/tmpnetctl/main.go | Wire CollectorVars into start-kind-cluster command |
| tests/fixture/tmpnet/start_kind_cluster.go | Update StartKindCluster signature and invoke DeployKubeCollectors |
| tests/fixture/tmpnet/monitor_kube.go | Implement DeployKubeCollectors and helper functions |
| tests/fixture/tmpnet/kube_runtime.go | Pass monitoring labels to node runtime |
| tests/fixture/tmpnet/kube.go | Extend NewNodeStatefulSet to apply labels and annotations |
| tests/fixture/tmpnet/flags/collector.go | Define CollectorVars and flag registration |
| tests/fixture/tmpnet/README.md | Document new flags and YAML directory |
| scripts/start_kind_cluster.sh | Propagate new start flags in helper script |

Co-authored-by: Copilot <[email protected]>
Signed-off-by: maru <[email protected]>
@maru-ava maru-ava changed the title [tmpnet] Enable monitoring of local kind cluster [tmpnet] Enable monitoring of nodes running in kube May 29, 2025
@github-project-automation github-project-automation bot moved this from Ready 🚦 to In Progress 🏗️ in avalanchego May 30, 2025
@maru-ava maru-ava added this pull request to the merge queue May 30, 2025
Merged via the queue into master with commit 4f6de9b May 30, 2025
26 checks passed
@maru-ava maru-ava deleted the tmpnet-monitor-kube branch May 30, 2025 18:20
@github-project-automation github-project-automation bot moved this from In Progress 🏗️ to Done 🎉 in avalanchego May 30, 2025
Labels
ci (This focuses on changes to the CI process), testing (This primarily focuses on testing)
Projects
Status: Done 🎉
3 participants