Collect shoot cluster events via k8sobjects receiver #77

Open
nickytd wants to merge 3 commits into gardener:main from nickytd:with-k8s-objects-receiver

@nickytd nickytd commented Apr 1, 2026

What this PR does / why we need it:

Shoot cluster events (events.k8s.io/v1) are currently collected by the proprietary event-logger component, which tails container logs and emits events in a non-standard, Gardener-specific log format. This format is opaque to upstream observability tooling and makes it impossible to forward shoot events to external OTel-native backends without custom parsing.

This PR replaces that approach by adding a k8sobjects receiver to the OTel Collector that watches events.k8s.io/v1 events directly from the shoot API server in real time (watch mode). Events are emitted as structured OTel log records, enriched with standard resource attributes, and forwarded through the existing exporter pipeline — fully compatible with any upstream OTel-native receiver without any custom parsing or format translation.

As a consequence, the Fluent Bit ClusterInput is updated to explicitly exclude event-logger container logs from collection, since events are now sourced directly via the k8sobjects receiver.

Changes:

  • Shoot access secret: A ShootAccessSecret (shoot-access-otelcol) is reconciled in the shoot namespace. Gardener's token projector keeps this secret's kubeconfig up to date with a short-lived token, giving the Collector a secure, auto-rotating credential to the shoot API server.

  • Shoot RBAC (ManagedResource: <namespace>-shoot): A ClusterRole and a ClusterRoleBinding are deployed into the shoot cluster, granting get/list/watch on events.k8s.io/events to the Collector's service account. Only the minimum required permissions are requested.

  • Kubeconfig volume: The shoot generic kubeconfig (projected secret + access token) is mounted into the Collector pod at gardenerutils.VolumeMountPathGenericKubeconfig, and KUBECONFIG is set to gardenerutils.PathGenericKubeconfig so the k8sobjects receiver picks it up automatically via auth_type: kubeConfig.

  • k8sobjects/events receiver: Configured with auth_type: kubeConfig, watching events.k8s.io/events in watch mode for low-latency delivery.

  • transform/events processor: Strips managedFields from the event body before forwarding, reducing payload size and noise.

  • logs/events pipeline: k8sobjects/events → resource → memoryLimiter → transform/events → batch → exporters. The resource processor enriches events with k8s.cluster.name, gardener.project.name, and gardener.shoot.name — the same attributes already applied to all other signals.

  • Lifecycle correctness:

    • Delete: the shoot ManagedResource is deleted, and its removal awaited, before the shoot access secret is removed, so the ManagedResource controller can still authenticate to the shoot while cleaning up the RBAC objects.
    • Migrate: calls SetKeepObjects(true) on the shoot ManagedResource before delegating to Delete, so the shoot RBAC objects are preserved on the shoot cluster during control-plane migration and the target seed can take ownership without re-creating them.
  • Fluent Bit (separate commit): excludes event-logger container logs from the ClusterInput (pattern *_shoot--*_event-logger_*.log) since events are now collected via the k8sobjects receiver. Also bumps the plugin image to v1.4.0.

  • Example (separate commit): adds a routing connector to examples/opentelemetry-receiver.yaml that splits incoming logs by body type — structured k8s object logs vs plain string logs — with dedicated transform processors setting _msg.
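
Taken together, the receiver, processor, and pipeline bullets above correspond roughly to the Collector configuration below. This is a minimal sketch in upstream OpenTelemetry Collector syntax, not the configuration generated by this PR. It is a fragment: the `resource`, `memory_limiter`, and `batch` processors are assumed to be defined elsewhere (upstream names the memory limiter `memory_limiter`, while this description calls it memoryLimiter), the `debug` exporter is a placeholder for the existing exporters, and the `delete_key` path assumes the watch-mode body layout in which the Kubernetes object sits under `body["object"]`.

```yaml
receivers:
  k8sobjects/events:
    # Uses the kubeconfig referenced by $KUBECONFIG.
    auth_type: kubeConfig
    objects:
      - name: events
        group: events.k8s.io
        mode: watch

processors:
  transform/events:
    # Skip records that do not match the expected shape instead of failing.
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          # Assumption: in watch mode the body is a map with "type" and "object" keys.
          - delete_key(body["object"]["metadata"], "managedFields") where IsMap(body)

service:
  pipelines:
    logs/events:
      receivers: [k8sobjects/events]
      # resource, memory_limiter, and batch are assumed to be defined elsewhere.
      processors: [resource, memory_limiter, transform/events, batch]
      # Placeholder; the PR reuses the existing exporter pipeline.
      exporters: [debug]
```

Placing `transform/events` before `batch` means batches are built from records that have already had managedFields removed.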

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

  • Only events.k8s.io/v1 events are collected — core v1.Event (legacy API group "") is intentionally excluded as modern Kubernetes components write exclusively to events.k8s.io.
  • The k8sobjects receiver uses auth_type: kubeConfig which reads $KUBECONFIG. This is the standard Gardener pattern for shoot-cluster access from seed workloads.
  • The shoot ManagedResource uses keepObjects: false during normal operation; SetKeepObjects(true) is set only transiently during Migrate before the resource is deleted from the old seed.
  • To verify the receiver is active after reconciliation, port-forward the Collector's metrics endpoint and check the receiver metrics:
    ```bash
    kubectl -n shoot--local--local port-forward statefulset/otelcol 8888:8888
    curl -s localhost:8888/metrics | grep -E 'receiver="k8sobjects/events"'
    ```
    A non-zero `otelcol_receiver_accepted_log_records_total{receiver="k8sobjects/events"}` confirms events are being received.
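
For reference, the shoot-side RBAC that limits the Collector to events.k8s.io events can be sketched as the manifests below. This is an illustration only: the object names, the service account name, and its namespace are hypothetical placeholders, not the names this PR deploys through the ManagedResource.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otelcol-events-reader        # hypothetical name
rules:
  # Read-only access to events in the events.k8s.io API group only.
  - apiGroups: ["events.k8s.io"]
    resources: ["events"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otelcol-events-reader        # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: otelcol-events-reader
subjects:
  - kind: ServiceAccount
    name: otelcol                    # hypothetical service account name
    namespace: kube-system           # assumption; use the namespace of the shoot access service account
```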

Release note:
```feature operator
Replace proprietary event-logger-based shoot event collection with the OTel
k8sobjects receiver. Events are now collected directly from the shoot API server
as structured OTel log records (events.k8s.io/v1), enriched with cluster,
project, and shoot name attributes, and forwarded in standard OTel format to
any configured upstream receiver without custom parsing.
```

nickytd added 3 commits April 1, 2026 20:57
- Reconcile a shoot access secret (NewShootAccessSecret) so the OTel
  Collector can authenticate to the shoot API server.
- Deploy a ClusterRole + ClusterRoleBinding into the shoot cluster via a
  dedicated ManagedResource (shootManagedResourceName) granting get/list/watch
  on events.k8s.io/events.
- Mount the generic shoot kubeconfig volume on the Collector pod and set
  KUBECONFIG env var so the k8sobjects receiver can use it.
- Add k8sobjects/events receiver (auth_type: kubeConfig, watch mode) and
  a transform/events processor that strips managedFields from the event body.
- Wire a logs/events pipeline: k8sobjects/events → resource, memoryLimiter,
  transform/events, batch → exporters.
- Delete/wait for shoot ManagedResource before removing shoot access secret
  in Delete to avoid orphaned RBAC in the shoot cluster.
- Fix Migrate: call SetKeepObjects on the shoot ManagedResource before
  delegating to Delete, so shoot RBAC objects are preserved during
  control-plane migration and the target seed can reconcile them cleanly.

- Tighten excludePath to '*_shoot--*_event-logger_*.log' to avoid
  accidentally excluding unrelated containers in non-shoot namespaces.
- Bump fluent-bit-plugin image from v1.3.0 to v1.4.0.

Add a routing connector that splits incoming logs by body type:
- IsMap(body) → logs/objects pipeline (k8s object events)
- default       → logs/string pipeline (plain string logs)

Each pipeline runs a transform processor to set the _msg attribute
before batching. Remove redundant processors: [] from logs/fanin.
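
The routing split described in this commit could look roughly like the fragment below. It is a sketch under several assumptions rather than the contents of examples/opentelemetry-receiver.yaml: it assumes a Collector version whose routing connector accepts `context: log`, the processor names `transform/objects` and `transform/string` are hypothetical, the `otlp` receiver and `debug` exporter are placeholders, and reading the message from the event's `note` field under `body["object"]` is a guess at what the object pipeline sets as `_msg`.

```yaml
connectors:
  routing:
    # Anything that does not match a table entry goes to the string pipeline.
    default_pipelines: [logs/string]
    table:
      - context: log
        statement: route() where IsMap(body)
        pipelines: [logs/objects]

processors:
  transform/objects:                 # hypothetical name
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          # Assumption: events.k8s.io/v1 events carry their message in "note".
          - set(attributes["_msg"], body["object"]["note"])
  transform/string:                  # hypothetical name
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          - set(attributes["_msg"], body) where IsString(body)

service:
  pipelines:
    logs/fanin:
      receivers: [otlp]              # placeholder receiver
      exporters: [routing]
    logs/objects:
      receivers: [routing]
      processors: [transform/objects, batch]
      exporters: [debug]             # placeholder exporter
    logs/string:
      receivers: [routing]
      processors: [transform/string, batch]
      exporters: [debug]             # placeholder exporter
```

Because the connector acts as the exporter of `logs/fanin` and the receiver of the two downstream pipelines, `logs/fanin` needs no processors of its own.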
@nickytd nickytd requested a review from a team as a code owner April 1, 2026 19:14

gardener-prow bot commented Apr 1, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign dnaeon for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gardener-prow gardener-prow bot added do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 1, 2026
@nickytd nickytd changed the title feat(actuator): collect shoot cluster events via k8sobjects receiver Collect shoot cluster events via k8sobjects receiver Apr 1, 2026

Bobi-Wan commented Apr 2, 2026

Haven't been able to go through all the changes yet, but I have an initial question:
What is the need for a shoot access secret instead of an automounted service account token?

Ignore. I was thinking of the data plane.

nickytd commented Apr 2, 2026

/kind enhancement

@gardener-prow gardener-prow bot added kind/enhancement Enhancement, improvement, extension and removed do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Apr 2, 2026