ChristianGeie (Collaborator) commented Aug 15, 2025

This PR introduces a health check endpoint (/healthz) that makes the sidecar more robust when running in Kubernetes. The endpoint supports readiness and liveness probes, so Kubernetes can manage the container's lifecycle and restart it when it becomes unhealthy.

Key Features

  • New /healthz Endpoint: A new HTTP endpoint is available on port 8080 (configurable via the HEALTH_PORT environment variable)

  • Readiness Probe:

    • The sidecar now reports as "ready" (HTTP 200) only after the initial synchronization of all configured resources is complete
    • This prevents the main application container from starting or receiving traffic prematurely, ensuring all configuration files are present at startup
  • Liveness Probe:

    • The probe continuously monitors the sidecar's health by checking two critical conditions:

      • Kubernetes API Contact: Verifies that the sidecar has had successful contact with the Kubernetes API within the last 60 seconds
      • Watcher Process Health: Ensures that all internal watcher subprocesses are running correctly
    • If any check fails, the probe fails, signaling Kubernetes to restart the container
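The checks listed above can be sketched as a single decision function. This is a minimal illustration and not the PR's actual implementation: the `HealthState` class, the `evaluate_health` function, the response body, and the 503 status for failures are all assumptions (the description only states HTTP 200 for the ready case and the 60-second API contact window). How the real endpoint separates readiness from liveness is not stated here; the sketch simply combines all three conditions.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# 60-second window taken from the PR description.
API_CONTACT_MAX_AGE_SECONDS = 60.0


@dataclass
class HealthState:
    """Hypothetical holder for the signals the probe inspects."""

    ready: bool = False
    last_api_contact: Optional[float] = None  # monotonic timestamp
    # Maps a watcher name to a callable that reports whether it is alive,
    # e.g. multiprocessing.Process.is_alive (assumption).
    watchers: Dict[str, Callable[[], bool]] = field(default_factory=dict)

    def mark_ready(self) -> None:
        """Call once the initial synchronization has finished."""
        self.ready = True

    def record_api_contact(self, now: Optional[float] = None) -> None:
        """Call after every successful Kubernetes API interaction."""
        self.last_api_contact = time.monotonic() if now is None else now


def evaluate_health(state: HealthState, now: Optional[float] = None) -> Tuple[int, dict]:
    """Return (http_status, body); 503 on failure is an assumed convention."""
    now = time.monotonic() if now is None else now
    failures = []

    if not state.ready:
        failures.append("initial sync not complete")

    contact = state.last_api_contact
    if contact is None or now - contact > API_CONTACT_MAX_AGE_SECONDS:
        failures.append("no recent Kubernetes API contact")

    dead = sorted(name for name, alive in state.watchers.items() if not alive())
    if dead:
        failures.append("watcher not running: " + ", ".join(dead))

    status = 200 if not failures else 503
    return status, {"status": "ok" if not failures else "unhealthy", "failures": failures}
```

An HTTP handler for /healthz would then only need to call `evaluate_health(state)` and return the status code and body it produces.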

Enhancements

  • Reduced Log Noise: Access logs for frequent /healthz requests are automatically filtered out to keep application logs clean and focused
  • Fail-Fast on Process Death: The main process now exits immediately if a critical watcher subprocess dies, ensuring a prompt restart by Kubernetes
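One way to filter probe requests out of the access log is a standard `logging.Filter` attached to Uvicorn's access logger (`uvicorn.access`). The sketch below is an assumption about how such filtering could be done, not the code from this PR; the class and function names are hypothetical, and the substring match on the path is deliberately simple.

```python
import logging


class HealthzAccessFilter(logging.Filter):
    """Drop access-log records that mention the health check path."""

    def __init__(self, path: str = "/healthz") -> None:
        super().__init__()
        self.path = path

    def filter(self, record: logging.LogRecord) -> bool:
        # getMessage() applies the %-style args, so the request path is
        # visible even when it was passed as a separate log argument.
        return self.path not in record.getMessage()


def install_healthz_filter(logger_name: str = "uvicorn.access") -> HealthzAccessFilter:
    """Attach the filter to the named logger and return it."""
    access_filter = HealthzAccessFilter()
    logging.getLogger(logger_name).addFilter(access_filter)
    return access_filter
```

Calling `install_healthz_filter()` once at startup, before the server begins serving requests, would be enough in this sketch; all other access-log lines pass through unchanged.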

Testing

The CI pipeline has been enhanced with new tests to validate this functionality:

  • A test to confirm the Uvicorn health server starts successfully
  • A liveness test that simulates a watcher process failure and asserts that Kubernetes restarts the pod as expected
  • A Kubernetes config load test for both the SLEEP-based and the WATCH-based sidecar

This feature makes the sidecar more production-ready and easier to operate reliably within Kubernetes.

Christian Geie and others added 30 commits April 28, 2025 12:38
add mark_ready() after initial write to container filesystem
* Sending a URL request needs an Info-level log event:
  Requiring debug log level to see a URL request is either unhelpful or too noisy.
  This also mirrors the log event for script runs.
* When logging a URL request, "None" was reported if the method defaulted to GET.
* It is useful to see the payload in the event of a POST.
* Removal of ellipsis in config load.
* A missing, non-mandatory option (folder annotation) shouldn't result
  in a warning.
Bumps [requests](https://github.com/psf/requests) from 2.32.3 to 2.32.4.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](psf/requests@v2.32.3...v2.32.4)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.32.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Bumps [kubernetes](https://github.com/kubernetes-client/python) from 32.0.1 to 33.1.0.
- [Release notes](https://github.com/kubernetes-client/python/releases)
- [Changelog](https://github.com/kubernetes-client/python/blob/v33.1.0/CHANGELOG.md)
- [Commits](kubernetes-client/python@v32.0.1...v33.1.0)

---
updated-dependencies:
- dependency-name: kubernetes
  dependency-version: 33.1.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Bumps [logfmter](https://github.com/josheppinette/python-logfmter) from 0.0.9 to 0.0.10.
- [Release notes](https://github.com/josheppinette/python-logfmter/releases)
- [Changelog](https://github.com/josheppinette/python-logfmter/blob/main/HISTORY.md)
- [Commits](josheppinette/python-logfmter@v0.0.9...v0.0.10)

---
updated-dependencies:
- dependency-name: logfmter
  dependency-version: 0.0.10
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 4 to 5.
- [Release notes](https://github.com/actions/download-artifact/releases)
- [Commits](actions/download-artifact@v4...v5)

---
updated-dependencies:
- dependency-name: actions/download-artifact
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
especially for SSLCertVerificationError
to avoid returning "none" which would lead to an AttributeError in the caller code.
add exception handling for SSLError and RetryError
add debug log message for dummy response
in _get_file_data_and_name() to avoid AttributeError
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 5.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v4...v5)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
…zation and add a check for 5xx error handling
…zation with more detailed error messages and add a check for empty 500.txt files
Christian Geie added 3 commits October 28, 2025 10:42
The Kubernetes client configuration was not being reliably initialized in the forked processes used by the WATCH and SLEEP methods.
This was due to a call to a non-existent function `_ensure_kube_config_in_child` which was left over from a previous refactoring.
This change replaces the broken call with a call to the correct `_initialize_kubeclient_configuration` function at the start
of the `list_resources` and `_watch_resource_iterator` functions.
This ensures that a valid Kubernetes client configuration is loaded in every process, resolving potential connectivity issues
with the API server.
The unused and broken helper function `_init_kube_in_child_if_needed` has been removed.
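The idea of loading the client configuration in every process can be sketched with a small per-process guard. This is an illustrative assumption, not the body of `_initialize_kubeclient_configuration`: the `PerProcessInitializer` class and the `list_resources_sketch` helper are hypothetical, and `load_config` stands in for whatever actually loads the configuration (with the official client it might wrap `kubernetes.config.load_incluster_config`).

```python
import os
from typing import Callable, Optional


class PerProcessInitializer:
    """Run a config loader once per process, including forked children."""

    def __init__(self, load_config: Callable[[], None],
                 getpid: Callable[[], int] = os.getpid) -> None:
        self._load_config = load_config
        self._getpid = getpid
        self._loaded_pid: Optional[int] = None

    def ensure(self) -> bool:
        """Load the config if this process has not done so; return True if loaded."""
        pid = self._getpid()
        if self._loaded_pid == pid:
            return False
        # A forked child inherits _loaded_pid from its parent, but its own
        # PID differs, so the configuration is loaded again in the child.
        self._load_config()
        self._loaded_pid = pid
        return True


def list_resources_sketch(init: PerProcessInitializer, fetch: Callable[[], list]) -> list:
    """Hypothetical stand-in: initialize first, then talk to the API."""
    init.ensure()
    return fetch()
```

Keying the guard on the PID keeps repeated calls cheap within one process while still covering processes created by fork.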
This commit enhances the integration test suite to verify that the Kubernetes client is correctly configured in both WATCH and SLEEP modes.

- A new 'sidecar-sleep' pod is added to the test resources, configured to run in SLEEP mode.
- The 'build_and_test' workflow is updated to include this new pod in the test run.
- A new verification step, "Verify K8s Config Loading", is added. This step dynamically retrieves the ClusterIP of the Kubernetes API service and checks the logs of both the WATCH mode pod ('sidecar') and the SLEEP mode pod ('sidecar-sleep') to ensure they connect to the correct API server address.
ChristianGeie added the enhancement, test, github_actions, and python labels Oct 28, 2025
ChristianGeie merged commit 397a806 into master Oct 28, 2025
12 checks passed
Development

Successfully merging this pull request may close these issues.

Is 1.31.0 stable load? 1.30.10+ Breaks kubernetes api connection

3 participants