Skip to content

Conversation

@Fedosin
Copy link
Contributor

@Fedosin Fedosin commented Oct 16, 2025

Fixes #16162

Proposed Changes

The queue-proxy admin server now always serves HTTP on port 8022, even when system-internal-tls is enabled. This simplifies the PreStop hook configuration and fixes graceful shutdown issues.

Why this approach:

  • PreStop hooks are called by kubelet locally within the pod (localhost)
  • No network traffic leaves the pod, so TLS isn't needed for security
  • Simpler implementation with no dynamic scheme configuration
  • Prevents TLS handshake errors during pod shutdown

This fixes the issue where pods would receive HTTP 502 errors during scale-down operations when system-internal-tls was enabled. The error occurred because the PreStop hook was trying to connect with HTTP to a TLS-enabled admin server, causing immediate SIGTERM and dropped requests.

@knative-prow knative-prow bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 16, 2025
@dprotaso
Copy link
Member

To simplify this I think we should just have the preStop hook on a non-TLS server. I'd expect it to only be hit by the kubelet.

@codecov
Copy link

codecov bot commented Oct 16, 2025

Codecov Report

❌ Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.08%. Comparing base (0ee5fc0) to head (b2d3c47).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
pkg/queue/sharedmain/main.go 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16163      +/-   ##
==========================================
- Coverage   80.09%   80.08%   -0.01%     
==========================================
  Files         214      214              
  Lines       16943    16940       -3     
==========================================
- Hits        13570    13566       -4     
- Misses       3013     3014       +1     
  Partials      360      360              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@knative-prow knative-prow bot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 16, 2025
@Fedosin Fedosin changed the title Fix PreStop hook to use HTTPS with system-internal-tls Keep queue-proxy admin server on HTTP for PreStop hooks Oct 16, 2025
@dprotaso
Copy link
Member

@Fedosin do we have a test that can catch this regression?

@knative-prow knative-prow bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 16, 2025
@Fedosin
Copy link
Contributor Author

Fedosin commented Oct 16, 2025

@dprotaso yes, sure. I created a new function setupServers in servers.go that is responsible for server creation. After that I added tests to prevent regressions there.

The patch became bigger though...

@dprotaso
Copy link
Member

The patch became bigger though...

Yeah I was thinking less about a unit test and wondering if there's an easy way to see if preStop hook was invoked successfully. For example if there's a Kubernetes event that's emitted when it fails etc.

Let me know what you think

@Fedosin Fedosin force-pushed the https-prehook branch 2 times, most recently from ac4b4d8 to a0d06bf Compare October 17, 2025 10:21
The queue-proxy admin server now always serves HTTP on port 8022, even
when system-internal-tls is enabled. This simplifies the PreStop hook
configuration and fixes graceful shutdown issues.

Changes:
- Queue-proxy admin server always uses HTTP, only main server uses TLS
- PreStop hooks always use HTTP scheme (removed dynamic configuration)
- Updated tests to reflect that admin server is always HTTP

Why this approach:
- PreStop hooks are called by kubelet locally within the pod (localhost)
- No network traffic leaves the pod, so TLS isn't needed for security
- Simpler implementation with no dynamic scheme configuration
- Prevents TLS handshake errors during pod shutdown

This fixes the issue where pods would receive HTTP 502 errors during
scale-down operations when system-internal-tls was enabled. The error
occurred because the PreStop hook was trying to connect with HTTP to
a TLS-enabled admin server, causing immediate SIGTERM and dropped
requests.
@maschmid
Copy link
Contributor

/test ?

@knative-prow
Copy link

knative-prow bot commented Oct 17, 2025

@maschmid: The following commands are available to trigger required jobs:

/test build-tests
/test contour-latest
/test contour-tls
/test gateway-api-latest
/test istio-latest-no-mesh
/test istio-latest-no-mesh-tls
/test kourier-stable
/test kourier-stable-tls
/test unit-tests
/test upgrade-tests

The following commands are available to trigger optional jobs:

/test gateway-api-latest-and-contour
/test https
/test istio-latest-mesh
/test istio-latest-mesh-short
/test istio-latest-mesh-tls
/test performance-tests

Use /test all to run the following jobs that were automatically triggered:

build-tests_serving_main
istio-latest-no-mesh-tls_serving_main
istio-latest-no-mesh_serving_main
unit-tests_serving_main
upgrade-tests_serving_main
Details

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@maschmid
Copy link
Contributor

/test kourier-stable

@maschmid
Copy link
Contributor

/test contour-latest

@maschmid
Copy link
Contributor

/test kourier-stable

@Fedosin
Copy link
Contributor Author

Fedosin commented Oct 17, 2025

PASS test/e2e/systeminternaltls.TestGracefulShutdownWithTLS (21.20s) but test/e2e: TestDestroyPodInflight is flaky :(

@Fedosin
Copy link
Contributor Author

Fedosin commented Oct 17, 2025

/test kourier-stable

@dprotaso
Copy link
Member

/lgtm
/approve

thanks for the e2e test looks great

@knative-prow knative-prow bot added the lgtm Indicates that a PR is ready to be merged. label Oct 18, 2025
@knative-prow
Copy link

knative-prow bot commented Oct 18, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dprotaso, Fedosin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow knative-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 18, 2025
@knative-prow knative-prow bot merged commit 54868c2 into knative:main Oct 18, 2025
93 checks passed
Fedosin added a commit to Fedosin/serving that referenced this pull request Oct 21, 2025
The queue-proxy admin server now always serves HTTP on port 8022, even
when system-internal-tls is enabled. This simplifies the PreStop hook
configuration and fixes graceful shutdown issues.

Changes:
- Queue-proxy admin server always uses HTTP, only main server uses TLS
- PreStop hooks always use HTTP scheme (removed dynamic configuration)
- Updated tests to reflect that admin server is always HTTP

Why this approach:
- PreStop hooks are called by kubelet locally within the pod (localhost)
- No network traffic leaves the pod, so TLS isn't needed for security
- Simpler implementation with no dynamic scheme configuration
- Prevents TLS handshake errors during pod shutdown

This fixes the issue where pods would receive HTTP 502 errors during
scale-down operations when system-internal-tls was enabled. The error
occurred because the PreStop hook was trying to connect with HTTP to
a TLS-enabled admin server, causing immediate SIGTERM and dropped
requests.
openshift-merge-bot bot pushed a commit to openshift-knative/serving that referenced this pull request Oct 21, 2025
… (#1614)

The queue-proxy admin server now always serves HTTP on port 8022, even
when system-internal-tls is enabled. This simplifies the PreStop hook
configuration and fixes graceful shutdown issues.

Changes:
- Queue-proxy admin server always uses HTTP, only main server uses TLS
- PreStop hooks always use HTTP scheme (removed dynamic configuration)
- Updated tests to reflect that admin server is always HTTP

Why this approach:
- PreStop hooks are called by kubelet locally within the pod (localhost)
- No network traffic leaves the pod, so TLS isn't needed for security
- Simpler implementation with no dynamic scheme configuration
- Prevents TLS handshake errors during pod shutdown

This fixes the issue where pods would receive HTTP 502 errors during
scale-down operations when system-internal-tls was enabled. The error
occurred because the PreStop hook was trying to connect with HTTP to
a TLS-enabled admin server, causing immediate SIGTERM and dropped
requests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

HTTP 502 errors on scale downs, graceful shutdown broken with internal encryption

3 participants