
feat: failover route on cold-start time out #1280

Open
wants to merge 1 commit into
base: main

yyewolf

@yyewolf yyewolf commented Apr 7, 2025

This PR is an attempt at solving #874 by adding a fallback service.

One use case would be the following:

  • You have a service that takes a bit of time to scale up from zero
  • You want the user to have feedback on what's going on instead of seeing the website loading

It is currently implemented so that if an error or timeout occurs while fetching endpoints, the request is forwarded to the fallback service instead.
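The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `waitFunc`, `pickTarget`, `readyAfter`, and `routeFor` are hypothetical names, and the targets are plain strings rather than the interceptor's real stream types.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitFunc blocks until the primary backend has ready endpoints or ctx ends.
// Hypothetical stand-in for the interceptor's endpoint wait.
type waitFunc func(ctx context.Context) error

// pickTarget returns primary when wait succeeds within timeout.
// On error or timeout it returns failover, or the wait error when no
// failover is configured (empty string), mirroring the "field not present" case.
func pickTarget(ctx context.Context, wait waitFunc, timeout time.Duration, primary, failover string) (string, error) {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	if err := wait(ctx); err != nil {
		if failover == "" {
			return "", fmt.Errorf("waiting for endpoints: %w", err)
		}
		return failover, nil
	}
	return primary, nil
}

// readyAfter simulates a backend whose endpoints become ready after d.
func readyAfter(d time.Duration) waitFunc {
	return func(ctx context.Context) error {
		t := time.NewTimer(d)
		defer t.Stop()
		select {
		case <-t.C:
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// routeFor is a demo helper; both durations are given in milliseconds.
func routeFor(readyMs, timeoutMs int, failover string) (string, error) {
	return pickTarget(
		context.Background(),
		readyAfter(time.Duration(readyMs)*time.Millisecond),
		time.Duration(timeoutMs)*time.Millisecond,
		"primary",
		failover,
	)
}

func main() {
	target, err := routeFor(1, 200, "failover") // backend is ready in time
	fmt.Println(target, err)
	target, err = routeFor(500, 20, "failover") // cold start exceeds the timeout
	fmt.Println(target, err)
	target, err = routeFor(500, 20, "") // no failover configured
	fmt.Println(target, err)
}
```

Returning the original error when no failover is set keeps the existing behavior untouched for objects that do not use the new field.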

To make sure the request is properly forwarded there, you can use a startupProbe on the Pod so that it is not added to the service's endpoints before it is actually in a Ready state.

This change is not breaking CRD-wise, because it does not affect the behavior of the operator or interceptor when the field is not present.

I'm looking for your feedback on whether you think this is a good idea, and I'd like to discuss the changes.

I haven't done much testing or documentation yet, in case this proposal is not accepted.

An example of this PR in use is available here: https://juice.hackcorp.net/.

It is funneled through an ExternalName service, that points to another service that points to a caddy with a small static page.

Checklist

Fixes #874

@yyewolf yyewolf requested a review from a team as a code owner April 7, 2025 21:32
@wozniakjan wozniakjan self-assigned this Apr 8, 2025
@vadasambar

@yyewolf make test is failing

--- FAIL: TestForwarderSuccess (0.10s)
    upstream_test.go:49: 
                Error Trace:    /Users/suraj.bankar/work/http-add-on/interceptor/handler/upstream_test.go:49
                Error:          Should be true
                Test:           TestForwarderSuccess
                Messages:       request was not received within 100ms
--- FAIL: TestForwarderHeaderTimeout (0.00s)
    upstream_test.go:96: 
                Error Trace:    /Users/suraj.bankar/work/http-add-on/interceptor/handler/upstream_test.go:96
                Error:          Not equal: 
                                expected: 502
                                actual  : 500
                Test:           TestForwarderHeaderTimeout
--- FAIL: TestForwarderWaitsForSlowOrigin (0.01s)
    upstream_test.go:145: 
                Error Trace:    /Users/suraj.bankar/work/http-add-on/interceptor/handler/upstream_test.go:145
                Error:          Not equal: 
                                expected: 200
                                actual  : 500
                Test:           TestForwarderWaitsForSlowOrigin
2025/04/09 15:59:15 forwardRequest took 8.834µs
--- FAIL: TestForwarderConnectionRetryAndTimeout (0.00s)
    upstream_test.go:177: 
                Error Trace:    /Users/suraj.bankar/work/http-add-on/interceptor/handler/upstream_test.go:177
                Error:          "8.834µs" is not greater than or equal to "150ms"
                Test:           TestForwarderConnectionRetryAndTimeout
                Messages:       proxy returned after 772.125µs, expected not to return until 150ms
--- FAIL: TestForwardRequestRedirectAndHeaders (0.00s)
    upstream_test.go:222: 
                Error Trace:    /Users/suraj.bankar/work/http-add-on/interceptor/handler/upstream_test.go:222
                Error:          Not equal: 
                                expected: 301
                                actual  : 500
                Test:           TestForwardRequestRedirectAndHeaders

@yyewolf yyewolf force-pushed the feat/fallback-service branch 2 times, most recently from a951b04 to 9a4a5e2 April 9, 2025 15:15
@yyewolf
Author

yyewolf commented Apr 9, 2025

@vadasambar Should be better now. I inadvertently set true as the default in the upstream tests, while in reality these scenarios don't include a fallback stream.

@yyewolf yyewolf force-pushed the feat/fallback-service branch from 9a4a5e2 to 67083b7 April 9, 2025 15:18
@vadasambar

> @vadasambar Should be better now. I inadvertently set true as the default in the upstream tests, while in reality these scenarios don't include a fallback stream.

no problem :) thank you

@wozniakjan wozniakjan requested a review from Copilot April 29, 2025 09:09

@Copilot Copilot AI left a comment


Pull Request Overview

This PR adds a fallback mechanism for HTTP requests so that if the primary service is unreachable or times out, the request is forwarded to a designated fallback service. Key changes include:

  • Adding FallbackTargetRef support in the CRD and associated getter methods.
  • Propagating and handling fallback URL within contexts by introducing RequestWithFallbackStream, ContextWithFallbackStream, and FallbackStreamFromContext.
  • Adjusting proxy, routing, and upstream handlers and tests to incorporate fallback logic.
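The context helpers named in the list above follow a common Go pattern, which could look roughly like this. The function names come from the review summary, but the signatures, the `contextKey` type, and the `fallbackHostFor` demo helper are assumptions made for illustration.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/url"
)

// contextKey is unexported so that other packages cannot collide with these keys.
type contextKey int

// ckFallbackStream is the (assumed) key under which the fallback URL is stored.
const ckFallbackStream contextKey = iota

// ContextWithFallbackStream returns a copy of ctx carrying the fallback URL.
func ContextWithFallbackStream(ctx context.Context, u *url.URL) context.Context {
	return context.WithValue(ctx, ckFallbackStream, u)
}

// FallbackStreamFromContext returns the fallback URL, or nil when none was set.
func FallbackStreamFromContext(ctx context.Context) *url.URL {
	u, _ := ctx.Value(ckFallbackStream).(*url.URL)
	return u
}

// RequestWithFallbackStream returns a shallow copy of r whose context carries u.
func RequestWithFallbackStream(r *http.Request, u *url.URL) *http.Request {
	return r.WithContext(ContextWithFallbackStream(r.Context(), u))
}

// fallbackHostFor is a demo helper: it attaches raw (if non-empty) to a request
// and reads it back the way a downstream handler would.
func fallbackHostFor(raw string) string {
	r, err := http.NewRequest(http.MethodGet, "http://app.example.test/", nil)
	if err != nil {
		return ""
	}
	if raw != "" {
		u, err := url.Parse(raw)
		if err != nil {
			return ""
		}
		r = RequestWithFallbackStream(r, u)
	}
	if u := FallbackStreamFromContext(r.Context()); u != nil {
		return u.Host
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", fallbackHostFor("http://please-hold.default.svc:8080"))
	fmt.Printf("%q\n", fallbackHostFor(""))
}
```

The comma-ok type assertion in the getter makes a missing value indistinguishable from "no fallback configured", so handlers only need a nil check.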

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

Show a summary per file
| File | Description |
| --- | --- |
| pkg/util/contexthttp.go | Added RequestWithFallbackStream function to inject a fallback stream into the request context. |
| pkg/util/context.go | Introduced a new context key and helper functions for fallback streams. |
| operator/apis/http/v1alpha1/httpscaledobject_types.go | Added FallbackTargetRef struct and its getter methods, along with CRD validation for fallback. |
| interceptor/proxy_handlers_test.go | Introduced a new test ensuring requests are forwarded to the fallback service. |
| interceptor/proxy_handlers.go | Modified forwarding handler to use fallback when the primary wait function fails. |
| interceptor/middleware/routing.go | Updated routing logic to inject fallback stream into context if FallbackTargetRef is provided. |
| interceptor/handler/upstream.go | Updated upstream handler construction to account for fallback streaming. |
| config/crd/bases/http.keda.sh_httpscaledobjects.yaml | Updated CRD to include fallbackTargetRef with validation rules. |
Comments suppressed due to low confidence (1)

interceptor/proxy_handlers_test.go:217

  • [nitpick] Remove the debug print statement to avoid unwanted output during test runs.
fmt.Println(res)

@yyewolf yyewolf force-pushed the feat/fallback-service branch from 67083b7 to f73338b May 1, 2025 17:00
@wozniakjan wozniakjan changed the title from "feat: Add FallbackTargetRef and fallback to it when the request would be timing out" to "feat: failover route on cold-start time out" May 2, 2025
@yyewolf yyewolf force-pushed the feat/fallback-service branch 2 times, most recently from 27a7716 to c617762 May 3, 2025 14:19
@wozniakjan wozniakjan requested a review from Copilot May 7, 2025 14:38
Copilot

This comment was marked as outdated.

wozniakjan
wozniakjan previously approved these changes May 7, 2025
@wozniakjan wozniakjan dismissed their stale review May 7, 2025 14:41

premature approval

@wozniakjan
Member

Looks like the validation checks didn't succeed. Can you please address the errors, @yyewolf?

@yyewolf yyewolf force-pushed the feat/fallback-service branch 2 times, most recently from 51144ff to febcdd0 June 21, 2025 12:48
@yyewolf
Author

yyewolf commented Jun 21, 2025

Sorry for the delay. I added the modification and completed the task list as well!


@Copilot Copilot AI left a comment


Pull Request Overview

This PR introduces a fallback service for HTTPScaledObject to route requests to a secondary service when the primary target is unreachable or times out during cold-start. Key changes include:

  • Defined ColdStartTimeoutFailoverRef in the CRD and Go types with timeout and port configuration.
  • Extended context and request utilities (ContextWithFailoverStream, RequestWithFailoverStream) for carrying the failover endpoint.
  • Updated routing and proxy handlers to select between primary and failover streams based on the cold-start timeout or error.
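The stream selection described in the last bullet can be illustrated with two reverse proxies and a flag. This is only a sketch built on the standard library: `newSwitchingHandler`, `backend`, and `fetchVia` are hypothetical names, and passing the decision as a callback is an assumption rather than how the PR wires its `shouldFailover` flag.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// newSwitchingHandler proxies each request to failover when shouldFailover
// reports true for it, and to primary otherwise.
func newSwitchingHandler(primary, failover *url.URL, shouldFailover func(*http.Request) bool) http.Handler {
	primaryProxy := httputil.NewSingleHostReverseProxy(primary)
	failoverProxy := httputil.NewSingleHostReverseProxy(failover)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if shouldFailover(r) {
			failoverProxy.ServeHTTP(w, r)
			return
		}
		primaryProxy.ServeHTTP(w, r)
	})
}

// backend starts a local test server that always answers with body.
func backend(body string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		io.WriteString(w, body)
	}))
}

// fetchVia sends one request through the switching handler and returns the body.
func fetchVia(failover bool) string {
	app := backend("app")
	defer app.Close()
	hold := backend("please hold")
	defer hold.Close()

	appURL, err := url.Parse(app.URL)
	if err != nil {
		return "error: " + err.Error()
	}
	holdURL, err := url.Parse(hold.URL)
	if err != nil {
		return "error: " + err.Error()
	}

	front := httptest.NewServer(newSwitchingHandler(appURL, holdURL, func(*http.Request) bool {
		return failover
	}))
	defer front.Close()

	resp, err := http.Get(front.URL)
	if err != nil {
		return "error: " + err.Error()
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(body)
}

func main() {
	fmt.Println(fetchVia(false))
	fmt.Println(fetchVia(true))
}
```

In a real interceptor the callback would be driven by the cold-start wait result rather than a constant.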

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

Show a summary per file
| File | Description |
| --- | --- |
| pkg/util/contexthttp.go | Added RequestWithFailoverStream to wrap a request with failover URL |
| pkg/util/context.go | Introduced ckFailoverStream, ContextWithFailoverStream, and getter |
| operator/apis/http/v1alpha1/httpscaledobject_types.go | Added Ref interface, ColdStartTimeoutFailoverRef type, and spec field |
| interceptor/proxy_handlers_test.go | New test TestImmediatelySuccessfulFailoverProxy for failover logic |
| interceptor/proxy_handlers.go | Forwarding handler now respects failover timeout and flag |
| interceptor/middleware/routing.go | Routing middleware generates and sets both primary and failover streams |
| interceptor/handler/upstream.go | Upstream now accepts a shouldFailover flag and switches stream |
| interceptor/handler/upstream_test.go | Updated NewUpstream calls in tests to pass the new failover flag |
| docs/operate.md | Added placeholder header for service failover documentation |
| config/crd/bases/http.keda.sh_httpscaledobjects.yaml | CRD schema updated with coldStartTimeoutFailoverRef properties |
| CHANGELOG.md | Documented the new failover feature in the changelog |
Comments suppressed due to low confidence (3)

operator/apis/http/v1alpha1/httpscaledobject_types.go:23

  • [nitpick] The interface name Ref is quite generic; consider renaming it to something more descriptive like ServiceRef or EndpointRef to improve readability.
type Ref interface {

operator/apis/http/v1alpha1/httpscaledobject_types.go:57

  • [nitpick] The doc comment for ColdStartTimeoutFailoverRef appears to be a copy of the ScaleTargetRef comment; consider updating it to clearly describe the purpose of the failover reference.
// ColdStartTimeoutFailoverRef contains all the details about an HTTP application to scale and route to

docs/operate.md:60

  • [nitpick] This section header is empty; consider adding usage instructions or examples for configuring the failover service, or remove the placeholder header.
### Configuring Service Failover

Comment on lines +23 to +29
type Ref interface {
	GetServiceName() string
	GetPort() int32
	GetPortName() string
}

Member

@wozniakjan wozniakjan Jul 10, 2025


The Kubernetes code generator complains that it cannot process this:

https://github.com/kedacore/http-add-on/actions/runs/16195224595/job/45719990259?pr=1280

/__w/http-add-on/http-add-on/bin/controller-gen object:headerFile='hack/boilerplate.go.txt' paths='./...'
github.com/kedacore/http-add-on/operator/apis/http/v1alpha1:-: invalid type: interface{GetPort() int32; GetPortName() string; GetServiceName() string}
Error: not all generators ran successfully
run `controller-gen object:headerFile=hack/boilerplate.go.txt paths=./... -w` to see all available markers, or `controller-gen object:headerFile=hack/boilerplate.go.txt paths=./... -h` for usage

maybe we can move this to a different file?

Author


I fixed it by using a tag so that the type is ignored during generation.
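The thread does not say which tag was used. One marker that controller-gen provides for this purpose is `+kubebuilder:object:generate=false`, placed on the type; the sketch below assumes that marker and uses a hypothetical `failoverRef` struct and `describe` helper purely to show the interface in use.

```go
package main

import "fmt"

// Ref is the interface from the review thread. The marker below asks
// controller-gen's object generator to skip this type, since DeepCopy
// methods cannot be generated for an interface.
// (Assumption: the PR may have used a different marker.)
//
// +kubebuilder:object:generate=false
type Ref interface {
	GetServiceName() string
	GetPort() int32
	GetPortName() string
}

// failoverRef is a hypothetical, simplified implementation for illustration.
type failoverRef struct {
	Service  string
	Port     int32
	PortName string
}

func (f *failoverRef) GetServiceName() string { return f.Service }
func (f *failoverRef) GetPort() int32         { return f.Port }
func (f *failoverRef) GetPortName() string    { return f.PortName }

// Compile-time check that failoverRef satisfies Ref.
var _ Ref = (*failoverRef)(nil)

// describe formats a reference as service:port, preferring the named port.
func describe(r Ref) string {
	if name := r.GetPortName(); name != "" {
		return r.GetServiceName() + ":" + name
	}
	return fmt.Sprintf("%s:%d", r.GetServiceName(), r.GetPort())
}

func main() {
	fmt.Println(describe(&failoverRef{Service: "please-hold", Port: 8080}))
	fmt.Println(describe(&failoverRef{Service: "please-hold", PortName: "http"}))
}
```

Moving the interface to a package outside the API types, as suggested in the review, would be an alternative that needs no marker at all.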

@yyewolf yyewolf force-pushed the feat/fallback-service branch from 8c99a28 to 91cfb66 July 28, 2025 18:44
@yyewolf yyewolf force-pushed the feat/fallback-service branch from 91cfb66 to 21ba5b4 July 28, 2025 18:51
@yyewolf yyewolf requested a review from wozniakjan July 28, 2025 18:52
Successfully merging this pull request may close these issues.

"Please hold" landing pages for slow scale from zero scenarios
4 participants