Dynamic forwarding load balancing policy #38757
Signed-off-by: Yan Avlasov <[email protected]>
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to
@yanavlasov If I got this PR correctly, this PR provides a way to allow the downstream/filters to force the LB to select a specific host (by header or metadata), and if there is no target host, then the fallback LB will be used? If that's your goal, I think we have provided the
/wait-any
The goal is broader than the downstream filter overriding the upstream endpoint. The first part is to allow selection of endpoints with the ext_proc filter - the external endpoint picker. It needs to provide primary and retry hosts. The eventual goal is to make ext_lb - a protocol for external picking of endpoints - as the initial approach is not very effective for retries: it can end up using stale retry endpoints.

This is part of the ai-gateway work to support load balancing to inference workloads, where the load balancing policy can be more complicated than what is supported by Envoy today and where there is often operator-specific business logic for picking endpoints. Initially this LB supports this proposal in the k8s inference gateway extensions: https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/004-endpoint-picker-protocol

I did not think the
```proto
// If neither header nor metadata is present, or there were errors parsing header or metadata
// values, the specified fallback load balancing policy is used. This allows load balancing to
// degrade to a built-in policy (e.g. Round Robin) in case the external endpoint picker fails.
```
sorry if I am missing something and I am no expert in the LB code, but does this mean that the fallback happens only when the header or metadata parsing failed, not when an HTTP request failed or received a 5xx? cc @yuzisun @wengyao04
ah I guess the fallback here != the fallback/subsetting in the proposal ?
What happens when the request failed or received an unsuccessful HTTP status is determined by the route retry policy; it is not up to the LB policy. If the request has to be retried per the policy, the router filter will call chooseHost again to get an endpoint for the retry attempt.
The fallback LB may be called in this case if there are no retry endpoints specified in the metadata or headers (note that this PR does not implement this functionality - it will be added in the next PR), or if there are more retry attempts than endpoints in the metadata or headers.
thank you for clarifying, Yan... I am still trying to get my head around exactly when this fallback via x-gateway-destination-endpoint-fallback happens. Since I read that both x-gateway-destination-endpoint-fallback and x-gateway-destination-endpoint are set by the ext_proc, what's the point of having two separate metadata keys? The proposal doc doesn't mention exactly when the fallback happens (sorry if I missed anything). I think I should wait for the next PR as well as the reference impl...
There are two fallback things: fallback endpoints and a fallback load balancing policy.
The fallback endpoints are set by the endpoint picking extension. For example, it can select one primary endpoint and one fallback based on their queue length. If the primary endpoint is not reachable or returned an error status, the fallback endpoint will be used for the retry.
The fallback load balancing policy is used when the endpoint picker has failed altogether and set neither primary nor fallback endpoints. In this case Envoy will use the potentially less efficient fallback LB policy, but will still be able to serve the request.
ok, now it has all started making sense, and I see the clear distinction between the fallback LB policy vs the fallback endpoints in the metadata. I think I would love to see the actual implementation of the fallback endpoints logic then, which will clear up all my remaining questions.
@LiorLieberman PR with the LB policy
I also did some similar work. From the implementation of the code, I didn't find anything that setUpstreamOverrideHost cannot do (except the retry). In the current implementation, all complex AI-related selection will be handled by the ext_proc server (like polling metrics and comparing the load); finally the ext_proc server will return some endpoint candidates, and the ext_proc filter will set these endpoint candidates in the metadata and let the LB use the metadata to make the final decision. But the story could be much simpler: the ext_proc filter could set these endpoint candidates via setUpstreamOverrideHost. The only enhancement necessary is making setUpstreamOverrideHost accept multiple addresses to support retry. The asynchronous load balancing and external scheduler is a great point (and is helpful for making accurate retries) and I am also happy to see it. If this PR concentrates on that, I am fine. But from the code and the API, it seems it doesn't?
Adding functionality to set upstream hosts to ext_proc is a move in the wrong direction. It will not work for https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/004-endpoint-picker-protocol, but also conceptually I do not think it belongs there. Using an LB policy is, I think, a better choice as it fits better with the abstraction model, where the LB policy is responsible for selecting the endpoint with request attributes as one of the inputs. And while it does not right now have a remote callout, it would be a natural next step to implement it there.
A long time ago, when @htuch and I designed this API, we hoped it would be a common way to let the downstream affect the LB selection, and we made it LB_POLICY-independent by design. The setUpstreamOverrideHost() API makes the whole LB selection a 2-ply structure: the first layer selects the host based on the downstream's explicit input (see

I think the

I removed some comments, because after re-thinking, some of them may have been one-sided.

In conclusion, I think what is actually important is the 2-ply structure for host selection: one layer for the downstream selection/decision, and one for the upstream load balancing. These two layers should be orthogonal and independent. That is to say, we actually could add an extension point at the cluster (we have added a

This extension will be evaluated first at the

This approach could meet your current requirements for the AI gateway without adding a completely new LB policy. And it is still fine if you want to support async load balancing in the future. This approach has the following advantages:
Sorry I started a long discussion about the PR, and even after you have done the work. 😞 I am pretty happy to see more AI-related things in Envoy (because I am also doing some similar things). But if we add this new extension/API, then it basically exists forever, so I think it deserves more discussion.
I do not see the benefit of adding an extension to an extension over just adding an extension. Why would the size of the code be any different - it still needs to do the same thing? The LB extension itself is 500 lines of code, not thousands. There are multiple ways of achieving the same goal; this is a very good way, which fits with the current abstraction model for LBs in Envoy, so there is no reason to be concerned about its longevity. The approaches you are proposing do not seem to offer any particular benefits.
/lgtm api |
@wbpcode I have addressed all comments, please take a look. |
got it, thanks, I will take a look today. |
wbpcode left a comment:
Thanks for this great contribution. I'm happy to see things landing gradually 🌹 ❤️ I added some comments on the API and the implementation.
/wait
```cpp
absl::StatusOr<std::unique_ptr<SelectedHosts>>
OverrideHostLoadBalancer::LoadBalancerImpl::getSelectedHostsFromMetadata(
    const ::envoy::config::core::v3::Metadata& metadata, const Config::MetadataKey& metadata_key) {
  std::unique_ptr<SelectedHosts> selected_hosts;
  const ProtobufWkt::Value& metadata_value =
      Config::Metadata::metadataValue(&metadata, metadata_key);
  if (metadata_value.has_string_value() && !metadata_value.string_value().empty()) {
    auto selected_hosts_result = SelectedHosts::make(metadata_value.string_value());
    if (!selected_hosts_result.ok()) {
      ENVOY_LOG(trace, "Failed to parse SelectedEndpoints OSS {} with error {}",
                metadata_value.string_value(), selected_hosts_result.status().message());
      return selected_hosts_result.status();
    }
    selected_hosts = std::move(selected_hosts_result.value());
  }
  return selected_hosts;
}

absl::StatusOr<std::unique_ptr<SelectedHosts>>
OverrideHostLoadBalancer::LoadBalancerImpl::getSelectedHostsFromHeader(
    const Envoy::Http::RequestHeaderMap* header_map, const Http::LowerCaseString& header_name) {
  std::unique_ptr<SelectedHosts> selected_hosts;
  if (!header_map) {
    return selected_hosts;
  }
  HeaderMap::GetResult result = header_map->get(header_name);
  if (result.empty()) {
    return selected_hosts;
  }

  // Use only the first value of the header, if it happens to have multiple.
  const std::string primary_host_address(result[0]->value().getStringView());
  Envoy::Network::Address::InstanceConstSharedPtr primary_host =
      Envoy::Network::Utility::parseInternetAddressAndPortNoThrow(primary_host_address, false);
  if (!primary_host || primary_host->type() != Envoy::Network::Address::Type::Ip) {
    ENVOY_LOG(debug, "Invalid primary host in header {}: {}", header_name, primary_host_address);
    return absl::InvalidArgumentError("Invalid primary host in header");
  }

  // TODO(yanavlasov): implement parsing of fallback headers
  selected_hosts = std::make_unique<SelectedHosts>(
      SelectedHosts{{{primary_host->ip()->addressAsString(), primary_host->ip()->port()}}, {}});
  return selected_hosts;
}
```
I do think what we need here is to get the first valid string (or even a string_view, I think) from the header or metadata source. Then we use the string to find a host in the cross-host map. If the string is an invalid address, then we get nothing from the map and we will fall back to the default policy.
So, IMO, the parsing here makes the code more complex and brings some performance overhead, but only brings trivial benefit.
And from the definition of SelectedHosts, you may want to get the primary host and the fallback host at the same time in the future. But you could actually do that after the first choice has failed. We can use an
absl::StatusOr<absl::string_view> getSelectedHosts(LoadBalancerContext* context, const std::vector<OverrideSource>& sources) to share the search logic.
By the way, I actually think we don't need fallback_host_sources. The header/metadata value could be a list of addresses separated by `,` (for example, 1.2.3.4:90,1.2.3.5:90); then we try the first address on the first attempt and the second address on the second attempt. This would make things much simpler.
I have reworked the code.
```cpp
if (!metadata.filter_metadata().contains(kEndpointsFallbackIndexKey)) {
  // Use the primary endpoint.
  ENVOY_LOG(trace, "Selecting primary endpoint {}", selected_hosts.primary.address.address);

  // Endpoint extracted from the header does not have locality.
  HostConstSharedPtr host = findHost(selected_hosts.primary);
  // If the primary endpoint was found in the current host set, use it.
  // Otherwise try to see if one of the failover endpoints is available. This
  // is possible when the cluster received an EDS update while the request to
  // the endpoint picker was in flight.
  if (host) {
    // Save the first index into fallback hosts, so that subsequent calls to
    // the chooseHost method will use the fallback hosts.
    updateFallbackIndexMetadata(metadata, 0);
    return host;
  }
```
Prefer filter state to store the index. Then in the future we could also store the address list in that filter state.
Done
```proto
// A list of host sources to get the fallback host addresses from. The first one that provides
// the addresses will be used, whether or not the addresses are valid.
// [#not-implemented-hide:]
repeated OverrideHostSource fallback_host_sources = 2;
```
I didn't see the implementation, but I guess the fallback host source will contain a list of addresses, like 1.2.3.4:90,1.2.3.5:90. Then I think we can use a single host source which contains multiple addresses for both primary and fallback? Then we don't need primary_host_sources and fallback_host_sources; we only need one host_sources.
I saw in the proposal at https://github.com/kubernetes-sigs/gateway-api-inference-extension/ ... only a single fallback address could be used.....
Okay, I will try to create a PR there to see if it is possible to use a single source for multiple addresses. Before that, I am inclined to use only one host_sources field first in our API.
You still need two configurations for cases where an implementation that uses original_dst is transitioning to the override_host policy.
> You still need two configurations for cases where an implementation that uses original_dst is transitioning to the override_host policy.
Yeah, the migration is another problem. The EPP server may need to be aware of whether the proxy provides the fallback support and use a different header/metadata value format.
But although the EPP is one of our design targets, I still hope we can treat this new extension as a general feature of Envoy, with the EPP as only one usage of the new extension. That is to say, we may not need to consider the backward compatibility of the EPP itself here; we should let the EPP server resolve the compatibility problem, and the new extension in Envoy only needs to support the required features like fallback, retrying, etc.
cc @yanavlasov WDYT?
And no matter how the EPP implements the fallback, we can always treat the header value or metadata value (from the specified host_sources) as a list of addresses and iterate over them when finding a valid host. (The single-address case can be treated as a list which contains only one element.)
So, I still think we can keep only one host_sources field here to parse an address list (from the value of the specified host_sources), and remove fallback_host_sources in this initial version.
If we eventually prove that fallback_host_sources is the only choice, we can add it in the future.
Ok, sounds good
Are these other requirements captured in another issue or design doc somewhere?
No, I do not have them captured anywhere else.
wbpcode left a comment:
Thanks for the update. Some more comments are added.
```cpp
struct SelectedHosts {
  struct Endpoint {
  public:
    struct Address {
      const std::string address;
      const uint32_t port;
    };

    const Address address;
  };

  const Endpoint primary;
  const std::vector<Endpoint> failover;
```
It's unnecessary to split the primary address and fallback addresses into two fields; we will iterate over them in order anyway. And it's also unnecessary to use the Address struct.
A simple std::vector<std::string> could be used as the SelectedHosts.
wbpcode left a comment:
LGTM overall with only one minor comment, thanks for the update.
wbpcode left a comment:
LGTM. Thanks.
Commit Message:
Additional Description:
Risk Level: Low, new extension
Testing: Unit Tests
Docs Changes: Yes
Release Notes: Yes
Platform Specific Features: N/A
Signed-off-by: Yan Avlasov <[email protected]>
Signed-off-by: Ting Pan <[email protected]>