
Releases: kubernetes-sigs/gateway-api-inference-extension

v1.4.0

20 Mar 04:48
6e787dd

Release Highlights

  • Standalone chart work landed and is included in release artifacts
  • Conformance was split into its own Go module
  • InferencePool / Helm / gRPC-related improvements landed, including appProtocol, FailOpen, and ALPN h2
  • Significant ongoing work landed in flow control, BBR, predicted latency, and datalayer internals
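The "ALPN h2" item concerns HTTP/2 negotiation during the TLS handshake: gRPC runs over HTTP/2, so both sides must agree on the "h2" protocol via ALPN. As general background only, and not the project's implementation, this Go standard-library sketch starts an in-process TLS test server that offers "h2" and reports what the client actually negotiated. The function name `negotiate` is hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// negotiate starts an in-process TLS server with HTTP/2 enabled, makes one
// request, and returns the ALPN protocol agreed during the TLS handshake and
// the HTTP protocol version of the response.
func negotiate() (alpn string, proto string, err error) {
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	srv.EnableHTTP2 = true // server advertises "h2" via ALPN
	srv.StartTLS()
	defer srv.Close()

	// srv.Client() trusts the test certificate and attempts HTTP/2.
	resp, err := srv.Client().Get(srv.URL)
	if err != nil {
		return "", "", err
	}
	defer resp.Body.Close()
	return resp.TLS.NegotiatedProtocol, resp.Proto, nil
}

func main() {
	alpn, proto, err := negotiate()
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println("ALPN:", alpn, "protocol:", proto)
}
```

If the server did not offer "h2", the handshake would fall back to "http/1.1", which a gRPC backend cannot serve.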

What's Changed


v1.4.0-rc.3

16 Mar 22:04
315d092

Pre-release

Gateway API Inference Extension v1.4.0-rc.3 is available as a prerelease for community testing.

Full Changelog: v1.4.0-rc.2...v1.4.0-rc.3

v1.4.0-rc.2

10 Mar 21:06
b32dfd0

Pre-release

RC Highlights

  • v1.4.0-rc.2 is available for community testing before the final v1.4.0 release
  • fixes the release-branch quickstart vLLM image tags so they stay aligned with main, while keeping the release branch's IfNotPresent pull policy
  • bumps the ./conformance nested Go module to Gateway API v1.5.0

What's Changed

  • [release-1.4] fix(release): sync quickstart vllm images by @danehans in #2522
  • [release-1.4] chore(conformance): bump gateway-api to v1.5.0 by @danehans in #2520

Full Changelog: v1.4.0-rc.1...v1.4.0-rc.2

v1.4.0-rc.1

05 Mar 22:39
8f057d7

Pre-release

RC Highlights

  • v1.4.0-rc.1 is available for community testing before the final v1.4.0 release
  • standalone chart work landed and is included in release artifacts
  • conformance was split into its own Go module
  • InferencePool / Helm / gRPC-related improvements landed, including appProtocol, FailOpen, and ALPN h2
  • significant ongoing work landed in flow control, BBR, predicted latency, and datalayer internals

What's Changed

  • cleanup: resolve technical debt and link tracking issues by @LukeAVanDrie in #2083
  • Removing dead code that throws an err when no match is found by @kfswain in #2088
  • cleanup: rename integration test utilities to remove _test suffix by @LukeAVanDrie in #2084
  • Fixed targetPorts copy error by @capri-xiyue in #2092
  • Add PR write permissions to label checker GHA, as it cannot add label… by @kfswain in #2094
  • clean unused interface by @nirrozenbaum in #2098
  • prefill aware prefix plugin by @ahg-g in #2104
  • Removing perm-restricted GHA by @kfswain in #2105
  • Updating vllm versions and fixing git commit sign by @kfswain in #2108
  • Standardize inferencepool Helm templates and drop unnecessary tpl by @tsj-30 in #1989
  • feat(bbr): add configuration flags for metrics auth and secure serving by @jpekmez in #2112
  • chore(deps): bump github.com/prometheus/prometheus from 0.308.1 to 0.309.0 by @dependabot[bot] in #2090
  • fix both error propagation and priority band fullness by @wseaton in #2103
  • Datalayer refactoring: HTTP datasource and client by @irar2 in #2120
  • Add v1 conformance report for alibabacloud ack gateway by @delavet in #2007
  • changed httproute creation to be behind a flag. by @nirrozenbaum in #2118
  • Rename part two by @shmuelk in #1968
  • rename of experimental http route creation section in helm by @nirrozenbaum in #2123
  • add scoring preference to scorer interface. by @nirrozenbaum in #2119
  • feat: make epp-standalone be its own chart by @capri-xiyue in #2122
  • fix: [Flow Control]: Optionally disable endpoint subset filtering while dispatching requests by @aishukamal in #2126
  • fix: add update helm dependency by @zetxqx in #2135
  • chore(deps): bump github.com/onsi/ginkgo/v2 from 2.27.3 to 2.27.5 by @dependabot[bot] in #2138
  • chore(deps): bump github.com/prometheus/prometheus from 0.309.0 to 0.309.1 by @dependabot[bot] in #2136
  • chore(deps): bump github.com/onsi/gomega from 1.38.3 to 1.39.0 by @dependabot[bot] in #2137
  • Rename part three by @shmuelk in #2124
  • fixed latest guide to use httproute creation via the helm chart by @nirrozenbaum in #2141
  • Removed duplicated field in log message by @shmuelk in #2142
  • Update the metrics used by the dashboard by @learner0810 in #2139
  • registry: switch to fine-grained leasing for flow lifecycle by @LukeAVanDrie in #2127
  • Increase default FlowGCTimeout to 1h to prevent premature GC by @LukeAVanDrie in #2143
  • update bbr quickstart guide with latest functionality by @nirrozenbaum in #2150
  • Separate conformance tests modules from main tests by @rikatz in #1994
  • feat: Add concurrency saturation detector by @LukeAVanDrie in #2062
  • feat: epp standalone helm chart included in release to docker by @capri-xiyue in #2148
  • Fix indentation error for latency predictor by @liu-cong in #2158
  • Removing alpha status in GH landing page by @kfswain in #2132
  • docs: added epp standalone user guide by @capri-xiyue in #2147
  • Add tracing entry span with W3C propagation to EPP handler by @sallyom in #2057
  • feat(docs): enable content tab linking in mkdocs by @AvineshTripathi in #2176
  • update bbr label filtering to align with best practices by @nirrozenbaum in #2178
  • updated kgateway section in bbr quickstart guide by @nirrozenbaum in #2179
  • move logging util to common pkg by @nirrozenbaum in #2180
  • Interfaces towards pluggable BBR framework (initial PR) by @davidbreitgand in #2121
  • feat(api): Add appProtocol to InferencePool API for gRPC support by @zetxqx in #2162
  • docs: reference right manifest file by @sats-23 in #2186
  • test: add hermetic coverage for standalone mode by @LukeAVanDrie in #2175
  • Add support for video/audio formats for multimodal inputs by @rahulgurnani in #2181
  • fix indentation bug in quickstart by @nirrozenbaum in #2182
  • refactor(flowcontrol): Migrate Fairness Policies to EPP Plugin System by @LukeAVanDrie in #2031
  • [Conformance] copy pkgs from gateway-api to enable upgrade to gateway-api v1.4.0 by @zetxqx in #2159
  • controller: extend flow lease scope to fix orphaned queues #1982 by @LukeAVanDrie in #2131
  • rename slo-aware-router to predicted-latency by @kaushikmitr in #2183
  • Better encapsulate data layer set up and validation. by @elevran in #2185
  • test: added latency predictor coverage for inferencepool and added convera… by @capri-xiyue in #2187
  • cleanup: refactor multiple include into one file by @capri-xiyue in #2191
  • feat: Allow request control plugins to return ext_proc dynamic metadata by @fcfort in #2156
  • Moving the scheduling component pluggable interface and types to the common framework pkg by @ahg-g in #2192
  • Update troubleshooting guide to include remediation for incorrect pre… by @BenjaminBraunDev in #2040
  • Add flowcontrol queue length in bytes metric by @RyanRosario in #2044
  • Moved the epp/plugins pkg to be under the new framework pkg by @ahg-g in #2194
  • Move framework interfaces under epp/framework/interface by @ahg-g in #2195
  • feat: added a local mode in verify helm script by @capri-xiyue in #2196
  • [Flow Control] ...
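PR #2057 adds a tracing entry span with W3C propagation to the EPP handler. W3C Trace Context carries the caller's trace in a `traceparent` header of the form `00-<trace-id>-<parent-id>-<trace-flags>`. The sketch below parses only version 00 of that header to show what is being propagated; the function names are hypothetical, and a real service would normally rely on an OpenTelemetry propagator rather than hand-parsing.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// isLowerHex reports whether s is non-empty and contains only lowercase hex
// digits, as Trace Context requires for traceparent fields.
func isLowerHex(s string) bool {
	if s == "" {
		return false
	}
	for _, c := range s {
		if !((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')) {
			return false
		}
	}
	return true
}

// parseTraceparent parses a version-00 header of the form
// "00-<32 hex trace-id>-<16 hex parent-id>-<2 hex trace-flags>" and returns
// the trace ID, parent span ID, and whether the caller sampled the trace.
func parseTraceparent(h string) (string, string, bool, error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return "", "", false, errors.New("traceparent: expected version 00 with 4 fields")
	}
	tid, pid, flags := parts[1], parts[2], parts[3]
	// All-zero trace and parent IDs are invalid.
	if len(tid) != 32 || !isLowerHex(tid) || tid == strings.Repeat("0", 32) {
		return "", "", false, errors.New("traceparent: invalid trace-id")
	}
	if len(pid) != 16 || !isLowerHex(pid) || pid == strings.Repeat("0", 16) {
		return "", "", false, errors.New("traceparent: invalid parent-id")
	}
	if len(flags) != 2 || !isLowerHex(flags) {
		return "", "", false, errors.New("traceparent: invalid trace-flags")
	}
	f, _ := strconv.ParseUint(flags, 16, 8) // cannot fail: validated above
	// Bit 0 of trace-flags is the "sampled" flag.
	return tid, pid, f&0x01 == 1, nil
}

func main() {
	tid, pid, sampled, err := parseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("trace:", tid, "parent:", pid, "sampled:", sampled)
}
```

The entry span created by the handler would use the parsed trace ID and take the parent ID as its parent, so EPP spans join the caller's trace.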

v1.3.1

20 Feb 00:40

Fixes

This patch cherry-picks fixes for:
  • #2321
  • #2300
  • #2316

v1.3.0

Noteworthy

LoRA Syncer

Starting with this release, no LoRA syncer image will be published, because that feature is being deprecated. Similar functionality will continue to exist in the form of the file system resolver. Model servers that do not yet support this form of LoRA management, but do support the discrete LoRA management endpoints that the lora-syncer uses, can keep using the old images, which will be kept indefinitely.

The lora-syncer code will be removed from the codebase in the next release.

Flow Control

Flow Control continues to evolve with the addition of scale from/to zero support. Requests can now be sent to an EPP that has no model-serving endpoints behind it, and the EPP emits metrics that an autoscaler can use to scale the pool up.

Upcoming releases will continue working toward enabling this feature by default.

Standalone EPP

This functionality allows the EPP to be deployed as a proxy contained entirely within a single pod: the Envoy proxy runs with EPP as a sidecar container. The feature was developed for batch inference scenarios and is currently considered experimental.

v1.3.1-rc.1

18 Feb 01:41

Pre-release

This patch cherry picks a few fixes for:
#2321
#2300
#2316

Full Changelog: v1.3.0...v1.3.1-rc.1

v1.3.0

21 Jan 14:17
616745e

Noteworthy

LoRA Syncer

Starting with this release, no LoRA syncer image will be published, because that feature is being deprecated. Similar functionality will continue to exist in the form of the file system resolver. Model servers that do not yet support this form of LoRA management, but do support the discrete LoRA management endpoints that the lora-syncer uses, can keep using the old images, which will be kept indefinitely.

The lora-syncer code will be removed from the codebase in the next release.

Flow Control

Flow Control continues to evolve with the addition of scale from/to zero support. Requests can now be sent to an EPP that has no model-serving endpoints behind it, and the EPP emits metrics that an autoscaler can use to scale the pool up.

Upcoming releases will continue working toward enabling this feature by default.
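The scale-from-zero behavior described above (accept requests even when the pool has no endpoints, and expose a signal the autoscaler can act on) can be modeled in a few lines. This is a hypothetical, simplified sketch: `holdingQueue`, `HeldRequests`, and `AddEndpoint` are illustrative names, not the project's Flow Control API, and the real metric names and dispatch logic are not shown here.

```go
package main

import (
	"fmt"
	"sync"
)

// holdingQueue is a hypothetical model of scale-from-zero admission: with no
// endpoints, requests are held instead of rejected, and the number of held
// requests is exposed as a signal an autoscaler could act on.
type holdingQueue struct {
	mu        sync.Mutex
	endpoints []string
	held      []string // IDs of requests waiting for an endpoint
}

// Submit dispatches reqID to an endpoint if one exists; otherwise it holds
// the request and reports ok=false.
func (q *holdingQueue) Submit(reqID string) (endpoint string, ok bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.endpoints) == 0 {
		q.held = append(q.held, reqID)
		return "", false
	}
	return q.endpoints[0], true // a real scheduler would pick among endpoints
}

// HeldRequests is the hypothetical metric: a non-zero value while the pool has
// zero endpoints tells the autoscaler to scale up from zero.
func (q *holdingQueue) HeldRequests() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return len(q.held)
}

// AddEndpoint simulates the pool scaling up and drains all held requests to the
// new endpoint, returning which endpoint each held request was sent to.
func (q *holdingQueue) AddEndpoint(ep string) map[string]string {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.endpoints = append(q.endpoints, ep)
	sent := make(map[string]string, len(q.held))
	for _, id := range q.held {
		sent[id] = ep
	}
	q.held = nil
	return sent
}

func main() {
	q := &holdingQueue{}
	q.Submit("req-1")
	q.Submit("req-2")
	fmt.Println("held before scale-up:", q.HeldRequests())
	fmt.Println("drained:", q.AddEndpoint("pod-a"))
	fmt.Println("held after scale-up:", q.HeldRequests())
}
```

Holding rather than rejecting is what makes scale-from-zero possible: without queued demand there would be no signal to trigger the first scale-up.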

Standalone EPP

This functionality allows the EPP to be deployed as a proxy contained entirely within a single pod: the Envoy proxy runs with EPP as a sidecar container. The feature was developed for batch inference scenarios and is currently considered experimental.

Fix(es)

  • Improved the approximate prefix cache scorer when used with the llm-d P/D (prefill/decode disaggregation) setup
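The notes do not describe the fix itself. As background only, here is a hypothetical sketch of how an approximate prefix-cache scorer can work: instead of querying each model server's KV cache, the router remembers which prompt prefixes it sent to which endpoint, using chained hashes of fixed-size prompt blocks, and scores an endpoint by how many leading blocks it likely still holds. The block size, FNV hashing, and all names are assumptions, not the project's implementation.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// blockSize is a hypothetical block length in bytes, kept tiny for the example.
const blockSize = 4

// blockHashes splits the prompt into fixed-size blocks and returns one chained
// hash per block: hash i covers block i and hash i-1, so it identifies the
// whole prefix up to and including block i. A trailing partial block is ignored.
func blockHashes(prompt string) []uint64 {
	var hashes []uint64
	var prev uint64
	for i := 0; i+blockSize <= len(prompt); i += blockSize {
		h := fnv.New64a()
		var buf [8]byte
		binary.LittleEndian.PutUint64(buf[:], prev)
		h.Write(buf[:])
		h.Write([]byte(prompt[i : i+blockSize]))
		prev = h.Sum64()
		hashes = append(hashes, prev)
	}
	return hashes
}

// prefixIndex approximates each endpoint's cache by remembering which prefix
// blocks were routed to which endpoint.
type prefixIndex map[uint64]map[string]bool

// record notes that endpoint served prompt, so it likely caches its prefix blocks.
func (ix prefixIndex) record(endpoint, prompt string) {
	for _, h := range blockHashes(prompt) {
		if ix[h] == nil {
			ix[h] = map[string]bool{}
		}
		ix[h][endpoint] = true
	}
}

// score returns the fraction of the prompt's blocks forming a prefix the
// endpoint is believed to hold. Hashes are chained, so matching stops at the
// first miss.
func (ix prefixIndex) score(endpoint, prompt string) float64 {
	hs := blockHashes(prompt)
	if len(hs) == 0 {
		return 0
	}
	matched := 0
	for _, h := range hs {
		if !ix[h][endpoint] {
			break
		}
		matched++
	}
	return float64(matched) / float64(len(hs))
}

func main() {
	ix := prefixIndex{}
	ix.record("pod-a", "You are a helpful assistant.")
	fmt.Println("pod-a:", ix.score("pod-a", "You are a helpful assistant. Summarize this."))
	fmt.Println("pod-b:", ix.score("pod-b", "You are a helpful assistant. Summarize this."))
}
```

Because the index only reflects routing history, it is an approximation: it can drift from the real cache contents, for example when prefill and decode run on different pods.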

What's Changed

  • Added crd validation ci workflow. by @bexxmodd in #1879
  • chore: bump sim version by @nirrozenbaum in #1890
  • feat(conformance): add conformance test for verifying x-gateway-destination-endpoint-served by @zetxqx in #1862
  • Add deprecation notice on metrics port in runner and datastore by @elevran in #1886
  • refactor: Flatten Flow Control inter-flow policy plugin directory structure by @LukeAVanDrie in #1841
  • Execute prepare data plugins in topological order of data dependencies by @rahulgurnani in #1878
  • chore(deps): bump go.uber.org/zap from 1.27.0 to 1.27.1 by @dependabot[bot] in #1896
  • chore(deps): bump google.golang.org/grpc from 1.76.0 to 1.77.0 by @dependabot[bot] in #1897
  • chore(deps): bump github.com/prometheus/common from 0.67.2 to 0.67.4 by @dependabot[bot] in #1895
  • enhance bbr helm chart to generalize cmd-line args by @nirrozenbaum in #1900
  • feat: Add totalRunningRequests metric for latency predictor by @BenjaminBraunDev in #1899
  • chore(deps): bump sigs.k8s.io/structured-merge-diff/v6 from 6.3.0 to 6.3.1 by @dependabot[bot] in #1898
  • SLO Aware Routing Sidecar + Plugin EPP Integration and Helm Deployment by @BenjaminBraunDev in #1839
  • Use the correct vllm metric gpu_cache_usage_perc --> kv_cache_usage_perc by @ezrasilvera in #1905
  • fix: fixed helm chart by @capri-xiyue in #1907
  • docs: add Kgateway BBR documentation by @howardjohn in #1908
  • Implement EPP Plugins by datalayer objects by @elevran in #1901
  • feat: Implement Model Rewrite and Traffic Splitting Logic by @zetxqx in #1820
  • docs: Updated quickstart to use stable Istio release 1.28.0 by @atharva-310 in #1902
  • fix(release): correctly update lora-syncer and epp image tags across RC and final releases by @googs1025 in #1916
  • fix: sort InferenceModelRewrite lists by (Namespace, Name) in tests by @googs1025 in #1917
  • Define and register plugin factories for datalayer by @elevran in #1911
  • fix: Properly install the InferenceModelRewrite CRD using kustomize by @shmuelk in #1934
  • Move AllPodsPredicate to datastore package by @elevran in #1939
  • Add automatic TLS certificate reloading for EPP by @pierDipi in #1765
  • feat(modelRewrite): Add metrics for InferenceModelRewrite decisions by @zetxqx in #1938
  • fix: CI golangci-lint errors by @shmuelk in #1948
  • Update inference perf chart to match upstream chart + Add Prefix Cache Github Actions by @rlakhtakia in #1949
  • Standardize plugins.TypedName field name from 'tn' to 'typedName' by @rohithnarasimha in #1918
  • Update inference perf chart to use new hf token structure. by @rlakhtakia in #1955
  • fix infinite loop in profile picker and switch predictor based routing to on by default with a header to disable by @BenjaminBraunDev in #1929
  • fix config load error when picker is set before the scorer w/o weight. by @zetxqx in #1958
  • add kaushikmitr as approver of slo aware routing plugin by @kaushikmitr in #1956
  • refactor: [Scale from Zero] Introduce PodLocator by @LukeAVanDrie in #1950
  • feat: add config validation in predicted-latency-scorer plugin by @googs1025 in #1904
  • Run tests with two data layer implementations by @irar2 in #1930
  • Rename PodInfo struct to EndpointMetadata to better reflect its purpose by @shmuelk in #1866
  • feat(metrics): add scheduler attempt counter by @googs1025 in #1931
  • chore: update released quickstart to v1.2.1 by @nirrozenbaum in #1941
  • generalize latest release quickstart by @nirrozenbaum in #1966
  • chore(deps): bump github.com/onsi/ginkgo/v2 from 2.27.2 to 2.27.3 by @dependabot[bot] in #1971
  • chore(deps): bump golang.org/x/sync from 0.18.0 to 0.19.0 by @dependabot[bot] in #1972
  • chore(deps): bump go.opentelemetry.io/otel/sdk from 1.38.0 to 1.39.0 by @dependabot[bot] in #1975
  • refactor: Standardize config loading and system default injection by @LukeAVanDrie in #1953
  • chore(deps): bump github.com/onsi/gomega from 1.38.2 to 1.38.3 by @dependabot[bot] in #1974
  • chore(deps): bump go.opentelemetry.io/otel/exporters/stdout/stdouttrace from 1.38.0 to 1.39.0 by @dependabot[bot] in #1973
  • feat: Enable Scale-from-Zero with Flow Control enabled by @LukeAVanDrie in #1952
  • feature: (helm) support custom volumes and volumeMounts for epp by @delavet in #1945
  • Use spf13/pflag instead of Go's standard flag package by @elevran in #1979
  • Extend textual configuration support with the Datalayer's configuration by @shmuelk in #1914
  • test/integration: introduce robust harness and migrate BBR suite by @LukeAVanDrie in #1959
  • test/bbr: fix startup race condition and IPv6 address formatting by @LukeAVanDrie in #1987
  • [chore]Bump vLLM Image Tags by @Frapschen in #1733
  • Add Prefill Heavy E2E Test to Github Actions by @rlakhtakia in #1894
    ...

v1.3.0-rc.3

15 Jan 14:22

Pre-release

Changes since the previous RC

  • Helm fixes
  • Scale from zero fixes

What's Changed

  • Added crd validation ci workflow. by @bexxmodd in #1879
  • chore: bump sim version by @nirrozenbaum in #1890
  • feat(conformance): add conformance test for verifying x-gateway-destination-endpoint-served by @zetxqx in #1862
  • Add deprecation notice on metrics port in runner and datastore by @elevran in #1886
  • refactor: Flatten Flow Control inter-flow policy plugin directory structure by @LukeAVanDrie in #1841
  • Execute prepare data plugins in topological order of data dependencies by @rahulgurnani in #1878
  • chore(deps): bump go.uber.org/zap from 1.27.0 to 1.27.1 by @dependabot[bot] in #1896
  • chore(deps): bump google.golang.org/grpc from 1.76.0 to 1.77.0 by @dependabot[bot] in #1897
  • chore(deps): bump github.com/prometheus/common from 0.67.2 to 0.67.4 by @dependabot[bot] in #1895
  • enhance bbr helm chart to generalize cmd-line args by @nirrozenbaum in #1900
  • feat: Add totalRunningRequests metric for latency predictor by @BenjaminBraunDev in #1899
  • chore(deps): bump sigs.k8s.io/structured-merge-diff/v6 from 6.3.0 to 6.3.1 by @dependabot[bot] in #1898
  • SLO Aware Routing Sidecar + Plugin EPP Integration and Helm Deployment by @BenjaminBraunDev in #1839
  • Use the correct vllm metric gpu_cache_usage_perc --> kv_cache_usage_perc by @ezrasilvera in #1905
  • fix: fixed helm chart by @capri-xiyue in #1907
  • docs: add Kgateway BBR documentation by @howardjohn in #1908
  • Implement EPP Plugins by datalayer objects by @elevran in #1901
  • feat: Implement Model Rewrite and Traffic Splitting Logic by @zetxqx in #1820
  • docs: Updated quickstart to use stable Istio release 1.28.0 by @atharva-310 in #1902
  • fix(release): correctly update lora-syncer and epp image tags across RC and final releases by @googs1025 in #1916
  • fix: sort InferenceModelRewrite lists by (Namespace, Name) in tests by @googs1025 in #1917
  • Define and register plugin factories for datalayer by @elevran in #1911
  • fix: Properly install the InferenceModelRewrite CRD using kustomize by @shmuelk in #1934
  • Move AllPodsPredicate to datastore package by @elevran in #1939
  • Add automatic TLS certificate reloading for EPP by @pierDipi in #1765
  • feat(modelRewrite): Add metrics for InferenceModelRewrite decisions by @zetxqx in #1938
  • fix: CI golangci-lint errors by @shmuelk in #1948
  • Update inference perf chart to match upstream chart + Add Prefix Cache Github Actions by @rlakhtakia in #1949
  • Standardize plugins.TypedName field name from 'tn' to 'typedName' by @rohithnarasimha in #1918
  • Update inference perf chart to use new hf token structure. by @rlakhtakia in #1955
  • fix infinite loop in profile picker and switch predictor based routing to on by default with a header to disable by @BenjaminBraunDev in #1929
  • fix config load error when picker is set before the scorer w/o weight. by @zetxqx in #1958
  • add kaushikmitr as approver of slo aware routing plugin by @kaushikmitr in #1956
  • refactor: [Scale from Zero] Introduce PodLocator by @LukeAVanDrie in #1950
  • feat: add config validation in predicted-latency-scorer plugin by @googs1025 in #1904
  • Run tests with two data layer implementations by @irar2 in #1930
  • Rename PodInfo struct to EndpointMetadata to better reflect its purpose by @shmuelk in #1866
  • feat(metrics): add scheduler attempt counter by @googs1025 in #1931
  • chore: update released quickstart to v1.2.1 by @nirrozenbaum in #1941
  • generalize latest release quickstart by @nirrozenbaum in #1966
  • chore(deps): bump github.com/onsi/ginkgo/v2 from 2.27.2 to 2.27.3 by @dependabot[bot] in #1971
  • chore(deps): bump golang.org/x/sync from 0.18.0 to 0.19.0 by @dependabot[bot] in #1972
  • chore(deps): bump go.opentelemetry.io/otel/sdk from 1.38.0 to 1.39.0 by @dependabot[bot] in #1975
  • refactor: Standardize config loading and system default injection by @LukeAVanDrie in #1953
  • chore(deps): bump github.com/onsi/gomega from 1.38.2 to 1.38.3 by @dependabot[bot] in #1974
  • chore(deps): bump go.opentelemetry.io/otel/exporters/stdout/stdouttrace from 1.38.0 to 1.39.0 by @dependabot[bot] in #1973
  • feat: Enable Scale-from-Zero with Flow Control enabled by @LukeAVanDrie in #1952
  • feature: (helm) support custom volumes and volumeMounts for epp by @delavet in #1945
  • Use spf13/pflag instead of Go's standard flag package by @elevran in #1979
  • Extend textual configuration support with the Datalayer's configuration by @shmuelk in #1914
  • test/integration: introduce robust harness and migrate BBR suite by @LukeAVanDrie in #1959
  • test/bbr: fix startup race condition and IPv6 address formatting by @LukeAVanDrie in #1987
  • [chore]Bump vLLM Image Tags by @Frapschen in #1733
  • Add Prefill Heavy E2E Test to Github Actions by @rlakhtakia in #1894
  • Add decode heavy benchmark e2e test to github actions. by @rlakhtakia in #1893
  • BBR multi lora guide by @davidbreitgand in #1940
  • [feat] Add running requests scorer and tests by @BenjaminBraunDev in #1957
  • Implement PrepareDataPlugin for prefix cache match plugin by @rahulgurnani in #1942
  • Define and implement command line parsing with Options struct by @elevran in #1984
  • fix(inferenceModelRewrites): conditionally skip watching InferenceModelRewrite and InferenceObjective by @zetxqx in #1967
  • Add e2e test for multiport InferencePool enhancement by @RyanRosario in #1885
  • chore(deps): bump go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc from 1.38.0 to 1.39.0 by @dependabot[bot] in #1997
  • flowcontrol: refactor registry config to support dynamic priority...

v1.3.0-rc.2

09 Jan 16:25

Pre-release

Fixes in this RC

  • Fixed an issue with standalone EPP
  • Fixed approximate prefix cache scoring not working in the P/D scenario

Noteworthy

LoRA Syncer

Starting with this release, no LoRA syncer image will be published, because that feature is being deprecated. Similar functionality will continue to exist in the form of the file system resolver. Model servers that do not yet support this form of LoRA management, but do support the discrete LoRA management endpoints that the lora-syncer uses, can keep using the old images, which will be kept indefinitely.

The lora-syncer code will be removed from the codebase in the next release.

What's Changed

  • Added crd validation ci workflow. by @bexxmodd in #1879
  • chore: bump sim version by @nirrozenbaum in #1890
  • feat(conformance): add conformance test for verifying x-gateway-destination-endpoint-served by @zetxqx in #1862
  • Add deprecation notice on metrics port in runner and datastore by @elevran in #1886
  • refactor: Flatten Flow Control inter-flow policy plugin directory structure by @LukeAVanDrie in #1841
  • Execute prepare data plugins in topological order of data dependencies by @rahulgurnani in #1878
  • chore(deps): bump go.uber.org/zap from 1.27.0 to 1.27.1 by @dependabot[bot] in #1896
  • chore(deps): bump google.golang.org/grpc from 1.76.0 to 1.77.0 by @dependabot[bot] in #1897
  • chore(deps): bump github.com/prometheus/common from 0.67.2 to 0.67.4 by @dependabot[bot] in #1895
  • enhance bbr helm chart to generalize cmd-line args by @nirrozenbaum in #1900
  • feat: Add totalRunningRequests metric for latency predictor by @BenjaminBraunDev in #1899
  • chore(deps): bump sigs.k8s.io/structured-merge-diff/v6 from 6.3.0 to 6.3.1 by @dependabot[bot] in #1898
  • SLO Aware Routing Sidecar + Plugin EPP Integration and Helm Deployment by @BenjaminBraunDev in #1839
  • Use the correct vllm metric gpu_cache_usage_perc --> kv_cache_usage_perc by @ezrasilvera in #1905
  • fix: fixed helm chart by @capri-xiyue in #1907
  • docs: add Kgateway BBR documentation by @howardjohn in #1908
  • Implement EPP Plugins by datalayer objects by @elevran in #1901
  • feat: Implement Model Rewrite and Traffic Splitting Logic by @zetxqx in #1820
  • docs: Updated quickstart to use stable Istio release 1.28.0 by @atharva-310 in #1902
  • fix(release): correctly update lora-syncer and epp image tags across RC and final releases by @googs1025 in #1916
  • fix: sort InferenceModelRewrite lists by (Namespace, Name) in tests by @googs1025 in #1917
  • Define and register plugin factories for datalayer by @elevran in #1911
  • fix: Properly install the InferenceModelRewrite CRD using kustomize by @shmuelk in #1934
  • Move AllPodsPredicate to datastore package by @elevran in #1939
  • Add automatic TLS certificate reloading for EPP by @pierDipi in #1765
  • feat(modelRewrite): Add metrics for InferenceModelRewrite decisions by @zetxqx in #1938
  • fix: CI golangci-lint errors by @shmuelk in #1948
  • Update inference perf chart to match upstream chart + Add Prefix Cache Github Actions by @rlakhtakia in #1949
  • Standardize plugins.TypedName field name from 'tn' to 'typedName' by @rohithnarasimha in #1918
  • Update inference perf chart to use new hf token structure. by @rlakhtakia in #1955
  • fix infinite loop in profile picker and switch predictor based routing to on by default with a header to disable by @BenjaminBraunDev in #1929
  • fix config load error when picker is set before the scorer w/o weight. by @zetxqx in #1958
  • add kaushikmitr as approver of slo aware routing plugin by @kaushikmitr in #1956
  • refactor: [Scale from Zero] Introduce PodLocator by @LukeAVanDrie in #1950
  • feat: add config validation in predicted-latency-scorer plugin by @googs1025 in #1904
  • Run tests with two data layer implementations by @irar2 in #1930
  • Rename PodInfo struct to EndpointMetadata to better reflect its purpose by @shmuelk in #1866
  • feat(metrics): add scheduler attempt counter by @googs1025 in #1931
  • chore: update released quickstart to v1.2.1 by @nirrozenbaum in #1941
  • generalize latest release quickstart by @nirrozenbaum in #1966
  • chore(deps): bump github.com/onsi/ginkgo/v2 from 2.27.2 to 2.27.3 by @dependabot[bot] in #1971
  • chore(deps): bump golang.org/x/sync from 0.18.0 to 0.19.0 by @dependabot[bot] in #1972
  • chore(deps): bump go.opentelemetry.io/otel/sdk from 1.38.0 to 1.39.0 by @dependabot[bot] in #1975
  • refactor: Standardize config loading and system default injection by @LukeAVanDrie in #1953
  • chore(deps): bump github.com/onsi/gomega from 1.38.2 to 1.38.3 by @dependabot[bot] in #1974
  • chore(deps): bump go.opentelemetry.io/otel/exporters/stdout/stdouttrace from 1.38.0 to 1.39.0 by @dependabot[bot] in #1973
  • feat: Enable Scale-from-Zero with Flow Control enabled by @LukeAVanDrie in #1952
  • feature: (helm) support custom volumes and volumeMounts for epp by @delavet in #1945
  • Use spf13/pflag instead of Go's standard flag package by @elevran in #1979
  • Extend textual configuration support with the Datalayer's configuration by @shmuelk in #1914
  • test/integration: introduce robust harness and migrate BBR suite by @LukeAVanDrie in #1959
  • test/bbr: fix startup race condition and IPv6 address formatting by @LukeAVanDrie in #1987
  • [chore]Bump vLLM Image Tags by @Frapschen in #1733
  • Add Prefill Heavy E2E Test to Github Actions by @rlakhtakia in #1894
  • Add decode heavy benchmark e2e test to github actions. by @rlakhtakia in #1893
  • BBR multi lora guide by @davidbreitgand in #1940
  • [feat] Add running requests scorer and tests by @BenjaminBraunDev in #1957
  • Implement PrepareDataPlugin for prefix cache match plugin by @rahulgurnani in #1942
  • Define and implement command line parsing with Options struct by @elevran in ...

v1.3.0-rc.1

07 Jan 15:59

Pre-release

Noteworthy

LoRA Syncer

Starting with this release, no LoRA syncer image will be published, because that feature is being deprecated. Similar functionality will continue to exist in the form of the file system resolver. Model servers that do not yet support this form of LoRA management, but do support the discrete LoRA management endpoints that the lora-syncer uses, can keep using the old images, which will be kept indefinitely.

The lora-syncer code will be removed from the codebase in the next release.

What's Changed

  • Added crd validation ci workflow. by @bexxmodd in #1879
  • chore: bump sim version by @nirrozenbaum in #1890
  • feat(conformance): add conformance test for verifying x-gateway-destination-endpoint-served by @zetxqx in #1862
  • Add deprecation notice on metrics port in runner and datastore by @elevran in #1886
  • refactor: Flatten Flow Control inter-flow policy plugin directory structure by @LukeAVanDrie in #1841
  • Execute prepare data plugins in topological order of data dependencies by @rahulgurnani in #1878
  • chore(deps): bump go.uber.org/zap from 1.27.0 to 1.27.1 by @dependabot[bot] in #1896
  • chore(deps): bump google.golang.org/grpc from 1.76.0 to 1.77.0 by @dependabot[bot] in #1897
  • chore(deps): bump github.com/prometheus/common from 0.67.2 to 0.67.4 by @dependabot[bot] in #1895
  • enhance bbr helm chart to generalize cmd-line args by @nirrozenbaum in #1900
  • feat: Add totalRunningRequests metric for latency predictor by @BenjaminBraunDev in #1899
  • chore(deps): bump sigs.k8s.io/structured-merge-diff/v6 from 6.3.0 to 6.3.1 by @dependabot[bot] in #1898
  • SLO Aware Routing Sidecar + Plugin EPP Integration and Helm Deployment by @BenjaminBraunDev in #1839
  • Use the correct vllm metric gpu_cache_usage_perc --> kv_cache_usage_perc by @ezrasilvera in #1905
  • fix: fixed helm chart by @capri-xiyue in #1907
  • docs: add Kgateway BBR documentation by @howardjohn in #1908
  • Implement EPP Plugins by datalayer objects by @elevran in #1901
  • feat: Implement Model Rewrite and Traffic Splitting Logic by @zetxqx in #1820
  • docs: Updated quickstart to use stable Istio release 1.28.0 by @atharva-310 in #1902
  • fix(release): correctly update lora-syncer and epp image tags across RC and final releases by @googs1025 in #1916
  • fix: sort InferenceModelRewrite lists by (Namespace, Name) in tests by @googs1025 in #1917
  • Define and register plugin factories for datalayer by @elevran in #1911
  • fix: Properly install the InferenceModelRewrite CRD using kustomize by @shmuelk in #1934
  • Move AllPodsPredicate to datastore package by @elevran in #1939
  • Add automatic TLS certificate reloading for EPP by @pierDipi in #1765
  • feat(modelRewrite): Add metrics for InferenceModelRewrite decisions by @zetxqx in #1938
  • fix: CI golangci-lint errors by @shmuelk in #1948
  • Update inference perf chart to match upstream chart + Add Prefix Cache Github Actions by @rlakhtakia in #1949
  • Standardize plugins.TypedName field name from 'tn' to 'typedName' by @rohithnarasimha in #1918
  • Update inference perf chart to use new hf token structure. by @rlakhtakia in #1955
  • fix infinite loop in profile picker and switch predictor based routing to on by default with a header to disable by @BenjaminBraunDev in #1929
  • fix config load error when picker is set before the scorer w/o weight. by @zetxqx in #1958
  • add kaushikmitr as approver of slo aware routing plugin by @kaushikmitr in #1956
  • refactor: [Scale from Zero] Introduce PodLocator by @LukeAVanDrie in #1950
  • feat: add config validation in predicted-latency-scorer plugin by @googs1025 in #1904
  • Run tests with two data layer implementations by @irar2 in #1930
  • Rename PodInfo struct to EndpointMetadata to better reflect its purpose by @shmuelk in #1866
  • feat(metrics): add scheduler attempt counter by @googs1025 in #1931
  • chore: update released quickstart to v1.2.1 by @nirrozenbaum in #1941
  • generalize latest release quickstart by @nirrozenbaum in #1966
  • chore(deps): bump github.com/onsi/ginkgo/v2 from 2.27.2 to 2.27.3 by @dependabot[bot] in #1971
  • chore(deps): bump golang.org/x/sync from 0.18.0 to 0.19.0 by @dependabot[bot] in #1972
  • chore(deps): bump go.opentelemetry.io/otel/sdk from 1.38.0 to 1.39.0 by @dependabot[bot] in #1975
  • refactor: Standardize config loading and system default injection by @LukeAVanDrie in #1953
  • chore(deps): bump github.com/onsi/gomega from 1.38.2 to 1.38.3 by @dependabot[bot] in #1974
  • chore(deps): bump go.opentelemetry.io/otel/exporters/stdout/stdouttrace from 1.38.0 to 1.39.0 by @dependabot[bot] in #1973
  • feat: Enable Scale-from-Zero with Flow Control enabled by @LukeAVanDrie in #1952
  • feature: (helm) support custom volumes and volumeMounts for epp by @delavet in #1945
  • Use spf13/pflag instead of Go's standard flag package by @elevran in #1979
  • Extend textual configuration support with the Datalayer's configuration by @shmuelk in #1914
  • test/integration: introduce robust harness and migrate BBR suite by @LukeAVanDrie in #1959
  • test/bbr: fix startup race condition and IPv6 address formatting by @LukeAVanDrie in #1987
  • [chore]Bump vLLM Image Tags by @Frapschen in #1733
  • Add Prefill Heavy E2E Test to Github Actions by @rlakhtakia in #1894
  • Add decode heavy benchmark e2e test to github actions. by @rlakhtakia in #1893
  • BBR multi lora guide by @davidbreitgand in #1940
  • [feat] Add running requests scorer and tests by @BenjaminBraunDev in #1957
  • Implement PrepareDataPlugin for prefix cache match plugin by @rahulgurnani in #1942
  • Define and implement command line parsing with Options struct by @elevran in #1984
  • fix(inferenceModelRewrites): condition...