v1.4.0-rc.1
Pre-release
RC Highlights
v1.4.0-rc.1 is available for community testing before the final v1.4.0 release.
- standalone chart work landed and is included in release artifacts
- conformance was split into its own Go module
- InferencePool / Helm / gRPC-related improvements landed, including appProtocol, FailOpen, and ALPN h2
- significant ongoing work landed in flow control, BBR, predicted latency, and datalayer internals
What's Changed
- cleanup: resolve technical debt and link tracking issues by @LukeAVanDrie in #2083
- Removing dead code that throws an err when no match is found by @kfswain in #2088
- cleanup: rename integration test utilities to remove _test suffix by @LukeAVanDrie in #2084
- Fixed targetPorts copy error by @capri-xiyue in #2092
- Add PR write permissions to label checker GHA, as it cannot add label… by @kfswain in #2094
- clean unused interface by @nirrozenbaum in #2098
- prefill aware prefix plugin by @ahg-g in #2104
- Removing perm-restricted GHA by @kfswain in #2105
- Updating vllm versions and fixing git commit sign by @kfswain in #2108
- Standardize inferencepool Helm templates and drop unnecessary tpl by @tsj-30 in #1989
- feat(bbr): add configuration flags for metrics auth and secure serving by @jpekmez in #2112
- chore(deps): bump github.com/prometheus/prometheus from 0.308.1 to 0.309.0 by @dependabot[bot] in #2090
- fix both error propogation and priority band fullness by @wseaton in #2103
- Datalayer refactoring: HTTP datasource and client by @irar2 in #2120
- Add v1 conformance report for alibabacloud ack gateway by @delavet in #2007
- changed httproute creation to be behind a flag. by @nirrozenbaum in #2118
- Rename part two by @shmuelk in #1968
- rename of experimental http route creation section in helm by @nirrozenbaum in #2123
- add scoring preference to scorer interface. by @nirrozenbaum in #2119
- feat: make epp-standalone be its own chart by @capri-xiyue in #2122
- fix: [Flow Control]: Optionally disable endpoint subset filtering while dispatching requests by @aishukamal in #2126
- fix: add update helm dependency by @zetxqx in #2135
- chore(deps): bump github.com/onsi/ginkgo/v2 from 2.27.3 to 2.27.5 by @dependabot[bot] in #2138
- chore(deps): bump github.com/prometheus/prometheus from 0.309.0 to 0.309.1 by @dependabot[bot] in #2136
- chore(deps): bump github.com/onsi/gomega from 1.38.3 to 1.39.0 by @dependabot[bot] in #2137
- Rename part three by @shmuelk in #2124
- fixed latest guide to use httproute creation in via the helm chart by @nirrozenbaum in #2141
- Removed duplicated field in log message by @shmuelk in #2142
- Update the metrics used by the dashboard by @learner0810 in #2139
- registry: switch to fine-grained leasing for flow lifecycle by @LukeAVanDrie in #2127
- Increase default FlowGCTimeout to 1h to prevent premature GC by @LukeAVanDrie in #2143
- update bbr quickstart guide with latest functionality by @nirrozenbaum in #2150
- Separate conformance tests modules from main tests by @rikatz in #1994
- feat: Add concurrency saturation detector by @LukeAVanDrie in #2062
- feat: epp standalone helm chart included in release to docker by @capri-xiyue in #2148
- Fix indention error for latency predictor by @liu-cong in #2158
- Removing alpha status in GH landing page by @kfswain in #2132
- docs: added epp standalone user guide by @capri-xiyue in #2147
- Add tracing entry span with W3C propagation to EPP handler by @sallyom in #2057
- feat(docs): enable content tab linking in mkdocs by @AvineshTripathi in #2176
- update bbr label filtering to align with best practices by @nirrozenbaum in #2178
- updated kgateway section in bbr quickstart guide by @nirrozenbaum in #2179
- move logging util to common pkg by @nirrozenbaum in #2180
- Interfaces towards pluggable BBR framework (initial PR) by @davidbreitgand in #2121
- feat(api): Add appProtocol to InferencePool API for gRPC support by @zetxqx in #2162
- docs: reference right manifest file by @sats-23 in #2186
- test: add hermetic coverage for standalone mode by @LukeAVanDrie in #2175
- Add support for video/audio formats for multimodal inputs by @rahulgurnani in #2181
- fix identation bug in quickstart by @nirrozenbaum in #2182
- refactor(flowcontrol): Migrate Fairness Policies to EPP Plugin System by @LukeAVanDrie in #2031
- [Conformance] copy pkgs from gateway-api to enable upgrade to gateway-api v1.4.0 by @zetxqx in #2159
- controller: extend flow lease scope to fix orphaned queues #1982 by @LukeAVanDrie in #2131
- rename slo-aware-router to predicted-latency by @kaushikmitr in #2183
- Better encapsulate data layer set up and validation. by @elevran in #2185
- test: added latency predictor converage for inferencepool and added convera… by @capri-xiyue in #2187
- cleanup: refactor multiple include into one file by @capri-xiyue in #2191
- feat: Allow request control plugins to return ext_proc dynamic metadata by @fcfort in #2156
- Moving the scheduling component pluggable interface and types to the common framework pkg by @ahg-g in #2192
- Update troubleshooting guide to include remediation for incorrect pre… by @BenjaminBraunDev in #2040
- Add flowcontrol queue length in bytes metric by @RyanRosario in #2044
- Moved the epp/plugins pkg to be under the new framework pkg by @ahg-g in #2194
- Move framework interfaces under epp/framework/interface by @ahg-g in #2195
- feat: added a local mode in verify helm script by @capri-xiyue in #2196
- [Flow Control] Garbage Collection for Priority Bands by @evacchi in #2097
- Moving requestcontrol pluggable interfaces/types to epp/framework/interface by @ahg-g in #2197
- Flow Control: Ordering Policy Migration (Phase 1) by @LukeAVanDrie in #2188
- docs: reorganize EPP configuration guide structure by @cr7258 in #1962
- fix: fixed epp standalone helm and updated verify helm script by @capri-xiyue in #2204
- Fix kgateway document link by @learner0810 in #2140
- change scorer weight type from int to float by @nirrozenbaum in #2207
- Flow Control: Ordering Policy Migration (Phase 2) by @LukeAVanDrie in #2193
- cleanup of setupLog from function calls in runner by @nirrozenbaum in #2206
- fix(registry): fix JIT error scoping race by @LukeAVanDrie in #2199
- refactor(flowcontrol): migrate interfaces to framework/interface/flowcontrol by @LukeAVanDrie in #2208
- Move requestcontrol interface dependencies to epp/framework/interface by @ahg-g in #2213
- Update prefix scorer to report cached prefix length in tokens by @mayabar in #2053
- Add SGlang example/docs by @rahulgurnani in #2002
- refactor: rename interflow package to fairness by @LukeAVanDrie in #2214
- Extracted all scheduling execution logic out of framework/interface/scheduling pkg by @ahg-g in #2223
- refactor: rename intraflow package to ordering by @LukeAVanDrie in #2215
- feat: switch saturation detection to gradient signal by @LukeAVanDrie in #2224
- fix: removed duplicate include by @capri-xiyue in #2225
- test file for datastore.go by @tsj-30 in #2200
- cleanup(flowcontrol): Refactor registry with generic leasing and atomic GC by @LukeAVanDrie in #2198
- chore: bump sim version by @nirrozenbaum in #2227
- Move datalayer plugin intefaces and associated types to framework/interface/datalayer by @ahg-g in #2228
- feat: add FC configuration to EndpointPickerConfig by @LukeAVanDrie in #2217
- refactor: move flow control types to framework pkg by @LukeAVanDrie in #2221
- Moving the scheduling plugins under epp/framework/plugins by @ahg-g in #2230
- Fix/ feature for issue 2028. Add support for responses api and conversations api by @srampal in #2133
- Moved request control plugins to epp/framework/plugins by @ahg-g in #2231
- feat: wire flow control policies and smart queue defaults by @LukeAVanDrie in #2232
- Add NGF 2.4 conformance report by @bjee19 in #2236
- documentation: add guide for trace support by @JeffLuoo in #2212
- feat: New cost reporting plugin for returning cost in ext_proc dynamic metadata response to proxy by @fcfort in #2114
- feat(metrics): support engine-aware metric collection by @bongwoobak in #2161
- Add pod_type categorical feature to latency prediction models by @RishabhSaini in #1993
- test: fix flakes in controller and registry tests by @LukeAVanDrie in #2250
- chore(deps): bump go.opentelemetry.io/otel/trace from 1.39.0 to 1.40.0 by @dependabot[bot] in #2254
- Add v1.0.1 conformance report for Istio by @ericdbishop in #2235
- feat: added epp standalone label to track usage by @capri-xiyue in #2252
- docs: fix epp README.md link for metrics by @rumstead in #2259
- docs: Removed speculative statement about creating a new Slack channel by @terrytangyuan in #2272
- Implement garbage collection for SLO context store by @aniketmohanty82 in #2245
- Change compilation-config flag to valid default parameter by @BenjaminBraunDev in #2244
- feat: added guide for epp standalone with inferencepool dependency by @capri-xiyue in #2260
- support for setting up epp deployment resources by @learner0810 in #2273
- SGlang: Fixes to the getting started guide by @rahulgurnani in #2279
- add priority to admit request by @kaushikmitr in #2251
- Initial move of datalayer plugins to epp/framework/plugins by @ahg-g in #2286
- Moved metrics keys to the datalayer framework pkg by @ahg-g in #2288
- align attritbute reporter with plugins structure by @nirrozenbaum in #2290
- move logging under observability dir by @nirrozenbaum in #2292
- extracted InitLogging to common observability logging pkg by @nirrozenbaum in #2296
- refactor: rename epp standalone to standalone by @capri-xiyue in #2298
- Validate dependencies for all producer/consumer plugins in datalayer by @rahulgurnani in #2246
- refactor: rename to epplib by @capri-xiyue in #2299
- chore(deps): bump sigs.k8s.io/kustomize/kyaml from 0.21.0 to 0.21.1 by @dependabot[bot] in #2303
- Check scheme in the HTTP datasource creation by @irar2 in #2305
- chore(deps): bump go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc from 1.39.0 to 1.40.0 by @dependabot[bot] in #2304
- Move the metrics extractor to its own pkg under plugins/datalayer/extractor by @ahg-g in #2291
- extracted tracing/profiling/metrics common to observability by @nirrozenbaum in #2307
- SGLang docs: Revert changes to the released guide and apply them to the latest guide by @rahulgurnani in #2302
- fix(datastore): remove orphaned rank endpoints when targetPorts changes by @bongwoobak in #2308
- chore: remove deprecated top-level istio values by @varad-ahirwadkar in #2280
- add PopulateControllerConfig in runner.go by @kaushikmitr in #2300
- clean up: remove model server type by @capri-xiyue in #2311
- Revert "clean up: remove model server type" by @capri-xiyue in #2312
- conformance: better polling times by @howardjohn in #2313
- chore: fix typo by @hhk7734 in #2322
- docs: fixed standalone docs by @capri-xiyue in #2314
- Config options fix by @shmuelk in #2321
- chore: Separate Gateway API main module and conformance module dependencies by @ericdbishop in #2285
- conformance: modify warmup from weighted two-pools test by @danehans in #2316
- Add dispatch cycle duration metric for flow control by @RyanRosario in #2110
- Add unit tests to validate Datalayer metrics source and extractor produce the same metrics as backend.PodMetrics by @elevran in #2323
- Refactor: Centralize EndpointPool initialization for Standalone mode by @varad-ahirwadkar in #2229
- Add flow control request enqueue metric by @RyanRosario in #2153
- move latency prediction to prepare data step and add admit request plugin hook by @kaushikmitr in #2319
- [Feature] Add k8s notification data sources by @elevran in #2320
- Script to validate framework imports are confined by @elevran in #2337
- Have a default configuration for the EndpointPicker by @shmuelk in #2341
- A transitional PR towards pluggable BBR Framework by @davidbreitgand in #2209
- chore(deps): bump github.com/onsi/gomega from 1.39.0 to 1.39.1 by @dependabot[bot] in #2346
- chore(deps): bump the kubernetes group with 6 updates by @dependabot[bot] in #2344
- chore(deps): bump sigs.k8s.io/structured-merge-diff/v6 from 6.3.1 to 6.3.2 by @dependabot[bot] in #2345
- chore(deps): bump sigs.k8s.io/controller-runtime from 0.22.4 to 0.23.1 by @dependabot[bot] in #2216
- Enable make verify-fw-imports by @elevran in #2358
- more work on pluggable bbr and alignment with epp structure and naming by @nirrozenbaum in #2348
- Move datasource/extractor validations to datalayer runtime. by @elevran in #2357
- Make PredictedLatencyScorer PD Aware by @RishabhSaini in #2361
- fix: InitLogging verbosity ignored due to ctrl.SetLogger single-fulfillment by @kaushikmitr in #2363
- chore(deps): bump sigs.k8s.io/controller-tools from 0.19.0 to 0.20.1 by @dependabot[bot] in #2347
- refactor(bbr): extract cmd-line args to Options struct by @yehuditkerido in #2353
- make kaushikmitr latencypredictor owner by @kaushikmitr in #2373
- Fix typo in the letter l should be in capital case by @szedan-rh in #2380
- Moved data layer metrics compatibility test to keep framework imports self contained by @elevran in #2381
- Build and distribute latencyprediction images by @Gregory-Pereira in #2287
- Make latency prediction server multi threaded compatable by @kaushikmitr in #2349
- docs: Fix typos in config-text.md by @terrytangyuan in #2387
- fix: stable epp version for conformance test in main by @zetxqx in #2386
- fix: Fix toggle for cpu/gpu/sim model server deployment by @terrytangyuan in #2388
- feat: improve filter plugin logging by @terrytangyuan in #2389
- Model replacement to Qwen3-32B by @sats-23 in #2189
- Refactor EPP initialization for unified integration testing by @zetxqx in #2351
- fix: reuse scheduling predictions for TTFT and first TPOT reporting by @kaushikmitr in #2372
- Update Grafana setup directions by @RyanRosario in #2335
- Emit metrics on execution time of BBR plugins by @asaadbalum in #2379
- Update vllm version in gpu deployment and minor doc fixes by @rahulgurnani in #2375
- LatencyPredicton in Disagg Mode should handle filtered predictions by @RishabhSaini in #2390
- chore(deps): bump google.golang.org/grpc from 1.78.0 to 1.79.1 by @dependabot[bot] in #2403
- chore(deps): bump sigs.k8s.io/kustomize/api from 0.21.0 to 0.21.1 by @dependabot[bot] in #2399
- chore(deps): bump github.com/elastic/crd-ref-docs from 0.2.0 to 0.3.0 by @dependabot[bot] in #2402
- fix: Add ALPN h2 support to gRPC TLS configuration by @johnahull in #2385
- chore: metrics vars private instead of public by @nirrozenbaum in #2396
- Datalayer: Validate data plugins execution order across different layers by @rahulgurnani in #2333
- Add logs (at log level 1) for better observability by @rahulgurnani in #2384
- Enable golangci-lint v2.9.0 and fix lint errors by @elevran in #2416
- Update InferencePool helm chart to use FailOpen as default by @RyanRosario in #2365
- Refactor datalayer interfaces, separating polling and notification based types by @elevran in #2407
- Refactor response path: accept raw bytes and consolidate ResponseComplete logic by @zetxqx in #2410
- add objective type to latency prediction go client by @kaushikmitr in #2423
- create predictedLatencyCtx during scoring if not created in prepareddata by @kaushikmitr in #2408
- Refactor request path for pluggable parser by @zetxqx in #2409
- chore(deps): bump github.com/onsi/ginkgo/v2 from 2.28.0 to 2.28.1 by @dependabot[bot] in #2400
- Guide fixes as a followup to the qwen move by @ahg-g in #2431
- Revert "[Conformance] copy pkgs from gateway-api to enable upgrade to gateway-api v1.4.0 (#2159)" by @ericdbishop in #2331
- fix(predicted-latency-scorer): prevent nil pointer panic for non-completions API types by @noalimoy in #2415
- feat(bbr): build RequestContext to store headers for plugins by @noalimoy in #2368
- [pluggable bbr] Configurable body fields to headers BBR plugin by @davidbreitgand in #2417
- Makes codebase pflag consistent by @carmal891 in #2443
- Add BBR metrics to Inference Gateway Grafana dashboard by @asaadbalum in #2397
- First part of refactoring the ExtProc used by both the EPP and the BBR by @shmuelk in #2428
- support active port declaration on pod annotation by @delavet in #2256
- Fix: Downgrade NTPOT metrics error to verbose info to reduce noise by @ycjiang50 in #2435
- add the option to register custom metrics by @nirrozenbaum in #2445
- sidecar: add request coalescing for bulk predictions by @kaushikmitr in #2425
- chore(deps): bump go.opentelemetry.io/otel/exporters/stdout/stdouttrace from 1.39.0 to 1.41.0 by @dependabot[bot] in #2460
- chore(deps): bump the kubernetes group with 6 updates by @dependabot[bot] in #2456
- chore(deps): bump github.com/envoyproxy/go-control-plane/envoy from 1.36.0 to 1.37.0 by @dependabot[bot] in #2457
- chore(deps): bump go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc from 1.40.0 to 1.41.0 by @dependabot[bot] in #2459
- Epp resource defaults by @kaushikmitr in #2455
- Add hermetic test for epp with inferencepool by @capri-xiyue in #2277
- chore(Conformance): update dependabot.yml for conformance sub module by @zetxqx in #2462
New Contributors
- @tsj-30 made their first contribution in #1989
- @jpekmez made their first contribution in #2112
- @wseaton made their first contribution in #2103
- @aishukamal made their first contribution in #2126
- @fcfort made their first contribution in #2156
- @evacchi made their first contribution in #2097
- @bjee19 made their first contribution in #2236
- @bongwoobak made their first contribution in #2161
- @ericdbishop made their first contribution in #2235
- @rumstead made their first contribution in #2259
- @aniketmohanty82 made their first contribution in #2245
- @varad-ahirwadkar made their first contribution in #2280
- @yehuditkerido made their first contribution in #2353
- @szedan-rh made their first contribution in #2380
- @asaadbalum made their first contribution in #2379
- @johnahull made their first contribution in #2385
- @carmal891 made their first contribution in #2443
- @ycjiang50 made their first contribution in #2435
Full Changelog: v1.3.1...v1.4.0-rc.1