v1.4.0-rc.1

Pre-release

@danehans danehans released this 05 Mar 22:39
· 266 commits to main since this release
v1.4.0-rc.1
8f057d7

RC Highlights

  • v1.4.0-rc.1 is available for community testing before the final v1.4.0 release
  • standalone chart work landed and is included in release artifacts
  • conformance was split into its own Go module
  • InferencePool / Helm / gRPC-related improvements landed, including appProtocol, FailOpen, and ALPN h2
  • significant ongoing work landed in flow control, BBR, predicted latency, and datalayer internals

What's Changed

  • cleanup: resolve technical debt and link tracking issues by @LukeAVanDrie in #2083
  • Removing dead code that throws an err when no match is found by @kfswain in #2088
  • cleanup: rename integration test utilities to remove _test suffix by @LukeAVanDrie in #2084
  • Fixed targetPorts copy error by @capri-xiyue in #2092
  • Add PR write permissions to label checker GHA, as it cannot add label… by @kfswain in #2094
  • clean unused interface by @nirrozenbaum in #2098
  • prefill aware prefix plugin by @ahg-g in #2104
  • Removing perm-restricted GHA by @kfswain in #2105
  • Updating vllm versions and fixing git commit sign by @kfswain in #2108
  • Standardize inferencepool Helm templates and drop unnecessary tpl by @tsj-30 in #1989
  • feat(bbr): add configuration flags for metrics auth and secure serving by @jpekmez in #2112
  • chore(deps): bump github.com/prometheus/prometheus from 0.308.1 to 0.309.0 by @dependabot[bot] in #2090
  • fix both error propagation and priority band fullness by @wseaton in #2103
  • Datalayer refactoring: HTTP datasource and client by @irar2 in #2120
  • Add v1 conformance report for alibabacloud ack gateway by @delavet in #2007
  • changed httproute creation to be behind a flag. by @nirrozenbaum in #2118
  • Rename part two by @shmuelk in #1968
  • rename of experimental http route creation section in helm by @nirrozenbaum in #2123
  • add scoring preference to scorer interface. by @nirrozenbaum in #2119
  • feat: make epp-standalone be its own chart by @capri-xiyue in #2122
  • fix: [Flow Control]: Optionally disable endpoint subset filtering while dispatching requests by @aishukamal in #2126
  • fix: add update helm dependency by @zetxqx in #2135
  • chore(deps): bump github.com/onsi/ginkgo/v2 from 2.27.3 to 2.27.5 by @dependabot[bot] in #2138
  • chore(deps): bump github.com/prometheus/prometheus from 0.309.0 to 0.309.1 by @dependabot[bot] in #2136
  • chore(deps): bump github.com/onsi/gomega from 1.38.3 to 1.39.0 by @dependabot[bot] in #2137
  • Rename part three by @shmuelk in #2124
  • fixed latest guide to use httproute creation via the helm chart by @nirrozenbaum in #2141
  • Removed duplicated field in log message by @shmuelk in #2142
  • Update the metrics used by the dashboard by @learner0810 in #2139
  • registry: switch to fine-grained leasing for flow lifecycle by @LukeAVanDrie in #2127
  • Increase default FlowGCTimeout to 1h to prevent premature GC by @LukeAVanDrie in #2143
  • update bbr quickstart guide with latest functionality by @nirrozenbaum in #2150
  • Separate conformance tests modules from main tests by @rikatz in #1994
  • feat: Add concurrency saturation detector by @LukeAVanDrie in #2062
  • feat: epp standalone helm chart included in release to docker by @capri-xiyue in #2148
  • Fix indentation error for latency predictor by @liu-cong in #2158
  • Removing alpha status in GH landing page by @kfswain in #2132
  • docs: added epp standalone user guide by @capri-xiyue in #2147
  • Add tracing entry span with W3C propagation to EPP handler by @sallyom in #2057
  • feat(docs): enable content tab linking in mkdocs by @AvineshTripathi in #2176
  • update bbr label filtering to align with best practices by @nirrozenbaum in #2178
  • updated kgateway section in bbr quickstart guide by @nirrozenbaum in #2179
  • move logging util to common pkg by @nirrozenbaum in #2180
  • Interfaces towards pluggable BBR framework (initial PR) by @davidbreitgand in #2121
  • feat(api): Add appProtocol to InferencePool API for gRPC support by @zetxqx in #2162
  • docs: reference right manifest file by @sats-23 in #2186
  • test: add hermetic coverage for standalone mode by @LukeAVanDrie in #2175
  • Add support for video/audio formats for multimodal inputs by @rahulgurnani in #2181
  • fix indentation bug in quickstart by @nirrozenbaum in #2182
  • refactor(flowcontrol): Migrate Fairness Policies to EPP Plugin System by @LukeAVanDrie in #2031
  • [Conformance] copy pkgs from gateway-api to enable upgrade to gateway-api v1.4.0 by @zetxqx in #2159
  • controller: extend flow lease scope to fix orphaned queues #1982 by @LukeAVanDrie in #2131
  • rename slo-aware-router to predicted-latency by @kaushikmitr in #2183
  • Better encapsulate data layer set up and validation. by @elevran in #2185
  • test: added latency predictor coverage for inferencepool and added convera… by @capri-xiyue in #2187
  • cleanup: refactor multiple include into one file by @capri-xiyue in #2191
  • feat: Allow request control plugins to return ext_proc dynamic metadata by @fcfort in #2156
  • Moving the scheduling component pluggable interface and types to the common framework pkg by @ahg-g in #2192
  • Update troubleshooting guide to include remediation for incorrect pre… by @BenjaminBraunDev in #2040
  • Add flowcontrol queue length in bytes metric by @RyanRosario in #2044
  • Moved the epp/plugins pkg to be under the new framework pkg by @ahg-g in #2194
  • Move framework interfaces under epp/framework/interface by @ahg-g in #2195
  • feat: added a local mode in verify helm script by @capri-xiyue in #2196
  • [Flow Control] Garbage Collection for Priority Bands by @evacchi in #2097
  • Moving requestcontrol pluggable interfaces/types to epp/framework/interface by @ahg-g in #2197
  • Flow Control: Ordering Policy Migration (Phase 1) by @LukeAVanDrie in #2188
  • docs: reorganize EPP configuration guide structure by @cr7258 in #1962
  • fix: fixed epp standalone helm and updated verify helm script by @capri-xiyue in #2204
  • Fix kgateway document link by @learner0810 in #2140
  • change scorer weight type from int to float by @nirrozenbaum in #2207
  • Flow Control: Ordering Policy Migration (Phase 2) by @LukeAVanDrie in #2193
  • cleanup of setupLog from function calls in runner by @nirrozenbaum in #2206
  • fix(registry): fix JIT error scoping race by @LukeAVanDrie in #2199
  • refactor(flowcontrol): migrate interfaces to framework/interface/flowcontrol by @LukeAVanDrie in #2208
  • Move requestcontrol interface dependencies to epp/framework/interface by @ahg-g in #2213
  • Update prefix scorer to report cached prefix length in tokens by @mayabar in #2053
  • Add SGlang example/docs by @rahulgurnani in #2002
  • refactor: rename interflow package to fairness by @LukeAVanDrie in #2214
  • Extracted all scheduling execution logic out of framework/interface/scheduling pkg by @ahg-g in #2223
  • refactor: rename intraflow package to ordering by @LukeAVanDrie in #2215
  • feat: switch saturation detection to gradient signal by @LukeAVanDrie in #2224
  • fix: removed duplicate include by @capri-xiyue in #2225
  • test file for datastore.go by @tsj-30 in #2200
  • cleanup(flowcontrol): Refactor registry with generic leasing and atomic GC by @LukeAVanDrie in #2198
  • chore: bump sim version by @nirrozenbaum in #2227
  • Move datalayer plugin interfaces and associated types to framework/interface/datalayer by @ahg-g in #2228
  • feat: add FC configuration to EndpointPickerConfig by @LukeAVanDrie in #2217
  • refactor: move flow control types to framework pkg by @LukeAVanDrie in #2221
  • Moving the scheduling plugins under epp/framework/plugins by @ahg-g in #2230
  • Fix/ feature for issue 2028. Add support for responses api and conversations api by @srampal in #2133
  • Moved request control plugins to epp/framework/plugins by @ahg-g in #2231
  • feat: wire flow control policies and smart queue defaults by @LukeAVanDrie in #2232
  • Add NGF 2.4 conformance report by @bjee19 in #2236
  • documentation: add guide for trace support by @JeffLuoo in #2212
  • feat: New cost reporting plugin for returning cost in ext_proc dynamic metadata response to proxy by @fcfort in #2114
  • feat(metrics): support engine-aware metric collection by @bongwoobak in #2161
  • Add pod_type categorical feature to latency prediction models by @RishabhSaini in #1993
  • test: fix flakes in controller and registry tests by @LukeAVanDrie in #2250
  • chore(deps): bump go.opentelemetry.io/otel/trace from 1.39.0 to 1.40.0 by @dependabot[bot] in #2254
  • Add v1.0.1 conformance report for Istio by @ericdbishop in #2235
  • feat: added epp standalone label to track usage by @capri-xiyue in #2252
  • docs: fix epp README.md link for metrics by @rumstead in #2259
  • docs: Removed speculative statement about creating a new Slack channel by @terrytangyuan in #2272
  • Implement garbage collection for SLO context store by @aniketmohanty82 in #2245
  • Change compilation-config flag to valid default parameter by @BenjaminBraunDev in #2244
  • feat: added guide for epp standalone with inferencepool dependency by @capri-xiyue in #2260
  • support for setting up epp deployment resources by @learner0810 in #2273
  • SGlang: Fixes to the getting started guide by @rahulgurnani in #2279
  • add priority to admit request by @kaushikmitr in #2251
  • Initial move of datalayer plugins to epp/framework/plugins by @ahg-g in #2286
  • Moved metrics keys to the datalayer framework pkg by @ahg-g in #2288
  • align attribute reporter with plugins structure by @nirrozenbaum in #2290
  • move logging under observability dir by @nirrozenbaum in #2292
  • extracted InitLogging to common observability logging pkg by @nirrozenbaum in #2296
  • refactor: rename epp standalone to standalone by @capri-xiyue in #2298
  • Validate dependencies for all producer/consumer plugins in datalayer by @rahulgurnani in #2246
  • refactor: rename to epplib by @capri-xiyue in #2299
  • chore(deps): bump sigs.k8s.io/kustomize/kyaml from 0.21.0 to 0.21.1 by @dependabot[bot] in #2303
  • Check scheme in the HTTP datasource creation by @irar2 in #2305
  • chore(deps): bump go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc from 1.39.0 to 1.40.0 by @dependabot[bot] in #2304
  • Move the metrics extractor to its own pkg under plugins/datalayer/extractor by @ahg-g in #2291
  • extracted tracing/profiling/metrics common to observability by @nirrozenbaum in #2307
  • SGLang docs: Revert changes to the released guide and apply them to the latest guide by @rahulgurnani in #2302
  • fix(datastore): remove orphaned rank endpoints when targetPorts changes by @bongwoobak in #2308
  • chore: remove deprecated top-level istio values by @varad-ahirwadkar in #2280
  • add PopulateControllerConfig in runner.go by @kaushikmitr in #2300
  • clean up: remove model server type by @capri-xiyue in #2311
  • Revert "clean up: remove model server type" by @capri-xiyue in #2312
  • conformance: better polling times by @howardjohn in #2313
  • chore: fix typo by @hhk7734 in #2322
  • docs: fixed standalone docs by @capri-xiyue in #2314
  • Config options fix by @shmuelk in #2321
  • chore: Separate Gateway API main module and conformance module dependencies by @ericdbishop in #2285
  • conformance: modify warmup from weighted two-pools test by @danehans in #2316
  • Add dispatch cycle duration metric for flow control by @RyanRosario in #2110
  • Add unit tests to validate Datalayer metrics source and extractor produce the same metrics as backend.PodMetrics by @elevran in #2323
  • Refactor: Centralize EndpointPool initialization for Standalone mode by @varad-ahirwadkar in #2229
  • Add flow control request enqueue metric by @RyanRosario in #2153
  • move latency prediction to prepare data step and add admit request plugin hook by @kaushikmitr in #2319
  • [Feature] Add k8s notification data sources by @elevran in #2320
  • Script to validate framework imports are confined by @elevran in #2337
  • Have a default configuration for the EndpointPicker by @shmuelk in #2341
  • A transitional PR towards pluggable BBR Framework by @davidbreitgand in #2209
  • chore(deps): bump github.com/onsi/gomega from 1.39.0 to 1.39.1 by @dependabot[bot] in #2346
  • chore(deps): bump the kubernetes group with 6 updates by @dependabot[bot] in #2344
  • chore(deps): bump sigs.k8s.io/structured-merge-diff/v6 from 6.3.1 to 6.3.2 by @dependabot[bot] in #2345
  • chore(deps): bump sigs.k8s.io/controller-runtime from 0.22.4 to 0.23.1 by @dependabot[bot] in #2216
  • Enable make verify-fw-imports by @elevran in #2358
  • more work on pluggable bbr and alignment with epp structure and naming by @nirrozenbaum in #2348
  • Move datasource/extractor validations to datalayer runtime. by @elevran in #2357
  • Make PredictedLatencyScorer PD Aware by @RishabhSaini in #2361
  • fix: InitLogging verbosity ignored due to ctrl.SetLogger single-fulfillment by @kaushikmitr in #2363
  • chore(deps): bump sigs.k8s.io/controller-tools from 0.19.0 to 0.20.1 by @dependabot[bot] in #2347
  • refactor(bbr): extract cmd-line args to Options struct by @yehuditkerido in #2353
  • make kaushikmitr latencypredictor owner by @kaushikmitr in #2373
  • Fix typo in the letter l should be in capital case by @szedan-rh in #2380
  • Moved data layer metrics compatibility test to keep framework imports self contained by @elevran in #2381
  • Build and distribute latencyprediction images by @Gregory-Pereira in #2287
  • Make latency prediction server multi-threaded compatible by @kaushikmitr in #2349
  • docs: Fix typos in config-text.md by @terrytangyuan in #2387
  • fix: stable epp version for conformance test in main by @zetxqx in #2386
  • fix: Fix toggle for cpu/gpu/sim model server deployment by @terrytangyuan in #2388
  • feat: improve filter plugin logging by @terrytangyuan in #2389
  • Model replacement to Qwen3-32B by @sats-23 in #2189
  • Refactor EPP initialization for unified integration testing by @zetxqx in #2351
  • fix: reuse scheduling predictions for TTFT and first TPOT reporting by @kaushikmitr in #2372
  • Update Grafana setup directions by @RyanRosario in #2335
  • Emit metrics on execution time of BBR plugins by @asaadbalum in #2379
  • Update vllm version in gpu deployment and minor doc fixes by @rahulgurnani in #2375
  • LatencyPrediction in Disagg Mode should handle filtered predictions by @RishabhSaini in #2390
  • chore(deps): bump google.golang.org/grpc from 1.78.0 to 1.79.1 by @dependabot[bot] in #2403
  • chore(deps): bump sigs.k8s.io/kustomize/api from 0.21.0 to 0.21.1 by @dependabot[bot] in #2399
  • chore(deps): bump github.com/elastic/crd-ref-docs from 0.2.0 to 0.3.0 by @dependabot[bot] in #2402
  • fix: Add ALPN h2 support to gRPC TLS configuration by @johnahull in #2385
  • chore: metrics vars private instead of public by @nirrozenbaum in #2396
  • Datalayer: Validate data plugins execution order across different layers by @rahulgurnani in #2333
  • Add logs (at log level 1) for better observability by @rahulgurnani in #2384
  • Enable golangci-lint v2.9.0 and fix lint errors by @elevran in #2416
  • Update InferencePool helm chart to use FailOpen as default by @RyanRosario in #2365
  • Refactor datalayer interfaces, separating polling and notification based types by @elevran in #2407
  • Refactor response path: accept raw bytes and consolidate ResponseComplete logic by @zetxqx in #2410
  • add objective type to latency prediction go client by @kaushikmitr in #2423
  • create predictedLatencyCtx during scoring if not created in prepareddata by @kaushikmitr in #2408
  • Refactor request path for pluggable parser by @zetxqx in #2409
  • chore(deps): bump github.com/onsi/ginkgo/v2 from 2.28.0 to 2.28.1 by @dependabot[bot] in #2400
  • Guide fixes as a followup to the qwen move by @ahg-g in #2431
  • Revert "[Conformance] copy pkgs from gateway-api to enable upgrade to gateway-api v1.4.0 (#2159)" by @ericdbishop in #2331
  • fix(predicted-latency-scorer): prevent nil pointer panic for non-completions API types by @noalimoy in #2415
  • feat(bbr): build RequestContext to store headers for plugins by @noalimoy in #2368
  • [pluggable bbr] Configurable body fields to headers BBR plugin by @davidbreitgand in #2417
  • Makes codebase pflag consistent by @carmal891 in #2443
  • Add BBR metrics to Inference Gateway Grafana dashboard by @asaadbalum in #2397
  • First part of refactoring the ExtProc used by both the EPP and the BBR by @shmuelk in #2428
  • support active port declaration on pod annotation by @delavet in #2256
  • Fix: Downgrade NTPOT metrics error to verbose info to reduce noise by @ycjiang50 in #2435
  • add the option to register custom metrics by @nirrozenbaum in #2445
  • sidecar: add request coalescing for bulk predictions by @kaushikmitr in #2425
  • chore(deps): bump go.opentelemetry.io/otel/exporters/stdout/stdouttrace from 1.39.0 to 1.41.0 by @dependabot[bot] in #2460
  • chore(deps): bump the kubernetes group with 6 updates by @dependabot[bot] in #2456
  • chore(deps): bump github.com/envoyproxy/go-control-plane/envoy from 1.36.0 to 1.37.0 by @dependabot[bot] in #2457
  • chore(deps): bump go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc from 1.40.0 to 1.41.0 by @dependabot[bot] in #2459
  • Epp resource defaults by @kaushikmitr in #2455
  • Add hermetic test for epp with inferencepool by @capri-xiyue in #2277
  • chore(Conformance): update dependabot.yml for conformance sub module by @zetxqx in #2462

Full Changelog: v1.3.1...v1.4.0-rc.1