What's Changed
🚀 Major Features:
- Initialize documentation site by @rudeigerc in #388
- feat: add TensorRT-LLM as backend by @cr7258 in #392
- RFC: Gateway Metric Aggregator by @kerthcet in #404
- Proposal for karpenter integration by @carlory in #439
✨ Features:
- feat: add preStop hook for llamacpp and tgi in the BackendRuntime by @cr7258 in #381
- feat: support speculative decoding for llamacpp by @cr7258 in #402
- Add global configmap by @kerthcet in #431
- Add dispatcher & memoryStore & latencyAwarePlugin by @kerthcet in #440
- feat: support runai streamer for vllm by @cr7258 in #423
🐛 Bugs:
- feat: update sglang version to v0.4.5 to fix /health_generate endpoint 404 error by @cr7258 in #383
- fix: remove trailing slashes from envoyproxy repository URLs in Chart.yaml by @OKevinoo in #407
♻️ Cleanups:
- use lws as sub chart by @carlory in #408
- Disable prometheus by default by @kerthcet in #416
- Clarify Kubernetes version requirement and fallback plan in Key Features by @SanjanShiv in #380
- Add ci test with helm chart by @kerthcet in #432
- fix: add unit tests for backend runtime by @X1aoZEOuO in #428
- Add inftyai-scheduler support and config updates by @carlory in #447
New Contributors
- @cr7258 made their first contribution in #370
- @kenwoodjw made their first contribution in #372
- @SanjanShiv made their first contribution in #380
- @rudeigerc made their first contribution in #388
- @OKevinoo made their first contribution in #407
- @IRONICBo made their first contribution in #409
- @X1aoZEOuO made their first contribution in #428
Full Changelog: v0.1.3...v0.1.4