Releases: envoyproxy/ai-gateway
v0.6.0
Envoy AI Gateway v0.6.0
Envoy AI Gateway v0.6.0 marks the first production-ready API surface:
- The core CRDs (
AIGatewayRoute,AIServiceBackend,BackendSecurityPolicy,GatewayConfig,MCPRoute) are now served atv1beta1. - AWS Bedrock gains a native
InvokeModelpath for Claude alongside Titan embeddings via the OpenAI/v1/embeddingscontract. - Gemini gets first-class embeddings and Anthropic-style prefix context caching.
- Cross-provider clients can hit Anthropic's
/v1/messagesendpoint on any OpenAI-compatible backend, and a singlereasoning_effortknob now works across Anthropic, OpenAI, and Gemini. - Operators get GKE Workload Identity via Application Default Credentials, configurable webhook host networking, request/response body redaction for compliance, and the Go 1.26.2 + Envoy 1.37 + Envoy Gateway 1.7 baseline.
Two breaking changes land in v0.6 — AIGatewayRoute.spec.filterConfig is removed (move to GatewayConfig), and the deprecated version-as-prefix behavior on VersionedAPISchema is removed (use prefix). See Upgrade Guidance below.
⚠️ Breaking Changes
AIGatewayRoute.spec.filterConfigremoved. ThefilterConfigfield onAIGatewayRoutehas been removed. Move external-processor configuration (resources, env vars, image overrides) to aGatewayConfigresource referenced from theGatewayvia theaigateway.envoyproxy.io/gateway-configannotation. v0.5 deprecated theresourcessubfield with a pointer toGatewayConfig; v0.6 removes the entirefilterConfigstruct, so anything still set there must move now. See the upgrade guidance below.VersionedAPISchema.versionno longer acts as an endpoint prefix for OpenAI-schema backends. The legacy behavior deprecated in v0.5 is gone. Use theprefixfield instead (e.g.prefix: /v1beta/openaifor Gemini's OpenAI-compatible API,prefix: /compatibility/v1for Cohere). See the upgrade guidance below.
✨ New Features
AWS Bedrock
- Native
InvokeModelAPI for Claude — Send requests to Claude models on Bedrock through Bedrock's nativeInvokeModelendpoint, complementing the existing Converse API path. Useful when applications already speak the Anthropic Messages format and want a thin translation layer. - OpenAI → Bedrock Titan embeddings translation — Call Amazon Titan embedding models on Bedrock through the standard OpenAI
/v1/embeddingscontract. Switch embedding providers without changing client code. Cohere and other Bedrock embedding models are not yet covered and will follow in a later release.
Anthropic and Cross-Provider Translation
- Anthropic
/v1/messagesendpoint on OpenAI backends — Expose any OpenAI-compatible backend through Anthropic's Messages API. Lets Claude-style clients reach OpenAI, Azure OpenAI, or any other OpenAI-compatible provider behind the gateway without rewriting requests. - Structured output for Claude models — Pass JSON schema constraints through to Claude so responses conform to your declared shape. Available on Anthropic and AWS Bedrock Claude backends today; GCP Vertex AI Claude is excluded pending upstream provider support.
- Cleaner handling when
max_tokensis omitted on Anthropic requests — Requests without an explicitmax_tokensno longer crash the translator; they're forwarded so the provider returns a normal validation error. Removes a long-standing footgun when forwarding OpenAI-shaped requests through the Anthropic path. - Adaptive thinking for
claude-opus-4.6— Translate Claude's new adaptive thinking mode end-to-end. Adaptive lets the model decide thinking depth per request rather than committing to a fixed budget, so callers can opt in without bespoke provider code. - Unified
reasoning_effortacross Anthropic, OpenAI, and Gemini — A single OpenAI-stylereasoning_effortvalue (low/medium/high/xhigh) now maps onto Anthropic's thinking budgets and Gemini 3's thinking controls. One client knob, three providers.
Gemini Provider
- Gemini embeddings translation — Use Gemini embedding models through the OpenAI
/v1/embeddingscontract, completing Gemini coverage alongside chat completions and Responses. - Gemini context caching with prefix-style API — Activate Gemini's context caching using the same Anthropic-style
cache_controlprefix surface already supported elsewhere. Cut input token costs on long, repeated system prompts without a Gemini-specific code path. - Gemini reasoning surfaced as thinking blocks — Non-streaming Gemini reasoning is now exposed as both string content and structured
thinking_blocks, matching the shape clients already use for Anthropic responses. Streaming responses still surface reasoning as string content only.
OpenAI API Compatibility
- Responses API — context management and richer streaming — Second wave of Responses API work fills in context management and improved streaming so the
/v1/responsespath is closer to parity with/v1/chat/completions. If you held off on/v1/responsesdue to missing features, retest now. - Compatibility with open-source Responses API implementations — Improved compatibility with non-OpenAI implementations of the Responses API (e.g. open-source inference servers that expose a
/v1/responsesendpoint), broadening which Responses-aware clients can sit in front of the gateway. - Text-to-speech endpoint
/v1/audio/speech— Route OpenAI text-to-speech requests through the gateway, so audio workloads benefit from the same auth, rate limiting, and observability as chat traffic.
MCP Gateway
- Per-backend header forwarding with rename —
MCPRouteBackendRef.forwardHeadersaccepts a list of inbound headers to forward to each backend, optionally renaming them on the way out. Each MCP backend can receive its own set of headers (e.g. trace context, tenant identifiers, per-user auth) without a single route-wide rule. - JWT claim forwarding to MCP backends — Project verified JWT claims into outbound headers via
MCPRouteOAuth.claimToHeaders, enabling identity-aware tool execution at backend MCP servers without re-authenticating downstream. - Exclude /
excludeRegexon tool selectors —MCPToolFilternow supports deny patterns (literalexcludeand regexexcludeRegex) alongside the existing include rules. Useful when a backend exposes more capabilities than a given route should surface. - Tool name in access logs and response metadata — Tool invocations now carry the tool name in dynamic metadata (key
mcp_tool_name), so per-tool debugging, dashboards, and access-log fields are straightforward to wire up. - Per-backend capability tracking — The gateway tracks which MCP server feature flags (
tools,prompts,resources,logging,completions) each backend supports and merges them across a route. Capability negotiation now reflects what's actually reachable, so clients don't get told a feature is available when no reachable backend implements it.
Authentication and Identity
- GKE Workload Identity via Application Default Credentials — GCP backends now authenticate using the standard ADC chain when neither
credentialsFilenorworkloadIdentityFederationConfigis set in theBackendSecurityPolicy. Workloads running on GKE pick up Workload Identity automatically — no static service account JSON secret needed.
Security and Privacy
- Request and response body redaction — Strip or mask sensitive fields in request and response bodies before they hit logs, traces, or metrics. Lets you keep observability on while meeting privacy and compliance constraints.
Observability
- OTLP access logging auto-configured by
aigw— Standaloneaigwwires up OTLP access logging out of the box when an OTLP endpoint is configured (viaOTEL_EXPORTER_OTLP_ENDPOINT), removing a manual step from local-dev and demo paths. - Default
agent-session-id→session.idheader mapping — Spans and logs now correlate bysession.idautomatically when clients send theagent-session-idheader, so agent frameworks like Goose get session correlation with zero config. Override or disable viaOTEL_AIGW_REQUEST_HEADER_ATTRIBUTES. Metrics never default to session IDs (high cardinality). ReasoningTokencost type —LLMRequestCostTypenow includesReasoningToken, so you can budget and bill against thinking tokens separately from input, output, and cache cost types.- Response model metadata — Responses now carry the resolved upstream model in metadata, which clients and downstream tools can read to confirm exactly which model served a request (useful when routes use model aliasing or fallback).
- OTEL attribute count cap removed for large contexts — Removed the OTEL span attribute count limit so long-context requests no longer have parts of their trace silently dropped.
Operations and Extensibility
- Custom webhook port and host network — The conversion webhook can now bind to a configurable port (
controller.mutatingWebhook.port) and run on the host network (controller.hostNetwork), smoothing installs in clusters with restrictive admission webhook networking such as GKE private clusters. - Lua filter slot after the AI ExtProc stage — Lua filters can now be attached after the AI ExtProc stage in the standard filter chain, so you can do last-mile request shaping (header rewrites, body tweaks) without writing a custom
EnvoyExtensionPolicy. - Route-scoped LLM request costs with global defaults — Set
GatewayConfig.spec.globalLLMRequestCostsfor fleet-wide defaults and override per-route atAIGatewayRoute.spec.llmRequestCosts. Makes per-tenant or per-backend cost tracking straightforward without per-route boilerplate.
🔗 API Updates
- **Core CRDs promoted to `aigateway.en...
v0.6.0-rc1
Release candidate
v0.5.0
Envoy AI Gateway v0.5.0
Multi-gateway configuration, prompt caching cost savings, fine-grained MCP authorization, OpenAI Responses API, and Google Search grounding for Gemini.
Envoy AI Gateway v0.5.0 makes multi-gateway deployments easier with the new GatewayConfig CRD, cuts costs with prompt caching for AWS Bedrock and GCP Claude, and unlocks fine-grained access control with CEL-based MCP authorization. Developers gain OpenAI Responses API support, Google Search grounding for Gemini, and the ability to mutate request bodies per-route. Under the hood, the switch to sonic JSON processing reduces latency across all requests.
✨ New Features
Gateway Configuration
- New
GatewayConfigCRD — Gateway-scoped configuration via a new custom resource. Reference it from a Gateway via theaigateway.envoyproxy.io/gateway-configannotation to configure the external processor container (env vars, resource requirements, container settings). Multiple Gateways can share the same GatewayConfig. - Configurable endpoint prefixes — New
prefixfield onVersionedAPISchemafor backends with non-standard OpenAI-compatible prefixes (e.g., Gemini's/v1beta/openai, Cohere's/compatibility/v1).
OpenAI API Support
- OpenAI Responses API (
/v1/responses) — Full support with streaming and non-streaming modes, function calling, MCP tools, reasoning, multi-turn conversations, multimodal capabilities, token usage tracking, and OpenInference tracing.
Provider Caching Enhancements
- Prompt caching for AWS Bedrock Claude — Reuse cached system prompts with Bedrock Anthropic models. Cache point markers are handled automatically with separate tracking for cache creation and cache hit tokens.
- Prompt caching for GCP Vertex AI Claude — Same cost-saving prompt caching for Claude models on GCP Vertex AI for system prompts and few-shot examples.
MCP Gateway Enhancements
- Fine-grained authorization with CEL, JWT claims, and external auth — Write expressive CEL rules using request attributes (HTTP method, headers, JWT claims, tool names, call arguments), enforce access based on JWT claim values, or delegate to external gRPC/HTTP authorization services.
- Real-time tool list synchronization — MCP clients automatically receive
notifications/tools/list_changedwhen MCPRoutes update, refreshing available tools without reconnection. - Stdio server proxy in standalone mode — Run command-line MCP tools (e.g.,
npx-based servers) without code changes via theaigwCLI HTTP proxy. - Improved OAuth metadata discovery — Well-known endpoints now serve at the MCPRoute path prefix for correct authorization discovery across multiple routes.
Inference Extension
- Security policies for inference pools — Apply
BackendSecurityPolicytoInferencePoolresources for consistent authentication across dynamically-selected inference endpoints.
Gemini Provider Enhancements
- Google Search grounding — Give Gemini models access to real-time web information via the
google_searchtool type with domain filtering, blocking confidence thresholds, and time range restrictions. - Consistent thinking configuration across providers — Same
thinkingconfiguration works for both Anthropic and Gemini models for provider-agnostic reasoning features. - Gemini 3 reasoning and image quality controls —
thinking_level(reasoning depth) andmedia_resolution(image quality vs. speed) with graceful degradation on older Gemini versions. - Visibility into model reasoning — Thought summaries extracted and surfaced from Gemini responses when thinking is enabled.
- Enterprise web search integration —
enterprise_searchtool type for grounding responses in organization-specific search infrastructure and data sources.
Traffic Management
- Route-level body mutation — Inject or remove JSON fields in request bodies per-backend using
bodyMutationwithsetandremoveoperations. Route-level settings override backend defaults. - AWS Bedrock service tier control — Choose between standard, flex, priority, and reserved tiers for latency-sensitive or cost-optimized workloads with automatic fallback handling.
Observability Enhancements
- Per-provider cost attribution — New
gen_ai.provider.namemetric attribute for filtering dashboards and alerts by provider. - Full tracing for Anthropic Messages API — OpenInference-compliant tracing for the native
/messagesendpoint, compatible with Arize Phoenix and OpenTelemetry platforms. - Cohere Rerank visibility — Full OpenTelemetry support for Cohere's v2 rerank endpoint capturing query, documents, and relevance scores.
Performance and Operations
- Faster request processing with sonic JSON — Migrated to bytedance/sonic for JSON encoding/decoding with measurable latency improvements and lower CPU usage.
- Faster cross-namespace reference validation — Optimized ReferenceGrant indexing reduces controller reconciliation time.
- Improved MCP proxy throughput — HTTP connection reuse across MCP proxy requests eliminates per-request connection overhead. Details →
🔗 API Updates
- New
GatewayConfigCRD — Gateway-level configuration withextProc.kubernetesfor container settings. Reference viaaigateway.envoyproxy.io/gateway-configannotation. VersionedAPISchema.prefix— Newprefixfield replaces overloadingversionfor endpoint path customization.AIGatewayRouteRuleBackendRef.bodyMutation— New field withset(field/value pairs) andremove(field names) for request body manipulation.LLMRequestCostType.CacheCreationInputToken— New cost type for tokens written to cache, separate fromCachedInputToken.MCPRouteSecurityPolicyauthorization fields — Newauthorizationblock withdefaultAction,rulesarray (CEL, JWT scopes/claims, tools targeting), andextAuthfor external authorization.BackendSecurityPolicy.targetRefsexpansion — Now acceptsInferencePool(inference.networking.x-k8s.io) in addition toAIServiceBackend.
Deprecations
AIGatewayFilterConfigExternalProcessor.resources— Deprecated. UseGatewayConfiginstead. Will be removed in v0.6.versionfield as prefix for OpenAI schema — Deprecated. Use the newprefixfield. Legacy behavior will be removed in v0.6.
🐛 Bug Fixes
- AWS Bedrock Claude streaming reliability — Streaming responses from Bedrock Claude models now complete correctly without truncation.
- Gemini streaming token counts — Token usage in Gemini streaming responses now matches OpenAI format.
- Multi-chunk Gemini tool calls — Tool calls spanning multiple streaming chunks now have correct indices.
- GCP Claude reasoning content — Reasoning/thinking content correctly passes through for Claude on GCP Vertex AI.
- Zero-weight backend references — Backend references with zero weight no longer cause routing errors.
- Umbrella chart image pull secrets — Helm deployments within umbrella charts correctly inherit
global.imagePullSecrets. - GCP global region backends — Vertex AI backends with global region now work correctly.
- Accurate per-token latency metrics — Fixed integer truncation in
time_per_output_tokencalculation. - Anthropic token counting — Improved accuracy of input and output token counts for Anthropic models.
📖 Upgrade Guidance
Migrating to GatewayConfig
If you're using AIGatewayFilterConfigExternalProcessor.resources, migrate to the new GatewayConfig CRD:
- Create a
GatewayConfigresource:
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: GatewayConfig
metadata:
name: my-gateway-config
namespace: default
spec:
extProc:
kubernetes:
resources:
requests:
cpu: "100m"
memory: "128Mi"
limits:
cpu: "500m"
memory: "512Mi"
env:
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: "http://otel-collector:4317"- Reference from your Gateway:
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
name: ai-gateway
annotations:
aigateway.envoyproxy.io/gateway-config: my-gateway-configMigrating Endpoint Prefix Configuration
Before:
schema:
name: OpenAI
version: "/v1beta/openai" # DeprecatedAfter:
schema:
name: OpenAI
prefix: "/v1beta/openai"📦 Dependencies
| Dependency | Version |
|---|---|
| Go | 1.25.6 |
| Envoy Gateway | v1.6 |
| Envoy Proxy | v1.36.4 |
| Gateway API | v1.4.0 |
| Gateway API Inference Extension | v1.0.2 |
🙏 Acknowledgements
Special thanks to the growing community of adopters including Bloomberg, LY Corporation, Alan by Comma Soft, and NRP for their production insights, everyone who reported bugs, submitted PRs, and participated in design discussions, and the Envoy Gateway team for continued collaboration.
🔮 What's Next
- Additional provider integrations (AWS Bedrock InvokeModel, Gemini embeddings, Azure/AKS workload identity)
- Batch inference APIs for high-volume workloads
- Advanced caching strategies with prompt cache key and retention controls
- Upstream provider quota policies
- Sensitive data redaction for request and response bodies
v0.5.0-rc1
Release candidate for v0.5.0!
helm install aieg oci://registry-1.docker.io/envoyproxy/ai-gateway-helm --version v0.5.0-rc1 --namespace envoy-ai-gateway-system --create-namespace
v0.4.0
Envoy AI Gateway v0.4.0 - November 07, 2025
Release introducing Model Context Protocol (MCP) Gateway, OpenAI Image Generation, Anthropic support (direct and AWS Bedrock), guided output decoding for GCP Vertex AI/Gemini, cross-namespace references, enhanced authentication, and comprehensive observability improvements.
🔗 View release notes on the site
✨ New Features
Model Context Protocol (MCP) Gateway
New MCPRoute CRD
Introduces
MCPRoutecustom resource for routing MCP requests to backend MCP servers, enabling unified AI API for multiple MCP backends.
Complete MCP spec implementation
Includes streamable HTTP transport, JSON-RPC 2.0 support, and MCP spec-compliant OAuth 2.0 authorization with JWKS validation and Protected Resource Metadata.
Server multiplexing and tool routing
Aggregates multiple MCP servers behind a single endpoint with intelligent tool routing, tool filtering (exact match and regex patterns), and collision detection.
Upstream authentication
Supports both OAuth-based authentication and API key authentication for secure backend MCP server communication with configurable headers.
Session management
Implements MCP session handling with encryption, rotatable seeds, and graceful session lifecycle management.
Anthropic Provider Support
Direct api.anthropic.com support
Native integration with Anthropic's API at
api.anthropic.com, complementing existing GCP Vertex AI Anthropic support.
AWS Bedrock native Anthropic Messages API
Support for Claude models on AWS Bedrock using the native Anthropic Messages API format instead of the generic Converse API, enabling full feature parity with direct Anthropic API including prompt caching and extended thinking.
Anthropic API key authentication
Native
x-api-keyheader-based authentication matching Anthropic's API conventions and SDK patterns for direct Anthropic connections.
Passthrough translator with token usage tracking
Efficient passthrough translation layer that captures token usage and maintains API compatibility while minimizing overhead for both direct and AWS Bedrock Anthropic endpoints.
Standalone CLI auto-configuration
Auto-configuration from
ANTHROPIC_API_KEYenvironment variable in standalone mode for zero-config deployments.
Guided Output Support for GCP Vertex AI/Gemini
Guided regex support
Constrains model outputs to match specific regular expressions for GCP Vertex AI/Gemini models, enabling structured text generation.
Guided choice support
Restricts model outputs to predefined choices for GCP Vertex AI/Gemini models, ensuring responses conform to expected values.
Guided JSON support
Ensures model outputs are valid JSON conforming to specified schemas for GCP Vertex AI/Gemini models, with OpenAI-compatible API translation.
Provider-Specific Enhancements
OpenAI Image Generation /v1/images/generations endpoint
End-to-end support for OpenAI's image generation API including request/response translation, Brotli encoding/decoding, and full protocol compatibility.
OpenAI legacy /v1/completions endpoint
Full pass-through support for OpenAI's legacy completions endpoint with complete tracing and metrics, ensuring backward compatibility.
Azure OpenAI embeddings support
Native support for Azure OpenAI embeddings API with proper protocol translation and token usage tracking.
AWS Bedrock reasoning tokens
Full support for reasoning/thinking tokens in AWS Bedrock responses for both streaming and non-streaming modes, properly exposing extended thinking processes in Claude models.
GCP Vertex AI safety settings
Support for GCP-specific safety settings configuration, allowing fine-grained control over content filtering and safety thresholds for Gemini models.
GCP Gemini streaming token accounting
Accurate completion_tokens reporting in streaming usage chunks for Gemini models, ensuring proper token accounting during streaming responses.
Cross-Namespace Resource References
Cross-namespace AIServiceBackend references
AIGatewayRoutecan now referenceAIServiceBackendresources in different namespaces, enabling multi-tenant and organizational separation patterns.
ReferenceGrant validation
Comprehensive ReferenceGrant integration following Gateway API patterns, with automatic validation and clear error messages when grants are missing.
Enhanced Upstream Authentication
AWS SDK default credential chain
Support for AWS SDK's default credential chain including IRSA (IAM Roles for Service Accounts), EKS Pod Identity, EC2 Instance Profiles, and environment variables, eliminating need for static credentials or OIDC settings
Azure API key authentication
Native Azure OpenAI API key authentication using the
api-keyheader, matching Azure SDK conventions and console practices.
Traffic Management and Configuration
Header mutations at route and backend levels
New
headerMutationfields in bothAIServiceBackendandAIGatewayRouteRuleBackendRefenable header manipulation with smart merge logic for advanced routing scenarios.
InferencePool v1 support
Updated to Gateway API Inference Extension v1.0, providing stable intelligent endpoint selection with enhanced performance and reliability.
Cached token usage tracking for actual token usage reporting
Captures and reports cached token statistics from cloud providers (Anthropic, Bedrock, etc.), providing accurate cost attribution for prompt caching features.
Standalone Mode and CLI
Docker image support
Official Docker images for the aigw CLI published to GitHub Container Registry, enabling containerized standalone deployments with proper health checks and lifecycle management.
Multi-provider auto-configuration
Zero-config standalone mode with automatic configuration from
OPENAI_API_KEY,AZURE_OPENAI_API_KEY, orANTHROPIC_API_KEYenvironment variables. Generates complete Envoy configuration with OpenAI SDK compatibility.
MCP server configuration
Native MCP support in standalone mode via
--mcp-configand--mcp-jsonflags, enabling unified LLM and MCP server configuration in a single aigw run invocation without Kubernetes.
XDG Base Directory standards
Proper separation of configuration, data, state, and runtime files following XDG Base Directory specification, improving organization and enabling better cleanup and management of aigw state.
Enhanced readiness monitoring
Improved Envoy readiness detection and status reporting in standalone mode, providing clear insights into when the gateway is ready to accept traffic with better error messages.
Consolidated admin server
Unified admin server on a single port serving both
/metricsand/healthendpoints, simplifying monitoring and health check configuration.
Improved error handling
aigwCLI now fails fast and exits cleanly if external processor fails to start, preventing silent failures and improving debugging experience.
Type-safe Kubernetes client SDK
Generated client libraries for all AI Gateway CRDs following standard Kubernetes client-go patterns, enabling developers to build controllers, operators, and custom integrations with type safety.
Observability Enhancements
MCP operations observability
Comprehensive monitoring, logging, and tracing for MCP operations with configurable access logs and metrics enrichment for MCP server interactions and tool routing.
Image generation tracing and metrics
OpenInference-compliant distributed tracing and OpenTelemetry Gen AI metrics for image generation requests with detailed request parameters and timing information.
OpenTelemetry native metrics export
Support for OTEL-native metrics export (in addition to Prometheus), enabling integration with Elastic Stack, OTEL-TUI, and other OTEL-native observability systems. Includes console exporter for ad-hoc debugging.
Embeddings tracing implementation
Complete OpenInference-compliant tracing for embeddings operations, complementing existing chat completion tracing.
Enhanced /messages endpoint metrics
Distinct metrics for Anthropic's
/messagesendpoint, providing accurate attribution separate from/chat/completionsendpoints.
Original model tracking
Metrics now track both the original requested model and any overridden model names, providing accurate attribution in multi-provider and model virtualization scenarios.
🔗 API Updates
- New MCPRoute CRD
- Introduces MCPRoute custom resource with comprehensive fields for MCP server configuration, tool filtering, authentication policies (OAuth and API key), and Protected Resource Metadata.
- Cross-namespace references in AIGatewayRoute
- Added namespace field to AIGatewayRouteRuleBackendRef, enabling cross-namespace backend references with ReferenceGrant validation.
- Header mutations at route and backend levels
- Added headerMutation fields to both AIServiceBackend and AIGatewayRouteRuleBackendRef for backend-level and per-route header manipulation with smart merge logic.
- New AWSAnthropic API schema
- Added AWSAnthropic schema for Claude models on AWS Bedrock using the native Anthropic Messages API format, providing full feature parity with direct Anthropic API.
- Anthropic API key authentication
- Added AnthropicAPIKey to BackendSecurityPolicy for x-api-key header authentication.
- Azure API key authentication
- Added AzureAPIKey to BackendSecurityPolicy for api-key header authentication.
- **AWS credential chain support...
v0.4.0-rc2
Release candidate for v0.4.0!
helm install aieg oci://registry-1.docker.io/envoyproxy/ai-gateway-helm --version v0.4.0-rc2 --namespace envoy-ai-gateway-system --create-namespace
v0.4.0-rc1
Release candidate for v0.4.0!
helm install aieg oci://registry-1.docker.io/envoyproxy/ai-gateway-helm --version v0.4.0-rc1 --namespace envoy-ai-gateway-system --create-namespace
v0.3.0
Release Announcement
Check out the v0.3.0 release notes to learn more about the release.
Envoy AI Gateway v0.3.x
Release version introducing intelligent inference routing with Endpoint Picker Provider, enhanced observability features, Google Vertex AI support, and enhanced provider integrations.
v0.3.0
August 21, 2025
Envoy AI Gateway v0.3.0 introduces intelligent inference routing, expanded provider support (including Google Vertex AI and Anthropic), and enhanced observability with OpenInference tracing and configurable metrics. Key features include Endpoint Picker Provider with InferencePool for dynamic load balancing, model name virtualization, and seamless Gateway API Inference Extension integration.
✨ New Features
Endpoint Picker Provider (EPP) Integration
- Gateway API Inference Extension Support
- Complete integration with Gateway API Inference Extension v0.5.1, enabling intelligent endpoint selection based on real-time AI inference metrics like KV-cache usage, queue depth, and LoRA adapter information.
- Dual Integration Approaches
- Support for both
HTTPRoute + InferencePoolandAIGatewayRoute + InferencePoolintegration patterns, providing flexibility for different use cases from simple to advanced AI routing scenarios.
- Support for both
- Dynamic Load Balancing
- Intelligent routing that automatically selects the optimal inference endpoint for each request, optimizing resource utilization across your entire inference infrastructure with real-time performance metrics.
- Extensible Architecture
- Support for custom endpoint picker providers, allowing implementation of domain-specific routing logic tailored to unique AI workload requirements.
Expanded Provider Ecosystem
- Google Vertex AI Production Support
- Google Vertex AI has moved from work-in-progress to full production support, including complete streaming support for Gemini models with OpenAI API compatibility. View all supported providers →
- Anthropic on Vertex AI Integration
- Complete Anthropic Claude integration via GCP Vertex AI, moving from experimental to production-ready status with multi-tool support and configurable API versions for enterprise deployments.
- Enhanced Gemini Capabilities
- Improved request/response translation for Gemini models with support for tools, response format specification, and advanced conversation handling, making Gemini integration more robust and feature-complete.
- Strengthened OpenAI-Compatible Ecosystem
- Enhanced support for the broader OpenAI-compatible provider ecosystem including Groq, Together AI, Mistral, Cohere, DeepSeek, SambaNova, and more, ensuring seamless integration across the AI provider landscape.
Observability Enhancements
- OpenInference Tracing Support
- Added comprehensive OpenInference distributed tracing with OpenTelemetry integration, providing detailed request tracing and performance monitoring for LLM operations. Includes full chat completion request/response data capture, timing information, and compatibility with evaluation systems like Arize Phoenix. View the documentation →
- Configurable Metrics Labels
- Added support for configuring additional metrics labels corresponding to HTTP request headers. This enables custom labeling of metrics based on specific request headers like user identifiers, API versions, or application contexts, providing more granular monitoring and filtering capabilities.
- Embeddings Metrics Support
- Extended GenAI metrics support to include embeddings operations, providing comprehensive token usage tracking and performance monitoring for both chat completion and embeddings API endpoints with consistent OpenTelemetry semantic conventions.
- Enhanced GenAI Metrics
- Improved AI-specific metrics implementation with better error handling, enhanced attribute mapping, and more accurate token latency measurements. Maintains full compatibility with OpenTelemetry Gen AI semantic conventions while providing more reliable performance analysis data. View the documentation →
Infrastructure and Configuration
-
Model Name Virtualization
- Added a new
modelNameOverridefield in thebackendRefofAIGatewayRoute, enabling flexible model name abstraction across different providers. This allows unified model naming for downstream applications while routing to provider-specific model names, supporting both multi-provider scenarios and fallback configurations. View the documentation →
- Added a new
-
Unified Gateway Support
- Enhanced Gateway resource management by allowing both standard
HTTPRouteandAIGatewayRouteto be attached to the sameGatewayobject. This provides a unified routing configuration that supports both AI and non-AI traffic within a single gateway infrastructure, simplifying deployment and management.
- Enhanced Gateway resource management by allowing both standard
🔗 API Updates
- BackendSecurityPolicy TargetRefs: Added targetRefs field to BackendSecurityPolicy spec, enabling direct targeting of AIServiceBackend resources using Gateway API policy attachment patterns.
- Gateway API Inference Extension: Allows InferencePool resource of Gateway API Inference Extension v0.5.1 to be specified as a backend ref in AIGatewayRoute intelligent endpoint selection.
- modelNameOverride in the backend reference of AIGatewayRoute: Added modelNameOverride field in the backend reference of AIGatewayRoute, allowing for flexible model name rewrite for routing purposes.
Deprecations
backendSecurityPolicyRefPattern: The old pattern of AIServiceBackend referencing BackendSecurityPolicy is deprecated in favor of the new targetRefs approach. Existing configurations will continue to work but should be migrated before v0.4.AIGatewayRoute'stargetRefsPattern: The targetRefs pattern is no longer supported for AIGatewayRoute. Existing configurations will continue to work but should be migrated to parentRefs.AIGatewayRoute's schema Field: The schema field is no longer needed for AIGatewayRoute. Existing configurations will continue to work but should be removed before v0.4.controller.envoyGatewayNamespacehelm value is no longer necessary: This value is no longer necessary and is redundant when configured.controller.podEnvhelm value will be removed: Use controller.extraEnvVars instead. The controller.podEnv value will be removed in v0.4.
📖 Upgrade Guidance
For users upgrading from v0.2.x to v0.3.0:
1. Upgrade Envoy Gateway to v1.5.0 - Ensure you are using Envoy Gateway v1.5.0 or later, as this is required for compatibility with the new AI Gateway features.
2. Update Envoy Gateway config - Update your Envoy Gateway configuration to include the new settings as below. The full manifest is available in the manifests/envoy-gateway-config/config.yaml file as per the getting started guide.
--- a/manifests/envoy-gateway-config/config.yaml
+++ b/manifests/envoy-gateway-config/config.yaml
@@ -43,9 +43,19 @@ data:
extensionManager:
hooks:
xdsTranslator:
+ translation:
+ listener:
+ includeAll: true
+ route:
+ includeAll: true
+ cluster:
+ includeAll: true
+ secret:
+ includeAll: true
post:
- - VirtualHost
- Translation
+ - Cluster
+ - Route
3. Upgrade Envoy AI Gateway to v0.3.0
4. Migrate Gateway target references - Update from the deprecated AIGatewayRoute.targetRefs pattern to the new AIGatewayRoute.parentRefs approach after the upgrade to v0.3.0.
5. Migrate backendSecurityPolicy references - Update from the deprecated AIServiceBackend.backendSecurityPolicyRef pattern to the new BackendSecurityPolicy.targetRefs approach after the upgrade to v0.3.0.
6. Remove AIGatewayRoute.schema - remove the schema field from AIGatewayRoute resources after the upgrade to v0.3.0, as it is no longer used.
📦 Dependencies Versions
- Go 1.24.6
- Updated to latest Go version for improved performance and security.
- Envoy Gateway v1.5
- Built on Envoy Gateway for proven data plane capabilities.
- Envoy v1.35
- Leveraging Envoy Proxy's battle-tested networking capabilities.
- Gateway API v1.3.1
- Support for latest Gateway API specifications.
- Gateway API Inference Extension v0.5.1
- Integration with Gateway API Inference Extension for intelligent endpoint selection.
🙏 Acknowledgements
This release represents the collaborative effort of our growing community. Special thanks to contributors from Tetrate, Bloomberg, Tencent, Google, Nutanix and our independent contributors who made this release possible through their code contributions, testing, feedback, and community participation.
The Endpoint Picker Provider integration represents a significant milestone in making AI inference routing more intelligent and efficient. We appreciate all the feedback and testing from the community that helped shape this feature.
New Contributors
- @sukumargaonkar made their first contribution in #635
- @isyangban made their first contribution in #729
- @whzghb made their first contribution in #743
- @yduwcui made their first contribution in https://github.com/envoyproxy/ai-gatewa...
v0.3.0-rc2
Release candidate
v0.3.0-rc1
Release candidate for v0.3.0!
helm install aieg oci://registry-1.docker.io/envoyproxy/ai-gateway-helm --version v0.3.0-rc1 --namespace envoy-ai-gateway-system --create-namespace