Releases: envoyproxy/ai-gateway

v0.6.0

05 May 20:43
a82fcf5

Envoy AI Gateway v0.6.0

Envoy AI Gateway v0.6.0 marks the first production-ready API surface:

  • The core CRDs (AIGatewayRoute, AIServiceBackend, BackendSecurityPolicy, GatewayConfig, MCPRoute) are now served at v1beta1.
  • AWS Bedrock gains a native InvokeModel path for Claude alongside Titan embeddings via the OpenAI /v1/embeddings contract.
  • Gemini gets first-class embeddings and Anthropic-style prefix context caching.
  • Cross-provider clients can hit Anthropic's /v1/messages endpoint on any OpenAI-compatible backend, and a single reasoning_effort knob now works across Anthropic, OpenAI, and Gemini.
  • Operators get GKE Workload Identity via Application Default Credentials, configurable webhook host networking, request/response body redaction for compliance, and the Go 1.26.2 + Envoy 1.37 + Envoy Gateway 1.7 baseline.

Two breaking changes land in v0.6: AIGatewayRoute.spec.filterConfig is removed (move to GatewayConfig), and the deprecated version-as-prefix behavior on VersionedAPISchema is removed (use prefix). See Upgrade Guidance below.

📖 Full documentation

⚠️ Breaking Changes

  • AIGatewayRoute.spec.filterConfig removed. The filterConfig field on AIGatewayRoute has been removed. Move external-processor configuration (resources, env vars, image overrides) to a GatewayConfig resource referenced from the Gateway via the aigateway.envoyproxy.io/gateway-config annotation. v0.5 deprecated the resources subfield with a pointer to GatewayConfig; v0.6 removes the entire filterConfig struct, so anything still set there must move now. See the upgrade guidance below.
  • VersionedAPISchema.version no longer acts as an endpoint prefix for OpenAI-schema backends. The legacy behavior deprecated in v0.5 is gone. Use the prefix field instead (e.g. prefix: /v1beta/openai for Gemini's OpenAI-compatible API, prefix: /compatibility/v1 for Cohere). See the upgrade guidance below.

✨ New Features

AWS Bedrock

  • Native InvokeModel API for Claude — Send requests to Claude models on Bedrock through Bedrock's native InvokeModel endpoint, complementing the existing Converse API path. Useful when applications already speak the Anthropic Messages format and want a thin translation layer.
  • OpenAI → Bedrock Titan embeddings translation — Call Amazon Titan embedding models on Bedrock through the standard OpenAI /v1/embeddings contract. Switch embedding providers without changing client code. Cohere and other Bedrock embedding models are not yet covered and will follow in a later release.

Anthropic and Cross-Provider Translation

  • Anthropic /v1/messages endpoint on OpenAI backends — Expose any OpenAI-compatible backend through Anthropic's Messages API. Lets Claude-style clients reach OpenAI, Azure OpenAI, or any other OpenAI-compatible provider behind the gateway without rewriting requests.
  • Structured output for Claude models — Pass JSON schema constraints through to Claude so responses conform to your declared shape. Available on Anthropic and AWS Bedrock Claude backends today; GCP Vertex AI Claude is excluded pending upstream provider support.
  • Cleaner handling when max_tokens is omitted on Anthropic requests — Requests without an explicit max_tokens no longer crash the translator; they're forwarded so the provider returns a normal validation error. Removes a long-standing footgun when forwarding OpenAI-shaped requests through the Anthropic path.
  • Adaptive thinking for claude-opus-4.6 — Translate Claude's new adaptive thinking mode end-to-end. Adaptive lets the model decide thinking depth per request rather than committing to a fixed budget, so callers can opt in without bespoke provider code.
  • Unified reasoning_effort across Anthropic, OpenAI, and Gemini — A single OpenAI-style reasoning_effort value (low/medium/high/xhigh) now maps onto Anthropic's thinking budgets and Gemini 3's thinking controls. One client knob, three providers.

Gemini Provider

  • Gemini embeddings translation — Use Gemini embedding models through the OpenAI /v1/embeddings contract, completing Gemini coverage alongside chat completions and Responses.
  • Gemini context caching with prefix-style API — Activate Gemini's context caching using the same Anthropic-style cache_control prefix surface already supported elsewhere. Cut input token costs on long, repeated system prompts without a Gemini-specific code path.
  • Gemini reasoning surfaced as thinking blocks — Non-streaming Gemini reasoning is now exposed as both string content and structured thinking_blocks, matching the shape clients already use for Anthropic responses. Streaming responses still surface reasoning as string content only.

OpenAI API Compatibility

  • Responses API — context management and richer streaming — Second wave of Responses API work fills in context management and improved streaming so the /v1/responses path is closer to parity with /v1/chat/completions. If you held off on /v1/responses due to missing features, retest now.
  • Compatibility with open-source Responses API implementations — Improved compatibility with non-OpenAI implementations of the Responses API (e.g. open-source inference servers that expose a /v1/responses endpoint), broadening which Responses-aware clients can sit in front of the gateway.
  • Text-to-speech endpoint /v1/audio/speech — Route OpenAI text-to-speech requests through the gateway, so audio workloads benefit from the same auth, rate limiting, and observability as chat traffic.

MCP Gateway

  • Per-backend header forwarding with rename: MCPRouteBackendRef.forwardHeaders accepts a list of inbound headers to forward to each backend, optionally renaming them on the way out. Each MCP backend can receive its own set of headers (e.g. trace context, tenant identifiers, per-user auth) without a single route-wide rule.
  • JWT claim forwarding to MCP backends — Project verified JWT claims into outbound headers via MCPRouteOAuth.claimToHeaders, enabling identity-aware tool execution at backend MCP servers without re-authenticating downstream.
  • Exclude / excludeRegex on tool selectors: MCPToolFilter now supports deny patterns (literal exclude and regex excludeRegex) alongside the existing include rules. Useful when a backend exposes more capabilities than a given route should surface.
  • Tool name in access logs and response metadata — Tool invocations now carry the tool name in dynamic metadata (key mcp_tool_name), so per-tool debugging, dashboards, and access-log fields are straightforward to wire up.
  • Per-backend capability tracking — The gateway tracks which MCP server feature flags (tools, prompts, resources, logging, completions) each backend supports and merges them across a route. Capability negotiation now reflects what's actually reachable, so clients don't get told a feature is available when no reachable backend implements it.
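
As a rough sketch of the deny patterns described above, an MCPRoute backend could hide destructive tools while leaving everything else visible. The resource names, the tool names, and the placement of exclude and excludeRegex under a toolSelector block are assumptions for illustration; consult the MCPRoute API reference for the exact schema.

```yaml
# Hypothetical MCPRoute: names and tool patterns are placeholders.
apiVersion: aigateway.envoyproxy.io/v1beta1
kind: MCPRoute
metadata:
  name: mcp-route
  namespace: default
spec:
  parentRefs:
    - name: ai-gateway
      kind: Gateway
      group: gateway.networking.k8s.io
  backendRefs:
    - name: github-mcp            # assumed backend name
      kind: Backend
      group: gateway.envoyproxy.io
      toolSelector:               # assumed location of the filter fields
        exclude:
          - delete_repository     # literal deny: hide one tool by exact name
        excludeRegex:
          - "^admin_.*"           # regex deny: hide every tool with this prefix
```

Deny rules of this kind suit backends that expose a broad tool set where only a few entries need to be withheld from a given route.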

Authentication and Identity

  • GKE Workload Identity via Application Default Credentials — GCP backends now authenticate using the standard ADC chain when neither credentialsFile nor workloadIdentityFederationConfig is set in the BackendSecurityPolicy. Workloads running on GKE pick up Workload Identity automatically — no static service account JSON secret needed.
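
A minimal sketch of a BackendSecurityPolicy that relies on the ADC chain might look like the following. The point is what is absent: neither credentialsFile nor workloadIdentityFederationConfig is set. The projectName and region fields, the type value, and all resource names are assumptions to be checked against the API reference.

```yaml
# Hypothetical policy: falls back to Application Default Credentials
# because no explicit credential source is configured.
apiVersion: aigateway.envoyproxy.io/v1beta1
kind: BackendSecurityPolicy
metadata:
  name: vertex-adc                # placeholder name
  namespace: default
spec:
  targetRefs:
    - group: aigateway.envoyproxy.io
      kind: AIServiceBackend
      name: vertex-backend        # placeholder backend
  type: GCPCredentials            # assumed type value
  gcpCredentials:
    projectName: my-gcp-project   # assumed field and placeholder value
    region: us-central1           # assumed field and placeholder value
    # credentialsFile: intentionally omitted
    # workloadIdentityFederationConfig: intentionally omitted
```

On GKE, the pod's Kubernetes service account still has to be bound to a Google service account with access to Vertex AI for the ADC chain to resolve credentials.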

Security and Privacy

  • Request and response body redaction — Strip or mask sensitive fields in request and response bodies before they hit logs, traces, or metrics. Lets you keep observability on while meeting privacy and compliance constraints.

Observability

  • OTLP access logging auto-configured by aigw — Standalone aigw wires up OTLP access logging out of the box when an OTLP endpoint is configured (via OTEL_EXPORTER_OTLP_ENDPOINT), removing a manual step from local-dev and demo paths.
  • Default agent-session-id → session.id header mapping — Spans and logs now correlate by session.id automatically when clients send the agent-session-id header, so agent frameworks like Goose get session correlation with zero config. Override or disable via OTEL_AIGW_REQUEST_HEADER_ATTRIBUTES. Metrics never default to session IDs (high cardinality).
  • ReasoningToken cost type: LLMRequestCostType now includes ReasoningToken, so you can budget and bill against thinking tokens separately from input, output, and cache cost types.
  • Response model metadata — Responses now carry the resolved upstream model in metadata, which clients and downstream tools can read to confirm exactly which model served a request (useful when routes use model aliasing or fallback).
  • OTEL attribute count cap removed for large contexts — Removed the OTEL span attribute count limit so long-context requests no longer have parts of their trace silently dropped.
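
To illustrate the ReasoningToken cost type mentioned above, a route could record thinking tokens under their own metadata key. This is a fragment rather than a complete manifest: the metadata key names and the route name are hypothetical, and the metadataKey/type entry shape should be confirmed against the API reference.

```yaml
# Fragment of a hypothetical AIGatewayRoute; parentRefs and rules omitted.
apiVersion: aigateway.envoyproxy.io/v1beta1
kind: AIGatewayRoute
metadata:
  name: reasoning-costs                  # placeholder name
spec:
  llmRequestCosts:
    - metadataKey: llm_output_token      # hypothetical key for output tokens
      type: OutputToken
    - metadataKey: llm_reasoning_token   # hypothetical key for thinking tokens
      type: ReasoningToken
```

Keeping the two keys separate lets a rate-limit or billing rule weigh reasoning tokens differently from ordinary output tokens.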

Operations and Extensibility

  • Custom webhook port and host network — The conversion webhook can now bind to a configurable port (controller.mutatingWebhook.port) and run on the host network (controller.hostNetwork), smoothing installs in clusters with restrictive admission webhook networking such as GKE private clusters.
  • Lua filter slot after the AI ExtProc stage — Lua filters can now be attached after the AI ExtProc stage in the standard filter chain, so you can do last-mile request shaping (header rewrites, body tweaks) without writing a custom EnvoyExtensionPolicy.
  • Route-scoped LLM request costs with global defaults — Set GatewayConfig.spec.globalLLMRequestCosts for fleet-wide defaults and override per-route at AIGatewayRoute.spec.llmRequestCosts. Makes per-tenant or per-backend cost tracking straightforward without per-route boilerplate.
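
For the webhook settings above, a Helm values override could look roughly like this. The key paths controller.mutatingWebhook.port and controller.hostNetwork come from the release notes; the port number is only an example.

```yaml
# values.yaml fragment (a sketch, not a complete values file)
controller:
  hostNetwork: true      # run the controller on the host network
  mutatingWebhook:
    port: 9443           # example port; choose one that is free on the node
```

With host networking enabled, the chosen port must not collide with other host-level listeners, and any firewall rule between the control plane and the nodes has to allow it.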

🔗 API Updates

  • **Core CRDs promoted to `aigateway.en...

v0.6.0-rc1

30 Apr 21:12
d63a020

v0.6.0-rc1 Pre-release

Release candidate

v0.5.0

23 Jan 21:17
b40501f

Envoy AI Gateway v0.5.0

Multi-gateway configuration, prompt caching cost savings, fine-grained MCP authorization, OpenAI Responses API, and Google Search grounding for Gemini.

Envoy AI Gateway v0.5.0 makes multi-gateway deployments easier with the new GatewayConfig CRD, cuts costs with prompt caching for AWS Bedrock and GCP Claude, and unlocks fine-grained access control with CEL-based MCP authorization. Developers gain OpenAI Responses API support, Google Search grounding for Gemini, and the ability to mutate request bodies per-route. Under the hood, the switch to sonic JSON processing reduces latency across all requests.

📖 Full documentation


✨ New Features

Gateway Configuration

  • New GatewayConfig CRD — Gateway-scoped configuration via a new custom resource. Reference it from a Gateway via the aigateway.envoyproxy.io/gateway-config annotation to configure the external processor container (env vars, resource requirements, container settings). Multiple Gateways can share the same GatewayConfig.
  • Configurable endpoint prefixes — New prefix field on VersionedAPISchema for backends with non-standard OpenAI-compatible prefixes (e.g., Gemini's /v1beta/openai, Cohere's /compatibility/v1).

OpenAI API Support

  • OpenAI Responses API (/v1/responses) — Full support with streaming and non-streaming modes, function calling, MCP tools, reasoning, multi-turn conversations, multimodal capabilities, token usage tracking, and OpenInference tracing.

Provider Caching Enhancements

  • Prompt caching for AWS Bedrock Claude — Reuse cached system prompts with Bedrock Anthropic models. Cache point markers are handled automatically with separate tracking for cache creation and cache hit tokens.
  • Prompt caching for GCP Vertex AI Claude — Same cost-saving prompt caching for Claude models on GCP Vertex AI for system prompts and few-shot examples.

MCP Gateway Enhancements

  • Fine-grained authorization with CEL, JWT claims, and external auth — Write expressive CEL rules using request attributes (HTTP method, headers, JWT claims, tool names, call arguments), enforce access based on JWT claim values, or delegate to external gRPC/HTTP authorization services.
  • Real-time tool list synchronization — MCP clients automatically receive notifications/tools/list_changed when MCPRoutes update, refreshing available tools without reconnection.
  • Stdio server proxy in standalone mode — Run command-line MCP tools (e.g., npx-based servers) without code changes via the aigw CLI HTTP proxy.
  • Improved OAuth metadata discovery — Well-known endpoints now serve at the MCPRoute path prefix for correct authorization discovery across multiple routes.

Inference Extension

  • Security policies for inference pools — Apply BackendSecurityPolicy to InferencePool resources for consistent authentication across dynamically-selected inference endpoints.

Gemini Provider Enhancements

  • Google Search grounding — Give Gemini models access to real-time web information via the google_search tool type with domain filtering, blocking confidence thresholds, and time range restrictions.
  • Consistent thinking configuration across providers — Same thinking configuration works for both Anthropic and Gemini models for provider-agnostic reasoning features.
  • Gemini 3 reasoning and image quality controls: thinking_level (reasoning depth) and media_resolution (image quality vs. speed) with graceful degradation on older Gemini versions.
  • Visibility into model reasoning — Thought summaries extracted and surfaced from Gemini responses when thinking is enabled.
  • Enterprise web search integration: enterprise_search tool type for grounding responses in organization-specific search infrastructure and data sources.

Traffic Management

  • Route-level body mutation — Inject or remove JSON fields in request bodies per-backend using bodyMutation with set and remove operations. Route-level settings override backend defaults.
  • AWS Bedrock service tier control — Choose between standard, flex, priority, and reserved tiers for latency-sensitive or cost-optimized workloads with automatic fallback handling.
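
A route-level body mutation as described above might be sketched like this. The release notes only say that set takes field/value pairs and remove takes field names; the entry keys path and value, the JSON-encoded string value, and every name below are assumptions.

```yaml
# Fragment of a hypothetical AIGatewayRoute; parentRefs and matches omitted.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: body-mutation-example       # placeholder name
spec:
  rules:
    - backendRefs:
        - name: openai-backend      # placeholder backend
          bodyMutation:
            set:
              - path: service_tier  # assumed key name for the JSON field
                value: '"flex"'     # assumed to be a JSON-encoded value
            remove:
              - internal_debug      # hypothetical field stripped before forwarding
```

Because route-level settings override backend defaults, a mutation like this applies only to traffic matched by this rule.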

Observability Enhancements

  • Per-provider cost attribution — New gen_ai.provider.name metric attribute for filtering dashboards and alerts by provider.
  • Full tracing for Anthropic Messages API — OpenInference-compliant tracing for the native /messages endpoint, compatible with Arize Phoenix and OpenTelemetry platforms.
  • Cohere Rerank visibility — Full OpenTelemetry support for Cohere's v2 rerank endpoint capturing query, documents, and relevance scores.

Performance and Operations

  • Faster request processing with sonic JSON — Migrated to bytedance/sonic for JSON encoding/decoding with measurable latency improvements and lower CPU usage.
  • Faster cross-namespace reference validation — Optimized ReferenceGrant indexing reduces controller reconciliation time.
  • Improved MCP proxy throughput — HTTP connection reuse across MCP proxy requests eliminates per-request connection overhead. Details →

🔗 API Updates

  • New GatewayConfig CRD — Gateway-level configuration with extProc.kubernetes for container settings. Reference via aigateway.envoyproxy.io/gateway-config annotation.
  • VersionedAPISchema.prefix — New prefix field replaces overloading version for endpoint path customization.
  • AIGatewayRouteRuleBackendRef.bodyMutation — New field with set (field/value pairs) and remove (field names) for request body manipulation.
  • LLMRequestCostType.CacheCreationInputToken — New cost type for tokens written to cache, separate from CachedInputToken.
  • MCPRouteSecurityPolicy authorization fields — New authorization block with defaultAction, rules array (CEL, JWT scopes/claims, tools targeting), and extAuth for external authorization.
  • BackendSecurityPolicy.targetRefs expansion — Now accepts InferencePool (inference.networking.x-k8s.io) in addition to AIServiceBackend.

Deprecations

  • AIGatewayFilterConfigExternalProcessor.resources — Deprecated. Use GatewayConfig instead. Will be removed in v0.6.
  • version field as prefix for OpenAI schema — Deprecated. Use the new prefix field. Legacy behavior will be removed in v0.6.

🐛 Bug Fixes

  • AWS Bedrock Claude streaming reliability — Streaming responses from Bedrock Claude models now complete correctly without truncation.
  • Gemini streaming token counts — Token usage in Gemini streaming responses now matches OpenAI format.
  • Multi-chunk Gemini tool calls — Tool calls spanning multiple streaming chunks now have correct indices.
  • GCP Claude reasoning content — Reasoning/thinking content correctly passes through for Claude on GCP Vertex AI.
  • Zero-weight backend references — Backend references with zero weight no longer cause routing errors.
  • Umbrella chart image pull secrets — Helm deployments within umbrella charts correctly inherit global.imagePullSecrets.
  • GCP global region backends — Vertex AI backends with global region now work correctly.
  • Accurate per-token latency metrics — Fixed integer truncation in time_per_output_token calculation.
  • Anthropic token counting — Improved accuracy of input and output token counts for Anthropic models.

📖 Upgrade Guidance

Migrating to GatewayConfig

If you're using AIGatewayFilterConfigExternalProcessor.resources, migrate to the new GatewayConfig CRD:

  1. Create a GatewayConfig resource:
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: GatewayConfig
metadata:
  name: my-gateway-config
  namespace: default
spec:
  extProc:
    kubernetes:
      resources:
        requests:
          cpu: "100m"
          memory: "128Mi"
        limits:
          cpu: "500m"
          memory: "512Mi"
      env:
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://otel-collector:4317"
  2. Reference from your Gateway:
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: ai-gateway
  annotations:
    aigateway.envoyproxy.io/gateway-config: my-gateway-config

Migrating Endpoint Prefix Configuration

Before:

schema:
  name: OpenAI
  version: "/v1beta/openai"  # Deprecated

After:

schema:
  name: OpenAI
  prefix: "/v1beta/openai"

📦 Dependencies

Dependency                        Version
Go                                1.25.6
Envoy Gateway                     v1.6
Envoy Proxy                       v1.36.4
Gateway API                       v1.4.0
Gateway API Inference Extension   v1.0.2

🙏 Acknowledgements

Special thanks to the growing community of adopters including Bloomberg, LY Corporation, Alan by Comma Soft, and NRP for their production insights, everyone who reported bugs, submitted PRs, and participated in design discussions, and the Envoy Gateway team for continued collaboration.


🔮 What's Next

  • Additional provider integrations (AWS Bedrock InvokeModel, Gemini embeddings, Azure/AKS workload identity)
  • Batch inference APIs for high-volume workloads
  • Advanced caching strategies with prompt cache key and retention controls
  • Upstream provider quota policies
  • Sensitive data redaction for request and response bodies

v0.5.0-rc1

12 Jan 19:47
953951f

v0.5.0-rc1 Pre-release

Release candidate for v0.5.0!

helm install aieg oci://registry-1.docker.io/envoyproxy/ai-gateway-helm --version v0.5.0-rc1 --namespace envoy-ai-gateway-system --create-namespace

v0.4.0

08 Nov 00:55
ad5f75e

Envoy AI Gateway v0.4.0 - November 07, 2025

Release introducing Model Context Protocol (MCP) Gateway, OpenAI Image Generation, Anthropic support (direct and AWS Bedrock), guided output decoding for GCP Vertex AI/Gemini, cross-namespace references, enhanced authentication, and comprehensive observability improvements.

🔗 View release notes on the site

✨ New Features

Model Context Protocol (MCP) Gateway

New MCPRoute CRD

Introduces the MCPRoute custom resource for routing MCP requests to backend MCP servers, enabling a unified AI API for multiple MCP backends.

Complete MCP spec implementation

Includes streamable HTTP transport, JSON-RPC 2.0 support, and MCP spec-compliant OAuth 2.0 authorization with JWKS validation and Protected Resource Metadata.

Server multiplexing and tool routing

Aggregates multiple MCP servers behind a single endpoint with intelligent tool routing, tool filtering (exact match and regex patterns), and collision detection.

Upstream authentication

Supports both OAuth-based authentication and API key authentication for secure backend MCP server communication with configurable headers.

Session management

Implements MCP session handling with encryption, rotatable seeds, and graceful session lifecycle management.

Anthropic Provider Support

Direct api.anthropic.com support

Native integration with Anthropic's API at api.anthropic.com, complementing existing GCP Vertex AI Anthropic support.

AWS Bedrock native Anthropic Messages API

Support for Claude models on AWS Bedrock using the native Anthropic Messages API format instead of the generic Converse API, enabling full feature parity with direct Anthropic API including prompt caching and extended thinking.

Anthropic API key authentication

Native x-api-key header-based authentication matching Anthropic's API conventions and SDK patterns for direct Anthropic connections.

Passthrough translator with token usage tracking

Efficient passthrough translation layer that captures token usage and maintains API compatibility while minimizing overhead for both direct and AWS Bedrock Anthropic endpoints.

Standalone CLI auto-configuration

Auto-configuration from ANTHROPIC_API_KEY environment variable in standalone mode for zero-config deployments.

Guided Output Support for GCP Vertex AI/Gemini

Guided regex support

Constrains model outputs to match specific regular expressions for GCP Vertex AI/Gemini models, enabling structured text generation.

Guided choice support

Restricts model outputs to predefined choices for GCP Vertex AI/Gemini models, ensuring responses conform to expected values.

Guided JSON support

Ensures model outputs are valid JSON conforming to specified schemas for GCP Vertex AI/Gemini models, with OpenAI-compatible API translation.

Provider-Specific Enhancements

OpenAI Image Generation /v1/images/generations endpoint

End-to-end support for OpenAI's image generation API including request/response translation, Brotli encoding/decoding, and full protocol compatibility.

OpenAI legacy /v1/completions endpoint

Full pass-through support for OpenAI's legacy completions endpoint with complete tracing and metrics, ensuring backward compatibility.

Azure OpenAI embeddings support

Native support for Azure OpenAI embeddings API with proper protocol translation and token usage tracking.

AWS Bedrock reasoning tokens

Full support for reasoning/thinking tokens in AWS Bedrock responses for both streaming and non-streaming modes, properly exposing extended thinking processes in Claude models.

GCP Vertex AI safety settings

Support for GCP-specific safety settings configuration, allowing fine-grained control over content filtering and safety thresholds for Gemini models.

GCP Gemini streaming token accounting

Accurate completion_tokens reporting in streaming usage chunks for Gemini models, ensuring proper token accounting during streaming responses.

Cross-Namespace Resource References

Cross-namespace AIServiceBackend references

AIGatewayRoute can now reference AIServiceBackend resources in different namespaces, enabling multi-tenant and organizational separation patterns.

ReferenceGrant validation

Comprehensive ReferenceGrant integration following Gateway API patterns, with automatic validation and clear error messages when grants are missing.
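
A cross-namespace reference could be wired up roughly as follows, following the usual Gateway API ReferenceGrant pattern. The namespaces and resource names are placeholders, and the route is shown as a fragment.

```yaml
# ReferenceGrant lives in the namespace that owns the backend.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-ai-routes
  namespace: backends               # placeholder: namespace of the AIServiceBackend
spec:
  from:
    - group: aigateway.envoyproxy.io
      kind: AIGatewayRoute
      namespace: apps               # placeholder: namespace of the route
  to:
    - group: aigateway.envoyproxy.io
      kind: AIServiceBackend
---
# Fragment of a hypothetical route in another namespace; parentRefs omitted.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: shared-backend-route
  namespace: apps
spec:
  rules:
    - backendRefs:
        - name: openai-backend      # placeholder backend name
          namespace: backends       # the cross-namespace reference
```

Without the grant in the backend's namespace, the route's reference is expected to be rejected with a validation error.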

Enhanced Upstream Authentication

AWS SDK default credential chain

Support for the AWS SDK's default credential chain, including IRSA (IAM Roles for Service Accounts), EKS Pod Identity, EC2 Instance Profiles, and environment variables, eliminating the need for static credentials or OIDC settings.

Azure API key authentication

Native Azure OpenAI API key authentication using the api-key header, matching Azure SDK conventions and console practices.

Traffic Management and Configuration

Header mutations at route and backend levels

New headerMutation fields in both AIServiceBackend and AIGatewayRouteRuleBackendRef enable header manipulation with smart merge logic for advanced routing scenarios.
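
As an illustration, a backend-level header mutation might look like the sketch below. The set/remove shape is assumed to mirror Gateway API header filters, and the header names and resource names are hypothetical.

```yaml
# Hypothetical AIServiceBackend with a backend-level header mutation.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: openai-backend          # placeholder name
  namespace: default
spec:
  schema:
    name: OpenAI
  backendRef:
    name: openai                # placeholder Envoy Gateway Backend
    kind: Backend
    group: gateway.envoyproxy.io
  headerMutation:
    set:
      - name: x-team            # hypothetical header added to upstream requests
        value: platform
    remove:
      - x-internal-debug        # hypothetical header stripped before forwarding
```

The same headerMutation block can also be placed on an individual backendRef inside an AIGatewayRoute rule, where it is merged with the backend-level settings.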

InferencePool v1 support

Updated to Gateway API Inference Extension v1.0, providing stable intelligent endpoint selection with enhanced performance and reliability.

Cached token usage tracking for actual token usage reporting

Captures and reports cached token statistics from cloud providers (Anthropic, Bedrock, etc.), providing accurate cost attribution for prompt caching features.

Standalone Mode and CLI

Docker image support

Official Docker images for the aigw CLI published to GitHub Container Registry, enabling containerized standalone deployments with proper health checks and lifecycle management.

Multi-provider auto-configuration

Zero-config standalone mode with automatic configuration from OPENAI_API_KEY, AZURE_OPENAI_API_KEY, or ANTHROPIC_API_KEY environment variables. Generates complete Envoy configuration with OpenAI SDK compatibility.

MCP server configuration

Native MCP support in standalone mode via --mcp-config and --mcp-json flags, enabling unified LLM and MCP server configuration in a single aigw run invocation without Kubernetes.

XDG Base Directory standards

Proper separation of configuration, data, state, and runtime files following XDG Base Directory specification, improving organization and enabling better cleanup and management of aigw state.

Enhanced readiness monitoring

Improved Envoy readiness detection and status reporting in standalone mode, providing clear insights into when the gateway is ready to accept traffic with better error messages.

Consolidated admin server

Unified admin server on a single port serving both /metrics and /health endpoints, simplifying monitoring and health check configuration.

Improved error handling

The aigw CLI now fails fast and exits cleanly if the external processor fails to start, preventing silent failures and making debugging easier.

Type-safe Kubernetes client SDK

Generated client libraries for all AI Gateway CRDs following standard Kubernetes client-go patterns, enabling developers to build controllers, operators, and custom integrations with type safety.

Observability Enhancements

MCP operations observability

Comprehensive monitoring, logging, and tracing for MCP operations with configurable access logs and metrics enrichment for MCP server interactions and tool routing.

Image generation tracing and metrics

OpenInference-compliant distributed tracing and OpenTelemetry Gen AI metrics for image generation requests with detailed request parameters and timing information.

OpenTelemetry native metrics export

Support for OTEL-native metrics export (in addition to Prometheus), enabling integration with Elastic Stack, OTEL-TUI, and other OTEL-native observability systems. Includes console exporter for ad-hoc debugging.

Embeddings tracing implementation

Complete OpenInference-compliant tracing for embeddings operations, complementing existing chat completion tracing.

Enhanced /messages endpoint metrics

Distinct metrics for Anthropic's /messages endpoint, providing accurate attribution separate from /chat/completions endpoints.

Original model tracking

Metrics now track both the original requested model and any overridden model names, providing accurate attribution in multi-provider and model virtualization scenarios.

🔗 API Updates

  • New MCPRoute CRD
    • Introduces MCPRoute custom resource with comprehensive fields for MCP server configuration, tool filtering, authentication policies (OAuth and API key), and Protected Resource Metadata.
  • Cross-namespace references in AIGatewayRoute
    • Added namespace field to AIGatewayRouteRuleBackendRef, enabling cross-namespace backend references with ReferenceGrant validation.
  • Header mutations at route and backend levels
    • Added headerMutation fields to both AIServiceBackend and AIGatewayRouteRuleBackendRef for backend-level and per-route header manipulation with smart merge logic.
  • New AWSAnthropic API schema
    • Added AWSAnthropic schema for Claude models on AWS Bedrock using the native Anthropic Messages API format, providing full feature parity with direct Anthropic API.
  • Anthropic API key authentication
    • Added AnthropicAPIKey to BackendSecurityPolicy for x-api-key header authentication.
  • Azure API key authentication
    • Added AzureAPIKey to BackendSecurityPolicy for api-key header authentication.
  • **AWS credential chain support...

v0.4.0-rc2

07 Nov 22:42
ad5f75e

v0.4.0-rc2 Pre-release

Release candidate for v0.4.0!

helm install aieg oci://registry-1.docker.io/envoyproxy/ai-gateway-helm --version v0.4.0-rc2 --namespace envoy-ai-gateway-system --create-namespace

v0.4.0-rc1

05 Nov 21:31
a0e4c0e

v0.4.0-rc1 Pre-release

Release candidate for v0.4.0!

helm install aieg oci://registry-1.docker.io/envoyproxy/ai-gateway-helm --version v0.4.0-rc1 --namespace envoy-ai-gateway-system --create-namespace

v0.3.0

21 Aug 10:28
da86fc6

Release Announcement

Check out the v0.3.0 release notes to learn more about the release.

Envoy AI Gateway v0.3.x

Release version introducing intelligent inference routing with Endpoint Picker Provider, enhanced observability features, Google Vertex AI support, and enhanced provider integrations.

v0.3.0

August 21, 2025
Envoy AI Gateway v0.3.0 introduces intelligent inference routing, expanded provider support (including Google Vertex AI and Anthropic), and enhanced observability with OpenInference tracing and configurable metrics. Key features include Endpoint Picker Provider with InferencePool for dynamic load balancing, model name virtualization, and seamless Gateway API Inference Extension integration.

✨ New Features

Endpoint Picker Provider (EPP) Integration

  • Gateway API Inference Extension Support
    • Complete integration with Gateway API Inference Extension v0.5.1, enabling intelligent endpoint selection based on real-time AI inference metrics like KV-cache usage, queue depth, and LoRA adapter information.
  • Dual Integration Approaches
    • Support for both HTTPRoute + InferencePool and AIGatewayRoute + InferencePool integration patterns, providing flexibility for different use cases from simple to advanced AI routing scenarios.
  • Dynamic Load Balancing
    • Intelligent routing that automatically selects the optimal inference endpoint for each request, optimizing resource utilization across your entire inference infrastructure with real-time performance metrics.
  • Extensible Architecture
    • Support for custom endpoint picker providers, allowing implementation of domain-specific routing logic tailored to unique AI workload requirements.

Expanded Provider Ecosystem

  • Google Vertex AI Production Support
    • Google Vertex AI has moved from work-in-progress to full production support, including complete streaming support for Gemini models with OpenAI API compatibility. View all supported providers →
  • Anthropic on Vertex AI Integration
    • Complete Anthropic Claude integration via GCP Vertex AI, moving from experimental to production-ready status with multi-tool support and configurable API versions for enterprise deployments.
  • Enhanced Gemini Capabilities
    • Improved request/response translation for Gemini models with support for tools, response format specification, and advanced conversation handling, making Gemini integration more robust and feature-complete.
  • Strengthened OpenAI-Compatible Ecosystem
    • Enhanced support for the broader OpenAI-compatible provider ecosystem including Groq, Together AI, Mistral, Cohere, DeepSeek, SambaNova, and more, ensuring seamless integration across the AI provider landscape.

Observability Enhancements

  • OpenInference Tracing Support
    • Added comprehensive OpenInference distributed tracing with OpenTelemetry integration, providing detailed request tracing and performance monitoring for LLM operations. Includes full chat completion request/response data capture, timing information, and compatibility with evaluation systems like Arize Phoenix. View the documentation →
  • Configurable Metrics Labels
    • Added support for configuring additional metrics labels corresponding to HTTP request headers. This enables custom labeling of metrics based on specific request headers like user identifiers, API versions, or application contexts, providing more granular monitoring and filtering capabilities.
  • Embeddings Metrics Support
    • Extended GenAI metrics support to include embeddings operations, providing comprehensive token usage tracking and performance monitoring for both chat completion and embeddings API endpoints with consistent OpenTelemetry semantic conventions.
  • Enhanced GenAI Metrics
    • Improved AI-specific metrics implementation with better error handling, enhanced attribute mapping, and more accurate token latency measurements. Maintains full compatibility with OpenTelemetry Gen AI semantic conventions while providing more reliable performance analysis data. View the documentation →

Infrastructure and Configuration

  • Model Name Virtualization

    • Added a new modelNameOverride field in the backendRef of AIGatewayRoute, enabling flexible model name abstraction across different providers. This allows unified model naming for downstream applications while routing to provider-specific model names, supporting both multi-provider scenarios and fallback configurations. View the documentation →
  • Unified Gateway Support

    • Enhanced Gateway resource management by allowing both standard HTTPRoute and AIGatewayRoute to be attached to the same Gateway object. This provides a unified routing configuration that supports both AI and non-AI traffic within a single gateway infrastructure, simplifying deployment and management.
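To make the model name virtualization above concrete, here is a minimal sketch in which clients request one virtual model name and each backend receives its own provider-specific name. The backend names, Gateway name, and model identifiers are all hypothetical placeholders.

```yaml
# Minimal sketch: one client-facing model name mapped to two backends.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: virtual-model-route           # hypothetical name
spec:
  parentRefs:
    - name: my-gateway                # hypothetical Gateway
      kind: Gateway
      group: gateway.networking.k8s.io
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: my-chat-model    # virtual name seen by clients
      backendRefs:
        # Each backend is sent its own model name instead of the virtual one.
        - name: provider-a-backend            # hypothetical AIServiceBackend
          modelNameOverride: provider-a-model # placeholder model name
        - name: provider-b-backend            # hypothetical AIServiceBackend
          modelNameOverride: provider-b-model # placeholder model name
```

Keeping the mapping in the route means downstream applications do not need to change when a provider-specific model name changes.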

🔗 API Updates

  • BackendSecurityPolicy TargetRefs: Added targetRefs field to BackendSecurityPolicy spec, enabling direct targeting of AIServiceBackend resources using Gateway API policy attachment patterns.
  • Gateway API Inference Extension: An InferencePool resource from Gateway API Inference Extension v0.5.1 can now be specified as a backend ref in AIGatewayRoute for intelligent endpoint selection.
  • modelNameOverride in the backend reference of AIGatewayRoute: Added modelNameOverride field in the backend reference of AIGatewayRoute, allowing the model name to be rewritten per backend for routing purposes.
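
The targetRefs attachment on BackendSecurityPolicy might look like the following sketch, assuming an API-key policy. The policy, backend, and Secret names are hypothetical.

```yaml
# Minimal sketch: the policy points at the backend, rather than the
# backend pointing at the policy.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: provider-api-key              # hypothetical name
spec:
  targetRefs:
    - group: aigateway.envoyproxy.io
      kind: AIServiceBackend
      name: provider-a-backend        # hypothetical AIServiceBackend
  type: APIKey
  apiKey:
    secretRef:
      name: provider-api-key-secret   # hypothetical Secret holding the key
```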

Deprecations

  • backendSecurityPolicyRef Pattern: The old pattern of AIServiceBackend referencing BackendSecurityPolicy is deprecated in favor of the new targetRefs approach. Existing configurations will continue to work but should be migrated before v0.4.
  • AIGatewayRoute's targetRefs Pattern: The targetRefs pattern on AIGatewayRoute is deprecated. Existing configurations will continue to work but should be migrated to parentRefs.
  • AIGatewayRoute's schema Field: The schema field is no longer needed for AIGatewayRoute. Existing configurations will continue to work but should be removed before v0.4.
  • controller.envoyGatewayNamespace helm value: This value is no longer needed; setting it is redundant.
  • controller.podEnv helm value will be removed: Use controller.extraEnvVars instead. The controller.podEnv value will be removed in v0.4.

📖 Upgrade Guidance

For users upgrading from v0.2.x to v0.3.0:

1. Upgrade Envoy Gateway to v1.5.0 - Ensure you are using Envoy Gateway v1.5.0 or later, as this is required for compatibility with the new AI Gateway features.

2. Update Envoy Gateway config - Update your Envoy Gateway configuration to include the new settings shown below. The full manifest is available in the manifests/envoy-gateway-config/config.yaml file referenced by the getting started guide.

--- a/manifests/envoy-gateway-config/config.yaml
+++ b/manifests/envoy-gateway-config/config.yaml
@@ -43,9 +43,19 @@ data:
 extensionManager:
   hooks:
     xdsTranslator:
+      translation:
+        listener:
+          includeAll: true
+        route:
+          includeAll: true
+        cluster:
+          includeAll: true
+        secret:
+          includeAll: true
       post:
-        - VirtualHost
         - Translation
+        - Cluster
+        - Route

3. Upgrade Envoy AI Gateway to v0.3.0

4. Migrate Gateway target references - Update from the deprecated AIGatewayRoute.targetRefs pattern to the new AIGatewayRoute.parentRefs approach after the upgrade to v0.3.0.

5. Migrate backendSecurityPolicy references - Update from the deprecated AIServiceBackend.backendSecurityPolicyRef pattern to the new BackendSecurityPolicy.targetRefs approach after the upgrade to v0.3.0.

6. Remove AIGatewayRoute.schema - Remove the schema field from AIGatewayRoute resources after the upgrade to v0.3.0, as it is no longer used.
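
Steps 4 and 6 can be sketched on a single AIGatewayRoute as below. The deprecated fields are kept as comments to show what goes away. The resource names are hypothetical, and the commented-out schema value is only an example of what an existing route might contain.

```yaml
# Minimal sketch of a route after migration to v0.3.0.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: my-route                      # hypothetical name
spec:
  # Deprecated in v0.3.0, remove after upgrading:
  # schema:
  #   name: OpenAI
  # targetRefs:
  #   - name: my-gateway
  #     kind: Gateway
  #     group: gateway.networking.k8s.io
  parentRefs:
    - name: my-gateway                # same hypothetical Gateway, now a parentRef
      kind: Gateway
      group: gateway.networking.k8s.io
  rules:
    - backendRefs:
        - name: provider-a-backend    # hypothetical AIServiceBackend
```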

📦 Dependencies Versions

  • Go 1.24.6
    • Updated to latest Go version for improved performance and security.
  • Envoy Gateway v1.5
    • Built on Envoy Gateway for proven data plane capabilities.
  • Envoy v1.35
    • Leveraging Envoy Proxy's battle-tested networking capabilities.
  • Gateway API v1.3.1
    • Support for latest Gateway API specifications.
  • Gateway API Inference Extension v0.5.1
    • Integration with Gateway API Inference Extension for intelligent endpoint selection.

🙏 Acknowledgements

This release represents the collaborative effort of our growing community. Special thanks to contributors from Tetrate, Bloomberg, Tencent, Google, Nutanix and our independent contributors who made this release possible through their code contributions, testing, feedback, and community participation.

The Endpoint Picker Provider integration represents a significant milestone in making AI inference routing more intelligent and efficient. We appreciate all the feedback and testing from the community that helped shape this feature.

New Contributors

v0.3.0-rc2

21 Aug 10:01
da86fc6

Pre-release

Release candidate

v0.3.0-rc1

15 Aug 04:15
e33a5f3

Pre-release

Release candidate for v0.3.0!

helm install aieg oci://registry-1.docker.io/envoyproxy/ai-gateway-helm --version v0.3.0-rc1 --namespace envoy-ai-gateway-system --create-namespace