Releases: cloudflare/ai
@cloudflare/tanstack-ai@0.1.3

Patch Changes

- #435 `7381171` Thanks @mdhruvil! - Fix workers-ai adapter silently dropping image content parts.
- #424 `b2eeca8` Thanks @vaibhavshn! - Avoid duplicate tool call IDs by generating unique IDs per tool call index instead of trusting backend-provided IDs.
- #411 `af08464` Thanks @baldyeagle! - Annotate `createAnthropicChat` to improve client type narrowing.
- #398 `40e53c8` Thanks @vaibhavshn! - fix: add `run/` prefix to workers-ai gateway endpoint and make API key optional for gateway bindings.
- #444 `414b4d5` Thanks @mchenco! - Add `sessionAffinity` option to `WorkersAiAdapterConfig` for prefix-cache optimization. Routes requests with the same key to the same backend replica via the `x-session-affinity` header. Supported across binding, REST, and gateway modes.
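As a rough sketch of how a session-affinity key could be turned into the `x-session-affinity` header mentioned above — the header name comes from the changelog, but `buildHeaders` and the config shape here are illustrative, not the package's real API:

```typescript
// Hypothetical helper: build request headers from an adapter config,
// attaching x-session-affinity when a sessionAffinity key is set.
interface AdapterRequestConfig {
  apiKey?: string;
  sessionAffinity?: string;
}

function buildHeaders(config: AdapterRequestConfig): Record<string, string> {
  const headers: Record<string, string> = {
    "content-type": "application/json",
  };
  if (config.apiKey) {
    headers["authorization"] = `Bearer ${config.apiKey}`;
  }
  // Requests carrying the same key land on the same backend replica,
  // which keeps that replica's prefix cache warm.
  if (config.sessionAffinity) {
    headers["x-session-affinity"] = config.sessionAffinity;
  }
  return headers;
}
```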
workers-ai-provider@3.1.2

Patch Changes

- #400 `8822603` Thanks @threepointone! - Add early config validation to `createWorkersAI` that throws a clear error when neither a binding nor credentials (accountId + apiKey) are provided. Widen all model type parameters (`TextGenerationModels`, `ImageGenerationModels`, `EmbeddingModels`, `TranscriptionModels`, `SpeechModels`, `RerankingModels`) to accept arbitrary strings while preserving autocomplete for known models.
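A minimal sketch of both changes, assuming the common `(string & {})` TypeScript idiom for "widen but keep autocomplete" — the validation message, config shape, and model literals below are illustrative, not the package's exact source:

```typescript
// Widened model type: editors still autocomplete the known literals,
// but any other string is accepted too.
type KnownTextModel =
  | "@cf/meta/llama-3.1-8b-instruct"
  | "@cf/openai/gpt-oss-120b";
type TextGenerationModels = KnownTextModel | (string & {});

interface WorkersAIConfig {
  binding?: unknown; // an env.AI binding
  accountId?: string;
  apiKey?: string;
}

// Early validation: fail fast with a clear error instead of a confusing
// failure deep inside a request.
function validateConfig(config: WorkersAIConfig): void {
  const hasCredentials = Boolean(config.accountId && config.apiKey);
  if (!config.binding && !hasCredentials) {
    throw new Error(
      "workers-ai-provider: provide either a binding or accountId + apiKey",
    );
  }
}
```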
@cloudflare/tanstack-ai@0.1.2

Patch Changes

- #406 `9af703b` Thanks @vaibhavshn! - Pass the API key properly for the Gemini TanStack AI adapter.
- #400 `8822603` Thanks @threepointone! - Add config validation to all Workers AI adapter constructors that throws a clear error when neither a binding, credentials (accountId + apiKey), nor a gateway configuration is provided. Widen all model type parameters (`WorkersAiTextModel`, `WorkersAiImageModel`, `WorkersAiEmbeddingModel`, `WorkersAiTranscriptionModel`, `WorkersAiTTSModel`, `WorkersAiSummarizeModel`) to accept arbitrary strings while preserving autocomplete for known models.
workers-ai-provider@3.1.1

Patch Changes

- #396 `2fb3ca8` Thanks @threepointone! -
  - Rewrite README with updated model recommendations (GPT-OSS 120B, EmbeddingGemma 300M, Aura-2 EN)
  - Stream tool calls incrementally using tool-input-start/delta/end events instead of buffering until stream end
  - Fix REST streaming for models that don't support it on /ai/run/ (GPT-OSS, Kimi) by retrying without streaming
  - Add Aura-2 EN/ES to the `SpeechModels` type
  - Log malformed SSE events with `console.warn` instead of silently swallowing them
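The last point can be sketched as a tiny SSE data-line parser — a simplified stand-in for the provider's built-in parser, which handles more of the SSE framing:

```typescript
// Parse one "data: ..." SSE line. Malformed JSON is surfaced via
// console.warn instead of being silently dropped.
function parseSSEData(line: string): unknown {
  if (!line.startsWith("data: ")) return undefined;
  const payload = line.slice("data: ".length);
  if (payload === "[DONE]") return undefined; // end-of-stream sentinel
  try {
    return JSON.parse(payload);
  } catch {
    console.warn("workers-ai-provider: malformed SSE event:", payload);
    return undefined;
  }
}
```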
@cloudflare/tanstack-ai@0.1.1

Patch Changes

- #396 `2fb3ca8` Thanks @threepointone! -
  - Update model recommendations: Aura-2 EN for TTS, Llama 4 Scout for chat examples
  - Add Aura-2 EN/ES to the TTS model type
  - Preserve image/vision content in user messages instead of stripping to text-only
  - Add non-streaming fallback when REST streaming fails (GPT-OSS, Kimi)
  - Warn on premature stream termination instead of silently reporting "stop"
  - Consistent `console.warn` prefix for SSE parse errors
  - Move `@cloudflare/workers-types` from `optionalDependencies` to `devDependencies` (types-only, no runtime use)
  - Fix `@openrouter/sdk` version-mismatch type errors
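The non-streaming fallback can be sketched as a retry wrapper. The real adapter transport is asynchronous; this is simplified to a synchronous runner, and `runModel` is a stand-in rather than an actual export:

```typescript
// Try the streaming path first; if the model rejects streaming,
// retry once with streaming disabled instead of failing the request.
function runWithStreamingFallback<T>(
  runModel: (opts: { stream: boolean }) => T,
): T {
  try {
    return runModel({ stream: true });
  } catch {
    // Some models (e.g. GPT-OSS, Kimi) reject streaming on the REST path.
    return runModel({ stream: false });
  }
}
```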
workers-ai-provider@3.1.0

Minor Changes

- #389 `8538cd5` Thanks @vaibhavshn! - Add transcription, text-to-speech, and reranking support to the Workers AI provider.

  New capabilities

  - Transcription (`provider.transcription(model)`) — implements `TranscriptionModelV3`. Supports Whisper models (`@cf/openai/whisper`, `whisper-tiny-en`, `whisper-large-v3-turbo`) and Deepgram Nova-3 (`@cf/deepgram/nova-3`). Handles model-specific input formats: number arrays for basic Whisper, base64 for v3-turbo via REST, and `{ body, contentType }` for Nova-3 via binding or raw binary upload for Nova-3 via REST.
  - Speech / TTS (`provider.speech(model)`) — implements `SpeechModelV3`. Supports Workers AI TTS models including Deepgram Aura-1 (`@cf/deepgram/aura-1`). Accepts `text`, `voice`, and `speed` options. Returns audio as a `Uint8Array`. Uses `returnRawResponse` to handle binary audio from the REST path without JSON parsing.
  - Reranking (`provider.reranking(model)`) — implements `RerankingModelV3`. Supports BGE reranker models (`@cf/baai/bge-reranker-base`, `bge-reranker-v2-m3`). Converts the AI SDK's document format to Workers AI's `{ query, contexts, top_k }` input. Handles both text and JSON object documents.

  Bug fixes

  - AbortSignal passthrough — the `createRun` REST shim now passes the abort signal to `fetch`, enabling request cancellation and timeout handling. Previously the signal was silently dropped.
  - Nova-3 REST support — added a `createRunBinary` utility for models that require raw binary upload instead of JSON (used by Nova-3 transcription via REST).

  Usage

  ```ts
  import { createWorkersAI } from "workers-ai-provider";
  import {
    experimental_transcribe,
    experimental_generateSpeech,
    rerank,
  } from "ai";

  const workersai = createWorkersAI({ binding: env.AI });

  // Transcription
  const transcript = await experimental_transcribe({
    model: workersai.transcription("@cf/openai/whisper-large-v3-turbo"),
    audio: audioData,
    mediaType: "audio/wav",
  });

  // Speech
  const speech = await experimental_generateSpeech({
    model: workersai.speech("@cf/deepgram/aura-1"),
    text: "Hello world",
    voice: "asteria",
  });

  // Reranking
  const ranked = await rerank({
    model: workersai.reranking("@cf/baai/bge-reranker-base"),
    query: "What is machine learning?",
    documents: ["ML is a branch of AI.", "The weather is sunny."],
  });
  ```
workers-ai-provider@3.0.5

Patch Changes

- #393 `91b32e0` Thanks @threepointone! - Comprehensive cleanup of the workers-ai-provider package.

  Bug fixes:

  - Fixed a phantom dependency on `fetch-event-stream` that caused runtime crashes when installed outside the monorepo. Replaced it with a built-in SSE parser.
  - Fixed streaming buffering: responses now stream token-by-token instead of arriving all at once. The root cause was twofold — an eager `ReadableStream` `start()` pattern that buffered all chunks, and a heuristic that silently fell back to non-streaming `doGenerate` whenever tools were defined. Both are fixed. Streaming now uses a proper `TransformStream` pipeline with backpressure.
  - Fixed a `reasoning-delta` ID mismatch in simulated streaming — it was using `generateId()` instead of the `reasoningId` from the preceding `reasoning-start` event, causing the AI SDK to drop reasoning content.
  - Fixed the REST API client (`createRun`) silently swallowing HTTP errors. Non-200 responses now throw with the status code and response body.
  - Fixed `response_format` being sent as `undefined` on every non-JSON request. It is now only included when actually set.
  - Fixed the `json_schema` field evaluating to `false` (a boolean) instead of `undefined` when the schema was missing.

  Workers AI quirk workarounds:

  - Added `sanitizeToolCallId()` — strips non-alphanumeric characters and pads/truncates to 9 chars, fixing tool call round-trips through the binding, which rejects its own generated IDs.
  - Added `normalizeMessagesForBinding()` — converts `content: null` to `""` and sanitizes tool call IDs before every binding call. Only applied on the binding path (REST preserves original IDs).
  - Added null-finalization chunk filtering for streaming tool calls.
  - Added numeric value coercion in native-format streams (Workers AI sometimes returns numbers instead of strings for the `response` field).
  - Improved the image model to handle all output types from `binding.run()`: `ReadableStream`, `Uint8Array`, `ArrayBuffer`, `Response`, and `{ image: base64 }` objects.
  - Graceful degradation: if `binding.run()` returns a non-streaming response despite `stream: true`, it wraps the complete response as a simulated stream instead of throwing.

  Premature stream termination detection:

  - Streams that end without a `[DONE]` sentinel now report `finishReason: "error"` with `raw: "stream-truncated"` instead of silently reporting `"stop"`.
  - Stream read errors are caught and emit `finishReason: "error"` with `raw: "stream-error"`.

  AI Search (formerly AutoRAG):

  - Added `createAISearch` and `AISearchChatLanguageModel` as the canonical exports, reflecting the rename from AutoRAG to AI Search.
  - `createAutoRAG` still works but emits a one-time deprecation warning pointing to `createAISearch`.
  - `createAutoRAG` preserves `"autorag.chat"` as the provider name for backward compatibility.
  - AI Search now warns when tools or a JSON response format are provided (unsupported by the `aiSearch` API).
  - Simplified AI Search internals — removed dead tool/response-format processing code.

  Code quality:

  - Removed dead code: `workersai-error.ts` (never imported), `workersai-image-config.ts` (inlined).
  - Consistent file naming: renamed `workers-ai-embedding-model.ts` to `workersai-embedding-model.ts`.
  - Replaced `StringLike` catch-all index signatures with `[key: string]: unknown` on settings types.
  - Replaced `any` types with proper interfaces (`FlatToolCall`, `OpenAIToolCall`, `PartialToolCall`).
  - Tightened `processToolCall` format detection to check `function.name` instead of just the presence of a `function` property.
  - Removed the `@ai-sdk/provider-utils` and `zod` peer dependencies (no longer used in source).
  - Added `imageModel` to the `WorkersAI` interface type for consistency.

  Tests:

  - 149 unit tests across 10 test files (up from 82).
  - New test coverage: `sanitizeToolCallId`, `normalizeMessagesForBinding`, `prepareToolsAndToolChoice`, `processText`, `mapWorkersAIUsage`, image model output types, streaming error scenarios (malformed SSE, premature termination, empty stream), backpressure verification, graceful degradation (non-streaming fallback with text/tools/reasoning), REST API error handling (401/404/500), AI Search warnings, embedding `TooManyEmbeddingValuesForCallError`, message conversion with images and reasoning.
  - Integration tests for the REST API and binding across 12 models and 7 categories (chat, streaming, multi-turn, tool calling, tool round-trip, structured output, image generation, embeddings).
  - All tests use the AI SDK's public APIs (`generateText`, `streamText`, `generateImage`, `embedMany`) instead of the internal `.doGenerate()`/`.doStream()` methods.

  README:

  - Rewritten from scratch with concise examples, model recommendations, a configuration guide, and a known-limitations section.
  - Updated to use current AI SDK v6 APIs (`generateText` + `Output.object` instead of the deprecated `generateObject`, `generateImage` instead of `experimental_generateImage`, `stopWhen: stepCountIs(2)` instead of `maxSteps`).
  - Added sections for tool calling, structured output, embeddings, image generation, and AI Search.
  - Uses `wrangler.jsonc` format for configuration examples.
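The `sanitizeToolCallId()` behavior described above — strip non-alphanumeric characters, then pad or truncate to exactly 9 characters — can be sketched as follows. The `"0"` pad character is an assumption; the changelog only specifies the strip/pad/truncate behavior:

```typescript
// Make a tool call ID round-trippable through the Workers AI binding,
// which enforces a strict 9-character alphanumeric format.
function sanitizeToolCallId(id: string): string {
  const alphanumeric = id.replace(/[^a-zA-Z0-9]/g, "");
  // Pad short IDs (filler char assumed) and truncate long ones to 9 chars.
  return alphanumeric.padEnd(9, "0").slice(0, 9);
}
```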
@cloudflare/tanstack-ai@0.1.0

Minor Changes

- #389 `a4b756e` Thanks @vaibhavshn! - Add `@cloudflare/tanstack-ai` — adapters for using TanStack AI with Cloudflare Workers AI and AI Gateway.

  Workers AI adapters

  All Workers AI adapters support four configuration modes: plain binding (`env.AI`), plain REST (account ID + API key), AI Gateway binding (`env.AI.gateway(id)`), and AI Gateway REST (account ID + gateway ID).

  - Chat (`createWorkersAiChat`) — Streaming chat completions via the OpenAI-compatible API. Includes tool calling with full round-trip support, structured output via `json_schema`, and reasoning text streaming (`STEP_STARTED`/`STEP_FINISHED` AG-UI events) for models like QwQ, DeepSeek R1, and Kimi K2.5. A custom fetch shim translates OpenAI SDK calls to `env.AI.run()` for binding mode, with a stream transformer that handles both the Workers AI native format and the OpenAI-compatible format.
  - Image generation (`createWorkersAiImage`) — Stable Diffusion and other text-to-image models.
  - Transcription (`createWorkersAiTranscription`) — Speech-to-text via Whisper and Deepgram Nova-3.
  - Text-to-speech (`createWorkersAiTts`) — Audio generation via Deepgram Aura-1.
  - Summarization (`createWorkersAiSummarize`) — Text summarization via BART-large-CNN.
  - Embeddings (`createWorkersAiEmbedding`) — Text embeddings (implemented but not yet exported, pending TanStack AI's `BaseEmbeddingAdapter`).

  AI Gateway adapters (third-party providers)

  Route requests through Cloudflare AI Gateway for caching, rate limiting, and unified billing. Each adapter injects a custom `fetch` (or `httpOptions` for Gemini) that handles both binding and credential-based gateway configurations.

  - OpenAI — Chat, summarize, image, transcription, TTS, video (`createOpenAi*`)
  - Anthropic — Chat, summarize (`createAnthropic*`)
  - Gemini — Chat, summarize, image, TTS (`createGemini*`). Credentials-only (the Google GenAI SDK lacks custom fetch support).
  - Grok — Chat, summarize, image (`createGrok*`)
  - OpenRouter — Chat, summarize, image (`createOpenRouter*`). Accepts any model string.

  Utilities

  - `createGatewayFetch` — Shared fetch factory that routes requests through AI Gateway (binding or REST), with support for cache control headers (`skipCache`, `cacheTtl`, `customCacheKey`, `metadata`).
  - `createWorkersAiBindingFetch` — Fetch shim that makes `env.AI` look like an OpenAI endpoint, including stream transformation and tool call ID sanitization for the binding's strict `[a-zA-Z0-9]{9}` validation.
  - Config detection helpers (`isDirectBindingConfig`, `isDirectCredentialsConfig`, `isGatewayConfig`) using structural typing to discriminate `env.AI` from `env.AI.gateway(id)`.
  - Shared binary utilities for normalizing Workers AI responses (`Uint8Array`, `ArrayBuffer`, `ReadableStream`, JSON wrapper) to base64.

  Robustness

  - Premature stream termination detection — if Workers AI truncates a response or the connection drops (no `finish_reason`), the adapter emits proper closing events so consumers don't hang.
  - Graceful non-streaming fallback — if a model returns a complete response despite `stream: true`, the binding shim wraps it into a valid response.
  - Deepgram Nova-3 transcription uses raw binary audio via REST (not JSON), automatically detected by model name.

  Testing

  - Comprehensive unit tests (186 tests) covering all adapters, config modes, stream transformation, message building, tool calling, reasoning events, premature termination, and the public API surface.
  - E2E integration tests against real Workers AI APIs (both binding and REST paths) across 12 chat models + 4 transcription models + image/TTS/summarize, validating chat, multi-turn, tool calling, tool round-trips, structured output, reasoning, and all non-chat capabilities.
  - Tree-shakeable package exports with per-adapter entry points for ESM and CJS.
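The structural-typing config detection can be sketched like this — the helper names come from the changelog, but the config shapes are simplified stand-ins for the package's real types:

```typescript
// Simplified config shapes (illustrative, not the real package types).
interface BindingConfig { binding: { run: (...args: unknown[]) => unknown } }
interface CredentialsConfig { accountId: string; apiKey: string }
interface GatewayConfig { gateway: { id: string } }
type AdapterConfig = BindingConfig | CredentialsConfig | GatewayConfig;

// Structural checks: which properties are present (and their shapes)
// discriminate an env.AI binding from credentials or a gateway config.
function isDirectBindingConfig(c: AdapterConfig): c is BindingConfig {
  return "binding" in c && typeof c.binding?.run === "function";
}
function isDirectCredentialsConfig(c: AdapterConfig): c is CredentialsConfig {
  return "accountId" in c && "apiKey" in c;
}
function isGatewayConfig(c: AdapterConfig): c is GatewayConfig {
  return "gateway" in c;
}
```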
workers-ai-provider@3.0.4

Patch Changes

- #390 `41b92a3` Thanks @mchenco! - fix(workers-ai-provider): extract actual finish reason in streaming instead of hardcoded "stop"

  Previously, the streaming implementation always returned `finishReason: "stop"` regardless of the actual completion reason. This caused:

  - Tool calling scenarios to incorrectly report "stop" instead of "tool-calls"
  - Multi-turn tool conversations to fail because the AI SDK couldn't detect when tools were requested
  - Length limit scenarios to show "stop" instead of "length"
  - Error scenarios to show "stop" instead of "error"

  The fix extracts the actual `finish_reason` from streaming chunks and uses the existing `mapWorkersAIFinishReason()` function to properly map it to the AI SDK's finish reason format. This enables proper multi-turn tool calling and accurate completion status reporting.
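The mapping can be sketched as follows; the reason values come from the scenarios listed above, though the actual `mapWorkersAIFinishReason()` in the provider may differ in detail:

```typescript
// AI SDK-style finish reasons (subset, per the scenarios above).
type SDKFinishReason = "stop" | "tool-calls" | "length" | "error" | "unknown";

// Map the raw finish_reason extracted from a streaming chunk to the
// AI SDK's vocabulary, instead of hardcoding "stop".
function mapFinishReason(raw: string | null | undefined): SDKFinishReason {
  switch (raw) {
    case "stop":
      return "stop";
    case "tool_calls":
      return "tool-calls"; // lets the SDK detect and continue a tool loop
    case "length":
      return "length";
    case "error":
      return "error";
    default:
      return "unknown";
  }
}
```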
ai-gateway-provider@3.1.1

Patch Changes

- `8b1d870` Thanks @threepointone! - Update dependencies