workers-ai-provider

Workers AI provider for the AI SDK. Run Cloudflare's models for chat, embeddings, image generation, transcription, text-to-speech, reranking, and AI Search — all from a single provider.

Quick Start

npm install workers-ai-provider ai

// wrangler.jsonc
{
	"ai": { "binding": "AI" },
}

import { createWorkersAI } from "workers-ai-provider";
import { streamText } from "ai";

export default {
	async fetch(req: Request, env: { AI: Ai }) {
		const workersai = createWorkersAI({ binding: env.AI });

		const result = streamText({
			model: workersai("@cf/moonshotai/kimi-k2.5"),
			messages: [{ role: "user", content: "Write a haiku about Cloudflare" }],
		});

		return result.toTextStreamResponse();
	},
};

Configuration

Workers binding (recommended)

Inside a Cloudflare Worker, pass the env.AI binding directly. No API keys needed.

const workersai = createWorkersAI({ binding: env.AI });

REST API

Outside of Workers (Node.js, Bun, etc.), use your Cloudflare credentials:

const workersai = createWorkersAI({
	accountId: process.env.CLOUDFLARE_ACCOUNT_ID,
	apiKey: process.env.CLOUDFLARE_API_TOKEN,
});

AI Gateway

Route requests through AI Gateway for caching, rate limiting, and observability:

const workersai = createWorkersAI({
	binding: env.AI,
	gateway: { id: "my-gateway" },
});

Models

Browse the full catalog at developers.cloudflare.com/workers-ai/models.

Some good defaults:

| Task | Model | Notes |
| --- | --- | --- |
| Chat | @cf/moonshotai/kimi-k2.5 | 256k ctx, tools, vision, reasoning |
| Chat | @cf/zai-org/glm-4.7-flash | Fast, multilingual, 131k ctx |
| Chat | @cf/openai/gpt-oss-120b | OpenAI open-weights, high reasoning |
| Reasoning | @cf/moonshotai/kimi-k2.5 | Configurable reasoning_effort |
| Reasoning | @cf/qwen/qwq-32b | Emits reasoning_content |
| Embeddings | @cf/baai/bge-base-en-v1.5 | 768-dim, English |
| Embeddings | @cf/google/embeddinggemma-300m | 100+ languages, by Google |
| Images | @cf/black-forest-labs/flux-1-schnell | Fast, free-tier image generation |
| Transcription | @cf/openai/whisper-large-v3-turbo | Best accuracy, multilingual |
| Transcription | @cf/deepgram/nova-3 | Fast, high accuracy |
| Text-to-Speech | @cf/deepgram/aura-2-en | Context-aware, natural pacing |
| Reranking | @cf/baai/bge-reranker-base | Fast document reranking |

Text Generation

import { generateText } from "ai";

const { text } = await generateText({
	model: workersai("@cf/moonshotai/kimi-k2.5"),
	prompt: "Explain Workers AI in one paragraph",
});

Streaming:

import { streamText } from "ai";

const result = streamText({
	model: workersai("@cf/moonshotai/kimi-k2.5"),
	messages: [{ role: "user", content: "Write a short story" }],
});

for await (const chunk of result.textStream) {
	process.stdout.write(chunk);
}

Reasoning Controls

Reasoning-capable Workers AI models (GLM-4.7-flash, Kimi K2.5/K2.6, GPT-OSS, QwQ) accept reasoning_effort and chat_template_kwargs on their inputs. Either set them at model creation time as settings, or per-call via providerOptions["workers-ai"] (per-call wins):

// Settings-level (applies to every request on this model instance)
const model = workersai("@cf/zai-org/glm-4.7-flash", {
	reasoning_effort: "low", // "low" | "medium" | "high" | null
	chat_template_kwargs: { enable_thinking: false },
});

await generateText({ model, prompt: "Summarize in one sentence." });
// Per-call (overrides any settings-level value)
const model = workersai("@cf/zai-org/glm-4.7-flash");

await generateText({
	model,
	prompt: "Summarize in one sentence.",
	providerOptions: {
		"workers-ai": { reasoning_effort: "low" },
	},
});

reasoning_effort: null is meaningful — it's the explicit "disable reasoning" signal for models that support it. Both fields land on the inputs object of binding.run() (and the JSON body of the REST request), matching the shape expected by Workers AI. See the model catalog for per-model reasoning capabilities.
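For example, to turn reasoning off entirely at the settings level (a minimal sketch; the model id is one of the reasoning-capable entries above):

const noReasoning = workersai("@cf/zai-org/glm-4.7-flash", {
	reasoning_effort: null, // the explicit "disable reasoning" signal
});

await generateText({ model: noReasoning, prompt: "Just the answer: what is 2 + 2?" });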

Vision (Image Inputs)

Send images to vision-capable models like Kimi K2.5:

import { generateText } from "ai";

const { text } = await generateText({
	model: workersai("@cf/moonshotai/kimi-k2.5"),
	messages: [
		{
			role: "user",
			content: [
				{ type: "text", text: "What's in this image?" },
				{ type: "image", image: imageUint8Array },
			],
		},
	],
});

Images can be provided as Uint8Array, base64 strings, or data URLs. Multiple images per message are supported. Works with both the binding and REST API configurations.
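For instance, a single message can mix encodings (sketch; the data URL payload and secondImageBytes are placeholders):

const { text } = await generateText({
	model: workersai("@cf/moonshotai/kimi-k2.5"),
	messages: [
		{
			role: "user",
			content: [
				{ type: "text", text: "Compare these two images." },
				{ type: "image", image: "data:image/png;base64,..." }, // data URL (placeholder payload)
				{ type: "image", image: secondImageBytes }, // raw Uint8Array
			],
		},
	],
});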

Tool Calling

import { generateText, stepCountIs } from "ai";
import { z } from "zod";

const { text } = await generateText({
	model: workersai("@cf/moonshotai/kimi-k2.5"),
	prompt: "What's the weather in London?",
	tools: {
		getWeather: {
			description: "Get the current weather for a city",
			inputSchema: z.object({ city: z.string() }),
			execute: async ({ city }) => ({ city, temperature: 18, condition: "Cloudy" }),
		},
	},
	stopWhen: stepCountIs(2),
});

Structured Output

import { generateText, Output } from "ai";
import { z } from "zod";

const { output } = await generateText({
	model: workersai("@cf/moonshotai/kimi-k2.5"),
	prompt: "Recipe for spaghetti bolognese",
	output: Output.object({
		schema: z.object({
			name: z.string(),
			ingredients: z.array(z.object({ name: z.string(), amount: z.string() })),
			steps: z.array(z.string()),
		}),
	}),
});

Embeddings

import { embedMany } from "ai";

const { embeddings } = await embedMany({
	model: workersai.textEmbedding("@cf/baai/bge-base-en-v1.5"),
	values: ["sunny day at the beach", "rainy afternoon in the city"],
});

Image Generation

import { generateImage } from "ai";

const { images } = await generateImage({
	model: workersai.image("@cf/black-forest-labs/flux-1-schnell"),
	prompt: "A mountain landscape at sunset",
	size: "1024x1024",
});

// images[0].uint8Array contains the PNG bytes

Transcription (Speech-to-Text)

Transcribe audio using Whisper or Deepgram Nova-3 models.

import { transcribe } from "ai";
import { readFile } from "node:fs/promises";

const { text, segments } = await transcribe({
	model: workersai.transcription("@cf/openai/whisper-large-v3-turbo"),
	audio: await readFile("./audio.mp3"),
	mediaType: "audio/mpeg",
});

With language hints (Whisper only):

const { text } = await transcribe({
	model: workersai.transcription("@cf/openai/whisper-large-v3-turbo", {
		language: "fr",
	}),
	audio: audioBuffer,
	mediaType: "audio/wav",
});

Deepgram Nova-3 is also supported and detects language automatically:

const { text } = await transcribe({
	model: workersai.transcription("@cf/deepgram/nova-3"),
	audio: audioBuffer,
	mediaType: "audio/wav",
});

Text-to-Speech

Generate spoken audio from text using Deepgram Aura-2.

import { speech } from "ai";

const { audio } = await speech({
	model: workersai.speech("@cf/deepgram/aura-2-en"),
	text: "Hello from Cloudflare Workers AI!",
	voice: "asteria",
});

// audio is a Uint8Array of MP3 bytes

Reranking

Reorder documents by relevance to a query — useful for RAG pipelines.

import { rerank } from "ai";

const { results } = await rerank({
	model: workersai.reranking("@cf/baai/bge-reranker-base"),
	query: "What is Cloudflare Workers?",
	documents: [
		"Cloudflare Workers lets you run JavaScript at the edge.",
		"A cookie is a small piece of data stored in the browser.",
		"Workers AI runs inference on Cloudflare's global network.",
	],
	topN: 2,
});

// results is sorted by relevance score

AI Search

AI Search is Cloudflare's managed RAG service. Connect your data and query it with natural language.

// wrangler.jsonc
{
	"ai_search": [{ "binding": "AI_SEARCH", "name": "my-search-index" }],
}
import { createAISearch } from "workers-ai-provider";
import { generateText } from "ai";

const aisearch = createAISearch({ binding: env.AI_SEARCH });

const { text } = await generateText({
	model: aisearch(),
	messages: [{ role: "user", content: "How do I set up AI Gateway?" }],
});

Streaming works the same way — use streamText instead of generateText.
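For example (a sketch reusing the aisearch provider from above):

import { streamText } from "ai";

const result = streamText({
	model: aisearch(),
	messages: [{ role: "user", content: "How do I set up AI Gateway?" }],
});

// e.g. return result.toTextStreamResponse() from a Worker's fetch handler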

createAutoRAG still works but is deprecated. Use createAISearch instead.
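If you are migrating, only the factory name changes (sketch; this assumes createAutoRAG accepted the same { binding } option):

// Before (deprecated)
// const aisearch = createAutoRAG({ binding: env.AI_SEARCH });

// After
const aisearch = createAISearch({ binding: env.AI_SEARCH });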

API Reference

createWorkersAI(options)

| Option | Type | Description |
| --- | --- | --- |
| binding | Ai | Workers AI binding (env.AI). Use this OR credentials. |
| accountId | string | Cloudflare account ID. Required with apiKey. |
| apiKey | string | Cloudflare API token. Required with accountId. |
| gateway | GatewayOptions | Optional AI Gateway config. |

Returns a provider with model factories. Each factory accepts an optional second argument for per-model settings:

workersai("@cf/moonshotai/kimi-k2.5", {
	sessionAffinity: "my-unique-session-id",
});

| Setting | Type | Description |
| --- | --- | --- |
| safePrompt | boolean | Inject a safety prompt before all conversations. |
| sessionAffinity | string | Routes requests with the same key to the same backend replica for prefix-cache optimization. |
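For example, both settings can be combined on one chat model (sketch; the session id is a placeholder):

const model = workersai("@cf/moonshotai/kimi-k2.5", {
	safePrompt: true,
	sessionAffinity: "user-1234-session",
});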

Model factories:

// Chat — for generateText / streamText
workersai(modelId);
workersai.chat(modelId);

// Embeddings — for embedMany / embed
workersai.textEmbedding(modelId);

// Images — for generateImage
workersai.image(modelId);

// Transcription — for transcribe
workersai.transcription(modelId, settings?);

// Text-to-Speech — for speech
workersai.speech(modelId);

// Reranking — for rerank
workersai.reranking(modelId);

createAISearch(options)

| Option | Type | Description |
| --- | --- | --- |
| binding | AutoRAG | AI Search binding (env.AI_SEARCH). |

Returns a callable provider:

aisearch(); // AI Search model (shorthand)
aisearch.chat(); // AI Search model