Commit e9b2a9a

fix(workers-ai-provider): forward reasoning_effort and chat_template_kwargs (#501)
`modelSettings` passed to the provider were flowing through `getRunOptions()` into the 3rd arg (options) of `binding.run(model, inputs, options)`, but Cloudflare Workers AI's `reasoning_effort` and `chat_template_kwargs` parameters belong on the 2nd arg (inputs). As a result they were silently dropped, causing reasoning models (GLM-4.7-flash, Kimi K2.5/K2.6, GPT-OSS, QwQ) to burn the entire output token budget on chain-of-thought.

- Type `reasoning_effort` and `chat_template_kwargs` directly on `WorkersAIChatSettings`.
- In `buildRunInputs()`, pull both values from settings and from `providerOptions["workers-ai"]` (per-call wins) and place them on the inputs object. `reasoning_effort: null` is preserved (`!== undefined` check) because it's the explicit "disable reasoning" signal.
- In `getRunOptions()`, strip them from `passthroughOptions` so they don't leak into the binding's options arg or the REST URL query string.
- Wire `options.providerOptions` through `doGenerate` and `doStream` so per-call overrides work without settings.

Adds 11 tests covering binding inputs placement, REST body placement, null preservation, no leakage into options/query, per-call override, and unrelated settings passthrough (no regression).

Closes #501.

Made-with: Cursor
1 parent 0ba4637 commit e9b2a9a

5 files changed

Lines changed: 383 additions & 3 deletions

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+---
+"workers-ai-provider": minor
+---
+
+Forward `reasoning_effort` and `chat_template_kwargs` onto `binding.run(model, inputs)`'s `inputs` object instead of silently dropping them into the options arg / REST query string. This fixes reasoning models (GLM-4.7-flash, Kimi K2.5/K2.6, GPT-OSS, QwQ) burning the entire output token budget on chain-of-thought with no visible content.
+
+Both settings-level and per-call usage are supported:
+
+```ts
+// Settings-level
+const model = workersai("@cf/zai-org/glm-4.7-flash", {
+	reasoning_effort: "low",
+	chat_template_kwargs: { enable_thinking: false },
+});
+
+// Per-call (overrides settings)
+await generateText({
+	model,
+	prompt,
+	providerOptions: {
+		"workers-ai": { reasoning_effort: "low" },
+	},
+});
+```
+
+`reasoning_effort: null` is preserved as-is (explicit "disable reasoning" signal). The two fields are also typed directly on `WorkersAIChatSettings`.
+
+Closes #501.
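The precedence and null-preservation rules described above can be sketched as a standalone helper. This is an illustrative reconstruction, not the provider's actual code (the real logic lives in `buildRunInputs()`); `mergeReasoningFields`, `settings`, and `perCall` are names invented for the sketch:

```typescript
// Sketch of the merge rules: per-call providerOptions win over settings,
// and an explicit `null` survives because only `undefined` means "unset".
type ReasoningEffort = "low" | "medium" | "high" | null;

interface ReasoningFields {
	reasoning_effort?: ReasoningEffort;
	chat_template_kwargs?: Record<string, unknown>;
}

function mergeReasoningFields(
	settings: ReasoningFields,
	perCall: Record<string, unknown>,
): Record<string, unknown> {
	// Key presence (not truthiness) decides whether the per-call value wins,
	// so `reasoning_effort: null` in perCall still overrides settings.
	const reasoningEffort =
		"reasoning_effort" in perCall ? perCall.reasoning_effort : settings.reasoning_effort;
	const chatTemplateKwargs =
		"chat_template_kwargs" in perCall
			? perCall.chat_template_kwargs
			: settings.chat_template_kwargs;

	return {
		// Spread only when defined; `null` passes the `!== undefined` check.
		...(reasoningEffort !== undefined ? { reasoning_effort: reasoningEffort } : {}),
		...(chatTemplateKwargs !== undefined ? { chat_template_kwargs: chatTemplateKwargs } : {}),
	};
}

// Per-call override wins over the settings-level value:
console.log(mergeReasoningFields({ reasoning_effort: "high" }, { reasoning_effort: "low" }));
// Explicit null is preserved rather than dropped:
console.log(mergeReasoningFields({}, { reasoning_effort: null }));
// Nothing set: neither key appears on the inputs object:
console.log(mergeReasoningFields({}, {}));
```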

packages/workers-ai-provider/README.md

Lines changed: 29 additions & 0 deletions
@@ -112,6 +112,35 @@ for await (const chunk of result.textStream) {
 }
 ```
 
+## Reasoning Controls
+
+Reasoning-capable Workers AI models (GLM-4.7-flash, Kimi K2.5/K2.6, GPT-OSS, QwQ) accept `reasoning_effort` and `chat_template_kwargs` on their inputs. Either set them at model creation time as settings, or per-call via `providerOptions["workers-ai"]` (per-call wins):
+
+```ts
+// Settings-level (applies to every request on this model instance)
+const model = workersai("@cf/zai-org/glm-4.7-flash", {
+	reasoning_effort: "low", // "low" | "medium" | "high" | null
+	chat_template_kwargs: { enable_thinking: false },
+});
+
+await generateText({ model, prompt: "Summarize in one sentence." });
+```
+
+```ts
+// Per-call (overrides any settings-level value)
+const model = workersai("@cf/zai-org/glm-4.7-flash");
+
+await generateText({
+	model,
+	prompt: "Summarize in one sentence.",
+	providerOptions: {
+		"workers-ai": { reasoning_effort: "low" },
+	},
+});
+```
+
+`reasoning_effort: null` is meaningful — it's the explicit "disable reasoning" signal for models that support it. Both fields land on the `inputs` object of `binding.run()` (and the JSON body of the REST request), matching the shape expected by Workers AI. See the [model catalog](https://developers.cloudflare.com/workers-ai/models/) for per-model reasoning capabilities.
+
 ## Vision (Image Inputs)
 
 Send images to vision-capable models like Kimi K2.5:

packages/workers-ai-provider/src/workersai-chat-language-model.ts

Lines changed: 36 additions & 3 deletions
@@ -123,12 +123,30 @@ export class WorkersAIChatLanguageModel implements LanguageModelV3 {
 	 * accept this format at runtime.
 	 *
 	 * The binding path additionally normalises null content to empty strings.
+	 *
+	 * Reasoning controls (`reasoning_effort`, `chat_template_kwargs`) are
+	 * forwarded here from settings. These belong on the INPUTS object, not on
+	 * the 3rd-arg options / REST query string — see
+	 * https://github.com/cloudflare/ai/issues/501. Per-call values from
+	 * `providerOptions["workers-ai"]` override settings.
+	 *
+	 * `reasoning_effort: null` is a valid value ("disable reasoning"), so we
+	 * check `!== undefined` rather than truthiness.
 	 */
 	private buildRunInputs(
 		args: ReturnType<typeof this.getArgs>["args"],
 		messages: ReturnType<typeof convertToWorkersAIChatMessages>["messages"],
-		options?: { stream?: boolean },
+		options?: { stream?: boolean; providerOptions?: Record<string, unknown> },
 	) {
+		const perCall =
+			(options?.providerOptions?.["workers-ai"] as Record<string, unknown> | undefined) ?? {};
+		const reasoningEffort =
+			"reasoning_effort" in perCall ? perCall.reasoning_effort : this.settings.reasoning_effort;
+		const chatTemplateKwargs =
+			"chat_template_kwargs" in perCall
+				? perCall.chat_template_kwargs
+				: this.settings.chat_template_kwargs;
+
 		return {
 			max_tokens: args.max_tokens,
 			messages: this.config.isBinding ? normalizeMessagesForBinding(messages) : messages,
@@ -138,18 +156,28 @@ export class WorkersAIChatLanguageModel implements LanguageModelV3 {
 			top_p: args.top_p,
 			...(args.response_format ? { response_format: args.response_format } : {}),
 			...(options?.stream ? { stream: true } : {}),
+			...(reasoningEffort !== undefined ? { reasoning_effort: reasoningEffort } : {}),
+			...(chatTemplateKwargs !== undefined
+				? { chat_template_kwargs: chatTemplateKwargs }
+				: {}),
 		};
 	}
 
 	/**
 	 * Get passthrough options for binding.run() from settings.
+	 *
+	 * `reasoning_effort` and `chat_template_kwargs` are explicitly excluded
+	 * here — they belong on the `inputs` object (see `buildRunInputs`), not on
+	 * the `options` (3rd) arg of binding.run() or the REST query string.
 	 */
 	private getRunOptions() {
 		const {
 			gateway,
 			safePrompt: _safePrompt,
 			sessionAffinity,
 			extraHeaders,
+			reasoning_effort: _reasoningEffort,
+			chat_template_kwargs: _chatTemplateKwargs,
 			...passthroughOptions
 		} = this.settings;
 
@@ -173,7 +201,9 @@ export class WorkersAIChatLanguageModel implements LanguageModelV3 {
 		const { args, warnings } = this.getArgs(options);
 		const { messages } = convertToWorkersAIChatMessages(options.prompt);
 
-		const inputs = this.buildRunInputs(args, messages);
+		const inputs = this.buildRunInputs(args, messages, {
+			providerOptions: options.providerOptions,
+		});
 		const runOptions = this.getRunOptions();
 
 		const output = await this.config.binding.run(
@@ -223,7 +253,10 @@ export class WorkersAIChatLanguageModel implements LanguageModelV3 {
 		const { args, warnings } = this.getArgs(options);
 		const { messages } = convertToWorkersAIChatMessages(options.prompt);
 
-		const inputs = this.buildRunInputs(args, messages, { stream: true });
+		const inputs = this.buildRunInputs(args, messages, {
+			stream: true,
+			providerOptions: options.providerOptions,
+		});
 		const runOptions = this.getRunOptions();
 
 		const response = await this.config.binding.run(
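The exclusion in `getRunOptions()` is plain rest destructuring. A minimal standalone sketch of that pattern, with an illustrative settings object (`someOtherOption` is a hypothetical passthrough field, not a real provider setting):

```typescript
// Rest destructuring peels the two reasoning keys off the settings object,
// so only the remaining fields reach binding.run()'s options arg.
const settings: Record<string, unknown> = {
	reasoning_effort: "low", // belongs on inputs, not options
	chat_template_kwargs: { enable_thinking: false }, // likewise
	someOtherOption: true, // hypothetical passthrough field
};

const {
	reasoning_effort: _reasoningEffort, // captured and discarded
	chat_template_kwargs: _chatTemplateKwargs,
	...passthroughOptions // everything else still passes through
} = settings;

console.log(Object.keys(passthroughOptions)); // reasoning keys are gone
```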

packages/workers-ai-provider/src/workersai-chat-settings.ts

Lines changed: 23 additions & 0 deletions
@@ -16,6 +16,29 @@ export type WorkersAIChatSettings = {
 	 */
 	sessionAffinity?: string;
 
+	/**
+	 * Controls the reasoning budget for reasoning-capable Workers AI models
+	 * (e.g. `@cf/zai-org/glm-4.7-flash`, `@cf/moonshotai/kimi-k2.5`,
+	 * `@cf/openai/gpt-oss-120b`).
+	 *
+	 * `null` is a valid value and disables reasoning for models that support it.
+	 * Forwarded on the `inputs` object of `binding.run(model, inputs)`.
+	 */
+	reasoning_effort?: "low" | "medium" | "high" | null;
+
+	/**
+	 * Chat-template overrides for reasoning-capable models that expose
+	 * thinking toggles (e.g. GLM, Kimi).
+	 *
+	 * Forwarded on the `inputs` object of `binding.run(model, inputs)`.
+	 */
+	chat_template_kwargs?: {
+		/** Whether to enable reasoning. Enabled by default on reasoning models. */
+		enable_thinking?: boolean;
+		/** If false, preserves reasoning context between turns. */
+		clear_thinking?: boolean;
+	};
+
 	/**
 	 * Passthrough settings that are provided directly to the run function.
 	 * Use this for any provider-specific options not covered by the typed fields.
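As a usage sketch of the two new fields, here are values that satisfy them, including the explicit-null case. The type is reproduced locally so the snippet stands alone; the real definition is the `WorkersAIChatSettings` excerpt above:

```typescript
// Local copy of the two new fields so the sketch compiles standalone.
type ReasoningSettings = {
	reasoning_effort?: "low" | "medium" | "high" | null;
	chat_template_kwargs?: {
		enable_thinking?: boolean;
		clear_thinking?: boolean;
	};
};

// Cap reasoning effort and turn thinking off via the chat template.
const lowEffort: ReasoningSettings = {
	reasoning_effort: "low",
	chat_template_kwargs: { enable_thinking: false },
};

// `null` type-checks: it is the explicit "disable reasoning" signal.
const noReasoning: ReasoningSettings = { reasoning_effort: null };

console.log(lowEffort, noReasoning);
```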
