Merged
2 changes: 1 addition & 1 deletion docs/src/api/class-browser.md
@@ -267,7 +267,7 @@ await browser.CloseAsync();
### option: Browser.newContext.storageStatePath = %%-csharp-java-context-option-storage-state-path-%%
* since: v1.9

### option: Browser.newContext.agent = %%-js-context-option-agent-%%
### option: Browser.newContext.agent = %%-context-option-agent-%%
Member

let's keep it js-only for now

Member Author

Why?

* since: v1.58

## async method: Browser.newPage
32 changes: 27 additions & 5 deletions docs/src/api/class-page.md
@@ -2026,8 +2026,12 @@ Callback function which will be called in Playwright's context.

## async method: Page.extract
* since: v1.58
* langs: js
- returns: <[any]>
- returns: <[Object]>
- `result` <[any]>
- `usage` <[Object]>
- `turns` <[int]>
- `inputTokens` <[int]>
- `outputTokens` <[int]>

Extract information from the page using the agentic loop, return it in a given Zod format.
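With this change, `Page.extract` resolves to `{ result, usage }` rather than the bare extracted value, and `Page.perform` resolves to `{ usage }`. The sketch below shows how a caller might keep a running total across several agentic calls. The `Usage` interface mirrors the documented `usage` fields. `addUsage` and the token counts are illustrative assumptions, not Playwright API or measured data.

```typescript
// Mirrors the documented `usage` fields returned by Page.extract / Page.perform.
interface Usage {
  turns: number;
  inputTokens: number;
  outputTokens: number;
}

// Hypothetical helper: sums the usage reported by several agentic calls.
function addUsage(...items: Usage[]): Usage {
  return items.reduce<Usage>(
    (acc, u) => ({
      turns: acc.turns + u.turns,
      inputTokens: acc.inputTokens + u.inputTokens,
      outputTokens: acc.outputTokens + u.outputTokens,
    }),
    { turns: 0, inputTokens: 0, outputTokens: 0 },
  );
}

// Illustrative values shaped like `(await page.extract(...)).usage` and `(await page.perform(...)).usage`.
const extractUsage: Usage = { turns: 2, inputTokens: 1200, outputTokens: 150 };
const performUsage: Usage = { turns: 3, inputTokens: 2400, outputTokens: 300 };
const total = addUsage(extractUsage, performUsage);
console.log(total); // { turns: 5, inputTokens: 3600, outputTokens: 450 }
```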

@@ -2050,11 +2054,18 @@ Task to perform using agentic loop.
* since: v1.58
- `schema` <[z.ZodSchema]>

### option: Page.extract.maxTokens
* since: v1.58
- `maxTokens` <[int]>
Member

why not separate number for input/output? They usually have different price.

Member Author

That's how completion APIs allow controlling them.


Maximum number of tokens to consume. The agentic loop will stop after input + output tokens exceed this value.
Defaults to context-wide value specified in `agent` property.

### option: Page.extract.maxTurns
* since: v1.58
- `maxTurns` <[int]>

Maximum number of agentic steps to take while extracting the information.
Maximum number of agentic turns during this call, defaults to context-wide value specified in `agent` property.

## async method: Page.fill
* since: v1.8
@@ -3031,7 +3042,11 @@ Whether or not to embed the document outline into the PDF. Defaults to `false`.

## async method: Page.perform
* since: v1.58
* langs: js
- returns: <[Object]>
- `usage` <[Object]>
- `turns` <[int]>
- `inputTokens` <[int]>
- `outputTokens` <[int]>

Perform action using agentic loop.
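`maxTokens` caps the sum of input and output tokens, but providers usually price the two differently. The split counts in the returned `usage` are what a caller would use to estimate spend after the call. This is a minimal sketch. `estimateCostUSD` is a hypothetical helper, and the per-million-token prices are placeholders, not any provider's real rates.

```typescript
interface Usage { turns: number; inputTokens: number; outputTokens: number; }

// Hypothetical pricing, in USD per one million tokens. Placeholder numbers only.
interface Pricing { inputPerMillion: number; outputPerMillion: number; }

// Prices input and output tokens independently, then converts from per-million to per-token.
function estimateCostUSD(usage: Usage, pricing: Pricing): number {
  return (usage.inputTokens * pricing.inputPerMillion +
          usage.outputTokens * pricing.outputPerMillion) / 1_000_000;
}

// Illustrative usage object shaped like `(await page.perform(...)).usage`.
const usage: Usage = { turns: 4, inputTokens: 2_000_000, outputTokens: 500_000 };
console.log(estimateCostUSD(usage, { inputPerMillion: 1, outputPerMillion: 4 })); // 4
```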

@@ -3054,11 +3069,18 @@ Task to perform using agentic loop.
All the agentic actions are converted to the Playwright calls and are cached.
By default, they are cached globally with the `task` as a key. This option allows controlling the cache key explicitly.

### option: Page.perform.maxTokens
* since: v1.58
- `maxTokens` <[int]>

Maximum number of tokens to consume. The agentic loop will stop after input + output tokens exceed this value.
Defaults to context-wide value specified in `agent` property.

### option: Page.perform.maxTurns
* since: v1.58
- `maxTurns` <[int]>

Maximum number of agentic steps to take while performing this action.
Maximum number of agentic turns during this call, defaults to context-wide value specified in `agent` property.
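The two limits interact as follows. A call ends when it has used `maxTurns` turns or when its cumulative input + output token count exceeds `maxTokens`, whichever comes first. The sketch below simulates those documented rules. It assumes the turn limit is checked before a turn starts and the token limit after each turn completes. The real checks live inside the agent loop library and may differ in timing. `simulateLoop` and the per-turn costs are hypothetical.

```typescript
interface Limits { maxTurns?: number; maxTokens?: number; }
interface TurnCost { inputTokens: number; outputTokens: number; }
interface LoopOutcome {
  turns: number;
  inputTokens: number;
  outputTokens: number;
  stoppedBy: 'done' | 'maxTurns' | 'maxTokens';
}

// Hypothetical simulation: `costs` stands in for what each model turn would consume,
// and the task counts as "done" once every scripted turn has run.
function simulateLoop(costs: TurnCost[], limits: Limits): LoopOutcome {
  let turns = 0, inputTokens = 0, outputTokens = 0;
  for (const cost of costs) {
    // Turn budget (assumed checked up front): never start more than maxTurns turns.
    if (limits.maxTurns !== undefined && turns >= limits.maxTurns)
      return { turns, inputTokens, outputTokens, stoppedBy: 'maxTurns' };
    turns++;
    inputTokens += cost.inputTokens;
    outputTokens += cost.outputTokens;
    // Token budget (assumed checked after the turn): stop once input + output exceeds maxTokens.
    if (limits.maxTokens !== undefined && inputTokens + outputTokens > limits.maxTokens)
      return { turns, inputTokens, outputTokens, stoppedBy: 'maxTokens' };
  }
  return { turns, inputTokens, outputTokens, stoppedBy: 'done' };
}

const costs: TurnCost[] = [
  { inputTokens: 400, outputTokens: 100 },
  { inputTokens: 500, outputTokens: 100 },
  { inputTokens: 600, outputTokens: 100 },
];
console.log(simulateLoop(costs, { maxTokens: 1000 }).stoppedBy); // maxTokens (after turn 2: 1100 > 1000)
console.log(simulateLoop(costs, { maxTurns: 2 }).stoppedBy);     // maxTurns
```

Under this assumption, the reported usage can overshoot `maxTokens` by up to one turn's worth of tokens. The budget is only known to be exceeded once that turn has already been paid for.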


## async method: Page.press
5 changes: 3 additions & 2 deletions docs/src/api/params.md
@@ -370,14 +370,15 @@ It makes the execution of the tests non-deterministic.
Emulates consistent window screen size available inside web page via `window.screen`. Is only used when the
[`option: viewport`] is set.

## js-context-option-agent
* langs: js
## context-option-agent
- `agent` <[Object]>
- `provider` <[string]> LLM provider to use.
- `model` <[string]> Model identifier within provider.
- `cacheFile` ?<[string]> Cache file to use/generate code for performed actions into. Cache is not used if not specified (default).
- `cacheMode` ?<['force'|'ignore'|'auto']> Cache control, defaults to 'auto'.
- `secrets` ?<[Object]<[string], [string]>> Secrets to hide from the LLM.
- `maxTurns` ?<[int]> Maximum number of agentic turns to take per call. Defaults to 10.
- `maxTokens` ?<[int]> Maximum number of tokens to consume per call. The agentic loop will stop after input + output tokens exceed this value. Defaults to unlimited.

Agent settings for [`method: Page.perform`] and [`method: Page.extract`].
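Per-call `maxTurns` and `maxTokens` fall back to these context-wide `agent` settings. Those in turn fall back to 10 turns and no token cap. The sketch below shows that precedence and mirrors the `??` chain in the server-side `Context.limits()` from this change. `resolveLimits` itself is a hypothetical standalone helper.

```typescript
interface AgentLimits { maxTurns?: number; maxTokens?: number; }

// Per-call options win, then the context-wide `agent` settings, then the built-in defaults.
function resolveLimits(
  call: AgentLimits = {},
  contextAgent?: AgentLimits,
): { maxTurns: number; maxTokens: number | undefined } {
  return {
    maxTurns: call.maxTurns ?? contextAgent?.maxTurns ?? 10, // default: 10 turns
    maxTokens: call.maxTokens ?? contextAgent?.maxTokens,    // default: unlimited (undefined)
  };
}

console.log(resolveLimits());                                                   // { maxTurns: 10, maxTokens: undefined }
console.log(resolveLimits({}, { maxTurns: 5, maxTokens: 8000 }));               // { maxTurns: 5, maxTokens: 8000 }
console.log(resolveLimits({ maxTokens: 2000 }, { maxTurns: 5, maxTokens: 8000 })); // { maxTurns: 5, maxTokens: 2000 }
```

Using `??` rather than `||` means an explicit `0` is kept as a limit instead of being treated as unset.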

2 changes: 1 addition & 1 deletion docs/src/test-api/class-testoptions.md
@@ -46,7 +46,7 @@ export default defineConfig({
});
```

## property: TestOptions.agent = %%-js-context-option-agent-%%
## property: TestOptions.agent = %%-context-option-agent-%%
* since: v1.58


29 changes: 27 additions & 2 deletions packages/playwright-client/types/types.d.ts
@@ -3839,10 +3839,24 @@ export interface Page {
key?: string;

/**
* Maximum number of agentic steps to take while performing this action.
* Maximum number of tokens to consume. The agentic loop will stop after input + output tokens exceed this value.
* Defaults to context-wide value specified in `agent` property.
*/
maxTokens?: number;

/**
* Maximum number of agentic turns during this call, defaults to context-wide value specified in `agent` property.
*/
maxTurns?: number;
}): Promise<void>;
}): Promise<{
usage: {
turns: number;

inputTokens: number;

outputTokens: number;
};
}>;

/**
* **NOTE** Use locator-based [locator.press(key[, options])](https://playwright.dev/docs/api/class-locator#locator-press)
@@ -22110,6 +22124,17 @@ export interface BrowserContextOptions {
* Secrets to hide from the LLM.
*/
secrets?: { [key: string]: string; };

/**
* Maximum number of agentic turns to take per call. Defaults to 10.
*/
maxTurns?: number;

/**
* Maximum number of tokens to consume per call. The agentic loop will stop after input + output tokens exceed this
* value. Defaults to unlimited.
*/
maxTokens?: number;
};

/**
11 changes: 6 additions & 5 deletions packages/playwright-core/src/client/page.ts
@@ -846,13 +846,14 @@ export class Page extends ChannelOwner<channels.PageChannel> implements api.Page
return result.pdf;
}

async perform(task: string, options: { key?: string, maxTurns?: number } = {}): Promise<void> {
await this._channel.perform({ task, ...options });
async perform(task: string, options: { key?: string, maxTokens?: number, maxTurns?: number } = {}) {
const result = await this._channel.perform({ task, ...options });
return { usage: { ...result } };
}

async extract<Schema extends z.ZodTypeAny>(query: string, schema: Schema, options: { maxTurns?: number } = {}): Promise<z.infer<Schema>> {
const { result } = await this._channel.extract({ query, schema: this._platform.zodToJsonSchema(schema), ...options });
return result;
async extract<Schema extends z.ZodTypeAny>(query: string, schema: Schema, options: { maxTokens?: number, maxTurns?: number } = {}): Promise<{ result: z.infer<Schema>, usage: { turns: number, inputTokens: number, outputTokens: number } }> {
const { result, ...usage } = await this._channel.extract({ query, schema: this._platform.zodToJsonSchema(schema), ...options });
return { result, usage };
}

async _snapshotForAI(options: TimeoutOptions & { track?: string } = {}): Promise<{ full: string, incremental?: string }> {
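On the wire, `PagePerformResult` and `PageExtractResult` carry `turns`, `inputTokens` and `outputTokens` as flat fields next to `result`. The client regroups them under `usage`. The sketch below shows that reshaping in isolation. `toPerformReturn` and `toExtractReturn` are hypothetical names, not functions in the codebase.

```typescript
interface Usage { turns: number; inputTokens: number; outputTokens: number; }

// Wire shapes, following the validator's PagePerformResult / PageExtractResult fields.
type PerformWire = Usage;
type ExtractWire<T> = { result: T } & Usage;

// Perform: the whole wire object is the usage.
function toPerformReturn(wire: PerformWire): { usage: Usage } {
  return { usage: { ...wire } };
}

// Extract: pull `result` out and regroup the three counters under `usage`.
function toExtractReturn<T>(wire: ExtractWire<T>): { result: T; usage: Usage } {
  const { result, turns, inputTokens, outputTokens } = wire;
  return { result, usage: { turns, inputTokens, outputTokens } };
}

const extracted = toExtractReturn({ result: { title: 'Example' }, turns: 2, inputTokens: 800, outputTokens: 40 });
console.log(extracted.result.title); // Example
console.log(extracted.usage);        // { turns: 2, inputTokens: 800, outputTokens: 40 }
```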
20 changes: 19 additions & 1 deletion packages/playwright-core/src/protocol/validator.ts
@@ -608,6 +608,8 @@ scheme.BrowserTypeLaunchPersistentContextParams = tObject({
cacheFile: tOptional(tString),
cacheMode: tOptional(tEnum(['ignore', 'force', 'auto'])),
secrets: tOptional(tArray(tType('NameValue'))),
maxTurns: tOptional(tInt),
maxTokens: tOptional(tInt),
})),
userDataDir: tString,
slowMo: tOptional(tFloat),
@@ -707,6 +709,8 @@ scheme.BrowserNewContextParams = tObject({
cacheFile: tOptional(tString),
cacheMode: tOptional(tEnum(['ignore', 'force', 'auto'])),
secrets: tOptional(tArray(tType('NameValue'))),
maxTurns: tOptional(tInt),
maxTokens: tOptional(tInt),
})),
proxy: tOptional(tObject({
server: tString,
@@ -785,6 +789,8 @@ scheme.BrowserNewContextForReuseParams = tObject({
cacheFile: tOptional(tString),
cacheMode: tOptional(tEnum(['ignore', 'force', 'auto'])),
secrets: tOptional(tArray(tType('NameValue'))),
maxTurns: tOptional(tInt),
maxTokens: tOptional(tInt),
})),
proxy: tOptional(tObject({
server: tString,
@@ -908,6 +914,8 @@ scheme.BrowserContextInitializer = tObject({
cacheFile: tOptional(tString),
cacheMode: tOptional(tEnum(['ignore', 'force', 'auto'])),
secrets: tOptional(tArray(tType('NameValue'))),
maxTurns: tOptional(tInt),
maxTokens: tOptional(tInt),
})),
}),
});
@@ -1514,15 +1522,23 @@ scheme.PagePerformParams = tObject({
task: tString,
key: tOptional(tString),
maxTurns: tOptional(tInt),
maxTokens: tOptional(tInt),
});
scheme.PagePerformResult = tObject({
turns: tInt,
inputTokens: tInt,
outputTokens: tInt,
});
scheme.PagePerformResult = tOptional(tObject({}));
scheme.PageExtractParams = tObject({
query: tString,
schema: tAny,
maxTurns: tOptional(tInt),
});
scheme.PageExtractResult = tObject({
result: tAny,
turns: tInt,
inputTokens: tInt,
outputTokens: tInt,
});
scheme.FrameInitializer = tObject({
url: tString,
@@ -2818,6 +2834,8 @@ scheme.AndroidDeviceLaunchBrowserParams = tObject({
cacheFile: tOptional(tString),
cacheMode: tOptional(tEnum(['ignore', 'force', 'auto'])),
secrets: tOptional(tArray(tType('NameValue'))),
maxTurns: tOptional(tInt),
maxTokens: tOptional(tInt),
})),
pkg: tOptional(tString),
args: tOptional(tArray(tString)),
41 changes: 33 additions & 8 deletions packages/playwright-core/src/server/agent/agent.ts
@@ -28,28 +28,42 @@ import type { Page } from '../page';
import type * as loopTypes from '@lowire/loop';
import type * as actions from './actions';

export async function pagePerform(progress: Progress, page: Page, options: channels.PagePerformParams): Promise<void> {
type Usage = {
turns: number,
inputTokens: number,
outputTokens: number,
};

export async function pagePerform(progress: Progress, page: Page, options: channels.PagePerformParams): Promise<Usage> {
const context = new Context(progress, page);

if (await cachedPerform(context, options))
return;
return { turns: 0, inputTokens: 0, outputTokens: 0 };

await perform(context, options.task, undefined, options);
const { usage } = await perform(context, options.task, undefined, options);
await updateCache(context, options);
return usage;
}

export async function pageExtract(progress: Progress, page: Page, options: channels.PageExtractParams) {
export async function pageExtract(progress: Progress, page: Page, options: channels.PageExtractParams): Promise<{
result: any,
usage: Usage
}> {
const context = new Context(progress, page);
const task = `
### Instructions
Extract the following information from the page. Do not perform any actions, just extract the information.

### Query
${options.query}`;
return await perform(context, task, options.schema, options);
const { result, usage } = await perform(context, task, options.schema, options);
return { result, usage };
}

async function perform(context: Context, userTask: string, resultSchema: loopTypes.Schema | undefined, options: { maxTurns?: number } = {}): Promise<any> {
async function perform(context: Context, userTask: string, resultSchema: loopTypes.Schema | undefined, options: { maxTurns?: number, maxTokens?: number } = {}): Promise<{
result: any,
usage: Usage
}> {
const { progress, page } = context;
const browserContext = page.browserContext;
if (!browserContext._options.agent)
@@ -58,13 +72,17 @@ async function perform(context: Context, userTask: string, resultSchema: loopTyp
const { full } = await page.snapshotForAI(progress);
const { tools, callTool } = toolsForLoop(context);

const limits = context.limits(options);
let turns = 0;
const loop = new Loop(browserContext._options.agent.provider as any, {
model: browserContext._options.agent.model,
summarize: true,
debug,
callTool,
tools,
...limits,
beforeTurn: params => {
++turns;
const lastReply = params.conversation.messages.findLast(m => m.role === 'assistant');
const toolCall = lastReply?.content.find(c => c.type === 'tool_call');
if (!resultSchema && toolCall && toolCall.arguments.thatShouldBeIt)
@@ -80,8 +98,15 @@ async function perform(context: Context, userTask: string, resultSchema: loopTyp
${full}
`;

const { result } = await loop.run(task, { resultSchema });
return result;
const { result, usage } = await loop.run(task, { resultSchema });
return {
result,
usage: {
turns,
inputTokens: usage.input,
outputTokens: usage.output,
}
};
}

type CachedActions = Record<string, {
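On the server, usage is assembled from two sources. Turns are counted in the loop's `beforeTurn` hook, and token totals come from the loop result's `usage.input` and `usage.output`. A cache hit short-circuits with zero usage. The sketch below shows that flow, made synchronous for brevity. `RunLoop` is a hypothetical stand-in for the `@lowire/loop` API, whose real signature is not shown here.

```typescript
interface Usage { turns: number; inputTokens: number; outputTokens: number; }

// Hypothetical stand-in for the agent loop: invokes `beforeTurn` once per model turn and
// reports raw token totals as `{ input, output }`.
type RunLoop = (beforeTurn: () => void) => { usage: { input: number; output: number } };

function performWithUsage(cacheHit: boolean, runLoop: RunLoop): Usage {
  // Replaying cached actions never calls the model, so it reports zero usage.
  if (cacheHit)
    return { turns: 0, inputTokens: 0, outputTokens: 0 };
  // Count turns through the per-turn hook, then rename the loop's input/output totals.
  let turns = 0;
  const { usage } = runLoop(() => { ++turns; });
  return { turns, inputTokens: usage.input, outputTokens: usage.output };
}

// Fake loop that takes three turns.
const fakeLoop: RunLoop = beforeTurn => {
  for (let i = 0; i < 3; i++) beforeTurn();
  return { usage: { input: 900, output: 120 } };
};
console.log(performWithUsage(false, fakeLoop)); // { turns: 3, inputTokens: 900, outputTokens: 120 }
console.log(performWithUsage(true, fakeLoop));  // { turns: 0, inputTokens: 0, outputTokens: 0 }
```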
7 changes: 7 additions & 0 deletions packages/playwright-core/src/server/agent/context.ts
@@ -146,6 +146,13 @@ export class Context {
}));
}

limits(options: { maxTurns?: number, maxTokens?: number } = {}): { maxTurns: number | undefined, maxTokens: number | undefined } {
return {
maxTurns: options.maxTurns ?? this.options?.maxTurns ?? 10,
maxTokens: options.maxTokens ?? this.options?.maxTokens ?? undefined,
};
}

private _redactText(text: string): string {
const secrets = this.options?.secrets;
if (!secrets)
@@ -322,11 +322,12 @@ export class PageDispatcher extends Dispatcher<Page, channels.PageChannel, Brows
}

async perform(params: channels.PagePerformParams, progress: Progress): Promise<channels.PagePerformResult> {
await pagePerform(progress, this._page, params);
return await pagePerform(progress, this._page, params);
}

async extract(params: channels.PageExtractParams, progress: Progress): Promise<channels.PageExtractResult> {
return { result: await pageExtract(progress, this._page, params) };
const { result, usage } = await pageExtract(progress, this._page, params);
return { result, ...usage };
}

async requests(params: channels.PageRequestsParams, progress: Progress): Promise<channels.PageRequestsResult> {
29 changes: 27 additions & 2 deletions packages/playwright-core/types/types.d.ts
@@ -3839,10 +3839,24 @@ export interface Page {
key?: string;

/**
* Maximum number of agentic steps to take while performing this action.
* Maximum number of tokens to consume. The agentic loop will stop after input + output tokens exceed this value.
* Defaults to context-wide value specified in `agent` property.
*/
maxTokens?: number;

/**
* Maximum number of agentic turns during this call, defaults to context-wide value specified in `agent` property.
*/
maxTurns?: number;
}): Promise<void>;
}): Promise<{
usage: {
turns: number;

inputTokens: number;

outputTokens: number;
};
}>;

/**
* **NOTE** Use locator-based [locator.press(key[, options])](https://playwright.dev/docs/api/class-locator#locator-press)
@@ -22110,6 +22124,17 @@ export interface BrowserContextOptions {
* Secrets to hide from the LLM.
*/
secrets?: { [key: string]: string; };

/**
* Maximum number of agentic turns to take per call. Defaults to 10.
*/
maxTurns?: number;

/**
* Maximum number of tokens to consume per call. The agentic loop will stop after input + output tokens exceed this
* value. Defaults to unlimited.
*/
maxTokens?: number;
};

/**