Deterministic Caching #55

anerli · 2025-06-27T07:09:52Z

anerli
Jun 27, 2025
Maintainer

Starting point for discussing approaches to action caching: the ability for the LLM to save a workflow or test and re-run it later in the same way, at much lower cost.

anerli · 2025-06-27T07:18:36Z

anerli
Jun 27, 2025
Maintainer Author

Three main approaches in mind for me so far:

(1) Codegen - generate Playwright code or something similar

Why I'm not a big fan: Feels like a hack - Playwright code is not aligned with the way the LLM best takes actions. It would be better if the LLM took actions, and those actions, as taken by the LLM, were cached.
Tricky to adapt nicely
LLM is not actually that good at writing Playwright - it is more intuitive for it to take actions in an LLM-oriented action space

(2) LLM-assisted - run a cached workflow/test with a smaller LLM based on a plan already executed by a bigger one

Relatively sensible and fairly clear how you might implement
Problem: might get some weird inconsistencies in workflows where the capabilities of the model affect behavior at runtime, even on repeated runs
Requires users to set up a second smaller model

(3) Embeddings - "trajectory shift detection"

Potentially the most robust and cost-efficient
Use embedding models to capture signals that indicate whether the action path is still consistent or has changed - use these to kick out to the LLM if needed
Potentially combine with small model approach
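
A minimal sketch of what "trajectory shift detection" could look like, assuming each cached step stores an embedding of the observation it was recorded against. The names `cosineSimilarity` and `hasTrajectoryShifted` and the 0.95 threshold are hypothetical, and the embedding model itself is out of scope here.

```typescript
// Hypothetical sketch: compare the embedding stored with a cached step
// against the embedding of the current observation.
type Embedding = number[];

function cosineSimilarity(a: Embedding, b: Embedding): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("embeddings must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns true when the current state has drifted far enough from the cached
// state that replay should stop and control should go back to the LLM.
// The threshold is an assumed value that would need tuning.
function hasTrajectoryShifted(
  cached: Embedding,
  current: Embedding,
  threshold = 0.95,
): boolean {
  return cosineSimilarity(cached, current) < threshold;
}
```

A small model could be consulted only when the similarity lands in a gray zone near the threshold, which is one way to combine this with approach (2).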

Probably a good way to look at this problem is as two key components:
(1) Lookup mechanism

Given an agent's task and current state, how do we know whether it should be treated the same as a task that has been run in the past?

(2) Adaptation trigger

Given an agent re-executing an existing cached path (by whatever method), how do we know when it goes "off" the path and needs to kick out to a strong LLM?
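
To make the split concrete, here is a rough sketch of the two components as interfaces, plus a replay loop that uses them. Every name here (`Lookup`, `AdaptationTrigger`, `runTask`, the `Action` and `State` shapes) is hypothetical and not taken from the codebase.

```typescript
// Hypothetical types for illustration only.
interface Action { kind: string; target?: string; value?: string }
interface State { url: string; fingerprint: string }

// (1) Lookup mechanism: have we seen this task in this state before?
interface Lookup {
  find(task: string, state: State): Action[] | undefined;
}

// (2) Adaptation trigger: has replay gone "off" the cached path?
interface AdaptationTrigger {
  isOffPath(expected: Action, state: State): boolean;
}

type Outcome = { source: "cache" | "llm"; actions: Action[] };

function runTask(
  task: string,
  observe: () => State,
  execute: (a: Action) => void,
  planWithLLM: (task: string, state: State) => Action[],
  lookup: Lookup,
  trigger: AdaptationTrigger,
): Outcome {
  const cached = lookup.find(task, observe());
  if (!cached) {
    // Nothing cached: run fully with the LLM.
    const actions = planWithLLM(task, observe());
    actions.forEach((a) => execute(a));
    return { source: "llm", actions };
  }
  const done: Action[] = [];
  for (const action of cached) {
    const state = observe();
    if (trigger.isOffPath(action, state)) {
      // Kick out to the strong LLM for the remainder of the task.
      const rest = planWithLLM(task, state);
      rest.forEach((a) => execute(a));
      return { source: "llm", actions: [...done, ...rest] };
    }
    execute(action);
    done.push(action);
  }
  return { source: "cache", actions: done };
}
```

Any of the three approaches above could sit behind `AdaptationTrigger`: hard checks, a smaller LLM, or an embedding comparison.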

0 replies

luke3butler · 2025-06-28T06:10:04Z

luke3butler
Jun 28, 2025

Here's a rough draft that covers a potential approach. I may have glossed over something that is more complicated than I'm thinking, so definitely call out anything that seems off.

Addressing the Core Components

Visual similarity (pHash or other visual similarity metric)
Structural equality (domChecksum)

A cached entry is used only when both validations succeed. If either check fails, the system treats the entry as stale and falls back to the LLM. The domNormalizers pipeline runs first to strip volatile content before domChecksum is computed.
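
A small sketch of the dual check, assuming the pHash is stored as a hex string and that "similar" means a Hamming distance under some threshold. The `Snapshot` shape, the function names, and the default of 5 bits are assumptions for illustration.

```typescript
// Hypothetical shape of the validation data stored with a cache entry.
interface Snapshot {
  pHash: string;       // perceptual hash as hex characters (assumed format)
  domChecksum: string; // hash of the normalized DOM
}

// Number of differing bits between two equal-length hex strings.
function hammingDistance(hexA: string, hexB: string): number {
  if (hexA.length !== hexB.length) throw new Error("hash length mismatch");
  let bits = 0;
  for (let i = 0; i < hexA.length; i++) {
    let x = parseInt(hexA[i], 16) ^ parseInt(hexB[i], 16);
    while (x > 0) {
      bits += x & 1;
      x >>= 1;
    }
  }
  return bits;
}

// A cached entry is usable only when BOTH checks pass:
// a soft visual match and a hard structural match.
function isCacheEntryValid(
  cached: Snapshot,
  current: Snapshot,
  maxBitDiff = 5,
): boolean {
  const visuallySimilar =
    hammingDistance(cached.pHash, current.pHash) <= maxBitDiff;
  const structurallyEqual = cached.domChecksum === current.domChecksum;
  return visuallySimilar && structurallyEqual;
}
```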

High-Level Architecture

The system consists of three layers:

| Layer | Component | Responsibility |
| --- | --- | --- |
| Core | `CacheManager` | Orchestrates all cache lookups, writes, and invalidation logic |
| Pluggable | `CacheProvider` Interface | Defines standard interface (`get`, `set`, `del`) for storage backends |
| Runtime | `Agent` | Delegates all caching operations to the `CacheManager` |

Two-Tier Cache Lookup:

Read Path: GET key → Try L1 (In-Memory) → If miss, Try L2 (Shared Provider) → If L2 hit, write back to L1 → Return result
Write Path: SET key → Write to L2 → On success, write to L1
Invalidation Path: DEL key → Delete from L2 → On success, delete from L1
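
The three paths above could be sketched roughly like this. `CacheProvider` follows the `get`/`set`/`del` interface named in the architecture table; the async signatures, `InMemoryProvider`, and `TwoTierCache` are assumptions made for the example.

```typescript
// Storage backend interface; async signatures are an assumption.
interface CacheProvider<V> {
  get(key: string): Promise<V | undefined>;
  set(key: string, value: V): Promise<void>;
  del(key: string): Promise<void>;
}

// Stand-in for a shared L2 backend, used here only to make the sketch runnable.
class InMemoryProvider<V> implements CacheProvider<V> {
  private store = new Map<string, V>();
  async get(key: string): Promise<V | undefined> { return this.store.get(key); }
  async set(key: string, value: V): Promise<void> { this.store.set(key, value); }
  async del(key: string): Promise<void> { this.store.delete(key); }
}

class TwoTierCache<V> {
  private l1 = new Map<string, V>();
  private l2: CacheProvider<V>;

  constructor(l2: CacheProvider<V>) {
    this.l2 = l2;
  }

  // Read path: L1, then L2, writing an L2 hit back to L1.
  async get(key: string): Promise<V | undefined> {
    const local = this.l1.get(key);
    if (local !== undefined) return local;
    const shared = await this.l2.get(key);
    if (shared !== undefined) this.l1.set(key, shared);
    return shared;
  }

  // Write path: L2 first; L1 is only touched if the L2 write did not throw.
  async set(key: string, value: V): Promise<void> {
    await this.l2.set(key, value);
    this.l1.set(key, value);
  }

  // Invalidation path: L2 first, then L1.
  async del(key: string): Promise<void> {
    await this.l2.del(key);
    this.l1.delete(key);
  }
}
```

One thing this sketch does not solve: a second process with its own L1 can keep serving an entry after another process deletes it from L2, so some TTL or notification mechanism would likely be needed.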

Integration Touch-Points

| Area (file) | Hook timing | Cache surface |
| --- | --- | --- |
| `src/agent/plan.ts` | Before calling LLM to generate `TargetedAction[]` | Plan Cache |
| `src/agent/step.ts` | After resolving coordinates, before DOM click | Step Cache |
| `src/agent/extract.ts` | Before `extractLLM()` executes | Extract Cache |
| `src/model/ModelHarness.ts` | Wraps `query()` method | Query Cache |
| `CacheManager.buildDomChecksum()` | Runs DOM normalizer pipeline incl. `llmNormalizer` | Key/Checksum |

Cached Surfaces & Key Generation

Plan Cache: Caches the high-level TargetedAction[] sequence for an act() command. Key: SHA256({instruction, dataHash, url, domChecksum})
Execution Step Cache: Caches the precise, successful parameters for a single action. Key: ActionSignature object
Extraction & Query Cache: Caches the structured data result from extract() or query(). Key: SHA256({instruction, schemaHash, url, contextHash})

Key Generation Details:

Deterministic stringify + SHA-256
Salted with library version, LLM model, viewport fingerprint and the llmNormHash (SHA-1 of the LLM-generated pattern list) so updated normalization rules automatically bust stale entries
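
As a rough illustration of "deterministic stringify + SHA-256" with salting, using Node's built-in `crypto` module. `stableStringify`, `KeySalt`, and `buildCacheKey` are hypothetical names, and the exact fields that go into the key would follow the lists above.

```typescript
import { createHash } from "node:crypto";

// Serialize with object keys sorted so that logically equal inputs always
// produce the same string, regardless of property insertion order.
function stableStringify(value: unknown): string {
  if (value === undefined) return "null";
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) {
    return "[" + value.map((v) => stableStringify(v)).join(",") + "]";
  }
  const obj = value as Record<string, unknown>;
  const entries = Object.keys(obj)
    .sort()
    .filter((k) => obj[k] !== undefined)
    .map((k) => JSON.stringify(k) + ":" + stableStringify(obj[k]));
  return "{" + entries.join(",") + "}";
}

// Salt fields taken from the description above; the shape is an assumption.
interface KeySalt {
  libraryVersion: string;
  model: string;
  viewport: string;
  llmNormHash: string;
}

// Any change to the salt (for example a new llmNormHash) yields a new key,
// so entries written under the old rules are simply never found again.
function buildCacheKey(parts: Record<string, unknown>, salt: KeySalt): string {
  return createHash("sha256")
    .update(stableStringify({ parts, salt }))
    .digest("hex");
}
```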

The DOM Normalization Pipeline

To combat UI noise (timestamps, ads, dynamic IDs), the domChecksum is generated from a normalized DOM string. The agent is configurable with a pipeline of normalizer functions:

// Example configuration
startBrowserAgent({
  caching: {
    domNormalizers: [stripTimestamps, stripDataReactIds, /* custom user normalizer */]
  }
});

Default normalizers strip common volatile elements. Users can provide custom normalizer functions.
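
A sketch of how the pipeline might produce the checksum. The names `stripTimestamps`, `stripDataReactIds`, and `buildDomChecksum` come from this proposal, but the regexes and the function bodies below are illustrative assumptions, not a spec.

```typescript
import { createHash } from "node:crypto";

// A normalizer takes a serialized DOM string and returns a cleaned one.
type DomNormalizer = (html: string) => string;

// Illustrative only: removes ISO 8601 style timestamps.
const stripTimestamps: DomNormalizer = (html) =>
  html.replace(
    /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?/g,
    "",
  );

// Illustrative only: removes data-reactid attributes.
const stripDataReactIds: DomNormalizer = (html) =>
  html.replace(/\sdata-reactid="[^"]*"/g, "");

// Run the normalizers in order, then hash the result.
function buildDomChecksum(html: string, normalizers: DomNormalizer[]): string {
  const normalized = normalizers.reduce(
    (acc, normalize) => normalize(acc),
    html,
  );
  return createHash("sha256").update(normalized).digest("hex");
}
```

With these two normalizers, two renders that differ only in a timestamp and a `data-reactid` value produce the same checksum, while a change in visible text does not.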

LLM-Assisted Normalization:

On first encounter with a page, the raw DOM is sent to the LLM, which returns CSS selectors/regex patterns for volatile elements (timestamps, counters, A/B-test IDs). The output (≤100 selectors, each ≤256 chars) is hashed and stored under llmNorm:v1:<origin>
Patterns are compiled into a lightweight JS filter and executed locally on subsequent runs
Opt-in via domNormalizers: [stripTimestamps, llmNormalizer]
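
Here is one way the "compile patterns into a filter" step could look, handling only the regex half (CSS selectors would need a DOM parser). `compileLlmNormalizer` is a hypothetical name; the limits mirror the ≤100 and ≤256 bounds mentioned above.

```typescript
// Turns a list of LLM-suggested regex patterns into a local filter function.
// Oversized or invalid patterns are skipped rather than trusted.
function compileLlmNormalizer(patterns: string[]): (html: string) => string {
  const MAX_PATTERNS = 100;
  const MAX_LENGTH = 256;
  const compiled: RegExp[] = [];
  for (const pattern of patterns.slice(0, MAX_PATTERNS)) {
    if (pattern.length > MAX_LENGTH) continue;
    try {
      compiled.push(new RegExp(pattern, "g"));
    } catch {
      // Not a valid regular expression: ignore it.
    }
  }
  // Apply every compiled pattern in order, deleting whatever it matches.
  return (html) => compiled.reduce((acc, re) => acc.replace(re, ""), html);
}
```

Since the patterns come from a model, it would probably be worth guarding against pathological regexes (catastrophic backtracking) before running them on every page load.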

Validation & Failure Strategy

Hit Criteria: A cache hit requires both a soft visual match (pHash) and a hard structural match (domChecksum)
Failure: On a cache miss or execution error, the agent falls back to the live LLM. A failure during replay invalidates only that specific cache entry
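
The failure strategy might reduce to something like the following. `CachedStep` and `executeWithFallback` are hypothetical, and whether a failed validation should also delete the entry (rather than just skip it) is left open here.

```typescript
// Hypothetical cache entry: a key plus a function that replays the step.
interface CachedStep<R> {
  key: string;
  replay: () => R;
}

function executeWithFallback<R>(
  entry: CachedStep<R> | undefined,
  isValid: () => boolean,            // pHash + domChecksum check
  invalidate: (key: string) => void, // drops a single entry
  runLive: () => R,                  // live LLM path
): { result: R; fromCache: boolean } {
  if (entry && isValid()) {
    try {
      return { result: entry.replay(), fromCache: true };
    } catch {
      // Replay failed: drop only this entry, then fall through to the LLM.
      invalidate(entry.key);
    }
  }
  return { result: runLive(), fromCache: false };
}
```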

Invalidation Rules

| Trigger | Scope of Cache Cleared |
| --- | --- |
| `agent.nav()` | All entries for the previous URL origin |
| Direct Playwright mutation (e.g., `agent.page.click()`) | All entries for the current URL origin |
| Manual `agent.clearCache()` | All entries (or user-defined scope) |
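
Origin-scoped invalidation could be backed by a structure along these lines. `OriginScopedCache` and its methods are hypothetical; the only real API used is the standard `URL` class.

```typescript
// Entries are bucketed by URL origin so a whole origin can be dropped at once.
class OriginScopedCache<V> {
  private byOrigin = new Map<string, Map<string, V>>();

  set(url: string, key: string, value: V): void {
    const origin = new URL(url).origin;
    let bucket = this.byOrigin.get(origin);
    if (!bucket) {
      bucket = new Map<string, V>();
      this.byOrigin.set(origin, bucket);
    }
    bucket.set(key, value);
  }

  get(url: string, key: string): V | undefined {
    return this.byOrigin.get(new URL(url).origin)?.get(key);
  }

  // For agent.nav() (previous origin) or a direct Playwright mutation
  // (current origin).
  clearOrigin(url: string): void {
    this.byOrigin.delete(new URL(url).origin);
  }

  // For a manual agent.clearCache() with no scope.
  clearAll(): void {
    this.byOrigin.clear();
  }
}
```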

2 replies

anerli Jun 28, 2025
Maintainer Author

Hey Luke, appreciate the thinking you did on this here. I think some kind of visual check + DOM structural check could be a reasonable approach for invalidating a cache.

I think there are also progressively more difficult versions of this problem, each with a different set of assumptions:

Problem 0 (trivial)

Assumptions:

Environment doesn't change
For a given task, task parameters don't change

Goal:

Run first time with LLM, and afterwards use the exact same actions taken

Solution:

Just save the actions taken for a given task. When the task is seen again, repeat the exact actions.
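
For reference, the trivial case is about this much code. `ActionStore` and `RecordedAction` are made-up names, and the task string is used directly as the cache key.

```typescript
// Hypothetical action shape.
interface RecordedAction { kind: string; target: string }

class ActionStore {
  private byTask = new Map<string, RecordedAction[]>();

  // First run: ask the LLM, execute, and save. Later runs: replay as-is.
  run(
    task: string,
    planWithLLM: (task: string) => RecordedAction[],
    execute: (action: RecordedAction) => void,
  ): "recorded" | "replayed" {
    const saved = this.byTask.get(task);
    const actions = saved ?? planWithLLM(task);
    for (const action of actions) execute(action);
    if (!saved) this.byTask.set(task, actions);
    return saved ? "replayed" : "recorded";
  }
}
```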

Problem 1a (medium)

Assumptions:

Environment changes arbitrarily
For a given task, task parameters don't change

Goal:

Each task should run the same way if the environment doesn't change. If the environment changes in a way that would impact the task, the agent should be able to identify this and then adapt (deterministic until it needs to be nondeterministic/generative again)

Solution:

Involves some mechanism to detect when to adapt (maybe with a smaller LLM, maybe with hard checks, ...), and how to adapt (using LLM)

Problem 1b (hard)

Assumptions:

Environment doesn't change
For a given task, its parameters may change, and reparametrized tasks should share a cache
- Here is where ambiguity starts - what should be considered the "same" task? What should be considered a re-parametrization of an existing task?

Goal:

Every parameterization of the same task should follow the same execution pattern (pretty ambiguous here)
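
The simplest reading of this, sketched below, is a cached action sequence with `{placeholder}` slots that get filled per run. This is only an assumption about one possible design (`TemplatedAction` and `instantiate` are hypothetical), and it only covers parameters that change values, not ones that change which actions are needed.

```typescript
// Hypothetical cached action whose value may contain {name} placeholders.
interface TemplatedAction { kind: string; value?: string }

// Fill placeholders with concrete parameters; fail loudly on a missing one.
function instantiate(
  template: TemplatedAction[],
  params: Record<string, string>,
): TemplatedAction[] {
  return template.map((action) => ({
    ...action,
    value: action.value?.replace(/\{(\w+)\}/g, (_match, name: string) => {
      if (!(name in params)) throw new Error(`missing parameter: ${name}`);
      return params[name];
    }),
  }));
}
```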

I think the ideal, generalizable, and seamless solution would make no assumptions about environment stability and would account for the causal interactions between a task's parameters and how they affect actions in a repeatable sequence.

luke3butler Jun 29, 2025

Thanks for reading it, and taking the time to respond. I'll be diving into this further when I have some time, unless someone else beats me to it.

I have some ideas for solutions here, but I'm going to hold back until I can give them some more thought.

Part of me wonders if the solution should be modular, using different strategies for different task types (which could also make it easier to iterate, replace, or let the user provide their own strategy), but that could add unnecessary complexity and overhead depending on what the "hard" strategy ends up being.
