A local proxy server that caches LLM API calls to save you money during agent development.
When building AI agents, you run the same prompts thousands of times during testing. That burns through API credits fast. cache-llm intercepts your LLM requests, caches responses in a local SQLite database, and returns them in <2ms on repeat calls — your API bill shrinks to near zero during local development.
- ⚡ `<2ms` response time on cache hits
- 💾 SQLite-backed — zero external dependencies
- 🔌 Drop-in compatible with OpenAI SDK, LangChain, AutoGen, and any OpenAI-compatible client
- 🔒 Deterministic `sha256` hashing — same prompt always hits the same cache entry
```bash
npx @dinakars777/cache-llm
```

Starts the proxy on `http://localhost:8080` targeting `https://api.openai.com`.
| Flag | Description | Default |
|---|---|---|
| `-p, --port` | Port to run the proxy on | `8080` |
| `-t, --target` | Target LLM API base URL | `https://api.openai.com` |
| `-d, --db` | SQLite database file path | `./.llm-cache.db` |
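For example, to run the proxy on a different port with a custom cache file (the values below are illustrative, not defaults):

```bash
npx @dinakars777/cache-llm --port 9090 --db ./agent-cache.db
```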
Point your client's baseURL at the proxy:
```js
// OpenAI Node.js SDK
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: 'http://localhost:8080/v1',
});
```

```bash
# LangChain, AutoGen, etc.
export OPENAI_BASE_URL="http://localhost:8080/v1"
```

- Computes a `sha256` hash of the method, URL path, auth header, and request body
- Returns the cached response instantly on a hit
- On a miss, forwards to the real API, stores the response, then returns it
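The steps above can be sketched in TypeScript. This is a simplified illustration, not the package's actual code: a `Map` stands in for the SQLite store, and `forward` is a hypothetical stand-in for the upstream API call.

```typescript
import { createHash } from 'node:crypto';

// In-memory stand-in for the SQLite cache (the real package uses better-sqlite3).
const cache = new Map<string, string>();

// Deterministic key: hash the method, URL path, auth header, and request body.
export function cacheKey(method: string, path: string, auth: string, body: string): string {
  return createHash('sha256')
    .update([method, path, auth, body].join('\n'))
    .digest('hex');
}

// Hit: return the stored response. Miss: forward upstream, store, return.
export async function handle(
  method: string,
  path: string,
  auth: string,
  body: string,
  forward: (body: string) => Promise<string>, // hypothetical upstream call
): Promise<string> {
  const key = cacheKey(method, path, auth, body);
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const response = await forward(body);
  cache.set(key, response);
  return response;
}
```

Because the key covers the auth header as well, two users with different API keys never share cache entries.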
| Package | Purpose |
|---|---|
| `better-sqlite3` | Fast local SQLite caching |
| `express` | Proxy server |
| TypeScript | Type-safe implementation |
MIT