chore: split CLAUDE.md by domain for lazy context loading

ryan-crabbe-berri · ryan-crabbe-berri · commit 30a55412917e · 2026-04-18T15:24:39.000-07:00
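Move domain-specific rules from the root CLAUDE.md into nested
CLAUDE.md files so Claude Code only loads them when working in the
relevant subdirectory (per the documented hierarchical memory
behavior). Root keeps universal rules (commands, architecture, style,
testing) and the short Database Migrations section, which is also
copied into litellm/proxy/CLAUDE.md. No rule wording changed; only
section headings were adjusted (heading levels, and the MCP OAuth
section is split into backend and UI-side halves).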

Also adds placeholder stubs (title and one-line description, no rules
yet) for proxy sub-domains (auth, guardrails, hooks, spend_tracking,
pass_through_endpoints) so future rules land in the right place.
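
For reviewers, a rough sketch of which files in the new layout would apply to a given path. It assumes the loader picks up every CLAUDE.md between the repo root and the directory of the file being read; the helper name and that traversal are illustrative assumptions, not Claude Code's actual implementation.

```python
from pathlib import PurePosixPath

# Directories (relative to the repo root) that hold a nested CLAUDE.md
# after this commit. The root CLAUDE.md is handled separately below.
NESTED_DIRS = {
    ".circleci",
    "litellm/caching",
    "litellm/proxy",
    "litellm/proxy/auth",
    "litellm/proxy/guardrails",
    "litellm/proxy/hooks",
    "litellm/proxy/management_endpoints",
    "litellm/proxy/pass_through_endpoints",
    "litellm/proxy/spend_tracking",
    "ui/litellm-dashboard",
}


def applicable_claude_files(file_path: str) -> list[str]:
    """Hypothetical helper: CLAUDE.md files from the root down to file_path."""
    parts = PurePosixPath(file_path).parent.parts
    found = ["CLAUDE.md"]  # assumed: the root file always applies
    for depth in range(1, len(parts) + 1):
        directory = "/".join(parts[:depth])
        if directory in NESTED_DIRS:
            found.append(f"{directory}/CLAUDE.md")
    return found
```

Under that assumption, a file in `litellm/proxy/auth/` would see the root, proxy, and auth files, while a file in `litellm/caching/` would see only the root and caching files.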
diff --git a/.circleci/CLAUDE.md b/.circleci/CLAUDE.md
@@ -0,0 +1,11 @@
+# CI (`.circleci/`)
+
+CircleCI pipeline configuration.
+
+## CI Supply-Chain Safety
+- **Never pipe a remote script into a shell** (`curl ... | bash`, `wget ... | sh`). Download the artifact to a file, verify its SHA-256 checksum, then install.
+- **Pin every external tool to a specific version** with a full URL (not `latest` or `stable`). Unversioned downloads silently change under you.
+- **Verify checksums for all downloaded binaries.** Use the provider's official `.sha256` / `.sha256sum` sidecar file when available; otherwise compute and hardcode the digest.
+- **Prefer reusable CircleCI commands** (`commands:` section) so a tool is installed and verified in exactly one place, then referenced everywhere with `- install_<tool>` or `- wait_for_service`.
+- **Don't add tools just because they were there before.** Audit whether an external dependency is still needed. If it can be replaced with a shell one-liner or a tool already in the image, remove it.
+- These rules apply to every download in CI: binaries, install scripts, language version managers, package repos. No exceptions.
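
Reviewer note: the "download to a file, verify its SHA-256 checksum, then install" rule can be sketched as below. This is a minimal illustration in Python; the function names are hypothetical, and a real CircleCI step would more likely run an equivalent shell check such as `sha256sum -c`.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large binaries are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_sha256: str) -> None:
    """Raise ValueError if the downloaded file does not match the pinned digest."""
    actual = sha256_of(path)
    # Normalize the pinned value: sidecar files often carry a trailing newline
    # or uppercase hex.
    if not hmac.compare_digest(actual, expected_sha256.strip().lower()):
        raise ValueError(f"checksum mismatch for {path}: got {actual}")
```

The install step would run only after `verify_artifact` returns without raising.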
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -2,6 +2,21 @@
 
 This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
 
+## Nested CLAUDE.md files
+
+Domain-specific rules live next to the code they apply to. They load automatically when Claude reads files in those directories:
+
+- `ui/litellm-dashboard/CLAUDE.md` — UI dashboard (antd, browser storage, UI↔backend contracts)
+- `litellm/proxy/CLAUDE.md` — proxy server (DB access, migrations, schema sync)
+- `litellm/proxy/management_endpoints/CLAUDE.md` — admin CRUD endpoints (MCP credential storage, MCP OAuth backend)
+- `litellm/proxy/auth/CLAUDE.md` — auth (stub)
+- `litellm/proxy/guardrails/CLAUDE.md` — guardrails (stub)
+- `litellm/proxy/hooks/CLAUDE.md` — proxy hooks (stub)
+- `litellm/proxy/spend_tracking/CLAUDE.md` — spend tracking (stub)
+- `litellm/proxy/pass_through_endpoints/CLAUDE.md` — pass-through endpoints (stub)
+- `litellm/caching/CLAUDE.md` — caching (HTTP client cache safety)
+- `.circleci/CLAUDE.md` — CI supply-chain rules
+
 ## Development Commands
 
 ### Installation
@@ -106,44 +121,6 @@ LiteLLM is a unified interface for 100+ LLM providers with two main components:
 - **Keep monkeypatch stubs in sync with real signatures** — when a function gains a new optional parameter, update every `fake_*` / `stub_*` in tests that patch it to also accept that kwarg (even as `**kwargs`). Stale stubs fail with `unexpected keyword argument` and mask real bugs.
 - **Test all branches of name→ID resolution** — when adding server/resource lookup that resolves names to UUIDs, test: (1) name resolves and UUID is allowed, (2) name resolves but UUID is not allowed, (3) name does not resolve at all. The silent-fallback path is where access-control bugs hide.
 
-### UI / Backend Consistency
-- When wiring a new UI entity type to an existing backend endpoint, verify the backend API contract (single value vs. array, required vs. optional params) and ensure the UI controls match — e.g., use a single-select dropdown when the backend accepts a single value, not a multi-select
-
-### UI Component Library
-- **Always use `antd` for new UI components** — we are migrating off of `@tremor/react`. Do not introduce new `Badge`, `Text`, `Card`, `Grid`, `Title`, or other imports from `@tremor/react` in any new or modified file. Use `antd` equivalents: `Tag` for labels, plain `<span>`/`<div>` with Tailwind classes (or `Typography.Text`) for text, `Card` from `antd`, etc. Note that `antd` has no `"yellow"` Tag color — use `"gold"` for amber/yellow.
-
-### MCP OAuth / OpenAPI Transport Mapping
-- `TRANSPORT.OPENAPI` is a UI-only concept. The backend only accepts `"http"`, `"sse"`, or `"stdio"`. Always map it to `"http"` before any API call (including pre-OAuth temp-session calls).
-- FastAPI validation errors return `detail` as an array of `{loc, msg, type}` objects. Error extractors must handle: array (map `.msg`), string, nested `{error: string}`, and fallback.
-- When an MCP server already has `authorization_url` stored, skip OAuth discovery (`_discovery_metadata`) — the server URL for OpenAPI MCPs is the spec file, not the API base, and fetching it causes timeouts.
-- `client_id` should be optional in the `/authorize` endpoint — if the server has a stored `client_id` in credentials, use that. Never require callers to re-supply it.
-
-### MCP Credential Storage
-- OAuth credentials and BYOK credentials share the `litellm_mcpusercredentials` table, distinguished by a `"type"` field in the JSON payload (`"oauth2"` vs plain string).
-- When deleting OAuth credentials, check type before deleting to avoid accidentally deleting a BYOK credential for the same `(user_id, server_id)` pair.
-- Always pass the raw `expires_at` timestamp to the client — never set it to `None` for expired credentials. Let the frontend compute the "Expired" display state from the timestamp.
-- Use `RecordNotFoundError` (not bare `except Exception`) when catching "already deleted" in credential delete endpoints.
-
-### Browser Storage Safety (UI)
-- Never write LiteLLM access tokens or API keys to `localStorage` — use `sessionStorage` only. `localStorage` survives browser close and is readable by any injected script (XSS).
-- Shared utility functions (e.g. `extractErrorMessage`) belong in `src/utils/` — never define them inline in hooks or duplicate them across files.
-
-### Database Migrations
-- Prisma handles schema migrations
-- Migration files auto-generated with `prisma migrate dev`
-- Always test migrations against both PostgreSQL and SQLite
-
-### Proxy database access
-- **Do not write raw SQL** for proxy DB operations. Use Prisma model methods instead of `execute_raw` / `query_raw`.
-- Use the generated client: `prisma_client.db.<model>` (e.g. `litellm_tooltable`, `litellm_usertable`) with `.upsert()`, `.find_many()`, `.find_unique()`, `.update()`, `.update_many()` as appropriate. This avoids schema/client drift, keeps code testable with simple mocks, and matches patterns used in spend logs and other proxy code.
-- **No N+1 queries.** Never query the DB inside a loop. Batch-fetch with `{"in": ids}` and distribute in-memory.
-- **Batch writes.** Use `create_many`/`update_many`/`delete_many` instead of individual calls (these return counts only; `update_many`/`delete_many` no-op silently on missing rows). When multiple separate writes target the same table (e.g. in `batch_()`), order by primary key to avoid deadlocks.
-- **Push work to the DB.** Filter, sort, group, and aggregate in SQL, not Python. Verify Prisma generates the expected SQL — e.g. prefer `group_by` over `find_many(distinct=...)` which does client-side processing.
-- **Bound large result sets.** Prisma materializes full results in memory. For results over ~10 MB, paginate with `take`/`skip` or `cursor`/`take`, always with an explicit `order`. Prefer cursor-based pagination (`skip` is O(n)). Don't paginate naturally small result sets.
-- **Limit fetched columns on wide tables.** Use `select` to fetch only needed fields — returns a partial object, so downstream code must not access unselected fields.
-- **Check index coverage.** For new or modified queries, check `schema.prisma` for a supporting index. Prefer extending an existing index (e.g. `@@index([a])` → `@@index([a, b])`) over adding a new one, unless it's a `@@unique`. Only add indexes for large/frequent queries.
-- **Keep schema files in sync.** Apply schema changes to all `schema.prisma` copies (`schema.prisma`, `litellm/proxy/`, `litellm-proxy-extras/`, `litellm-js/spend-logs/` for SpendLogs) with a migration under `litellm-proxy-extras/litellm_proxy_extras/migrations/`.
-
 ### Setup Wizard (`litellm/setup_wizard.py`)
 - The wizard is implemented as a single `SetupWizard` class with `@staticmethod` methods — keep it that way. No module-level functions except `run_setup_wizard()` (the public entrypoint) and pure helpers (color, ANSI).
 - Use `litellm.utils.check_valid_key(model, api_key)` for credential validation — never roll a custom completion call.
@@ -154,23 +131,7 @@ LiteLLM is a unified interface for 100+ LLM providers with two main components:
 - Optional features enabled via environment variables
 - Separate licensing and authentication for enterprise features
 
-### CI Supply-Chain Safety
-- **Never pipe a remote script into a shell** (`curl ... | bash`, `wget ... | sh`). Download the artifact to a file, verify its SHA-256 checksum, then install.
-- **Pin every external tool to a specific version** with a full URL (not `latest` or `stable`). Unversioned downloads silently change under you.
-- **Verify checksums for all downloaded binaries.** Use the provider's official `.sha256` / `.sha256sum` sidecar file when available; otherwise compute and hardcode the digest.
-- **Prefer reusable CircleCI commands** (`commands:` section) so a tool is installed and verified in exactly one place, then referenced everywhere with `- install_<tool>` or `- wait_for_service`.
-- **Don't add tools just because they were there before.** Audit whether an external dependency is still needed. If it can be replaced with a shell one-liner or a tool already in the image, remove it.
-- These rules apply to every download in CI: binaries, install scripts, language version managers, package repos. No exceptions.
-
-### HTTP Client Cache Safety
-- **Never close HTTP/SDK clients on cache eviction.** `LLMClientCache._remove_key()` must not call `close()`/`aclose()` on evicted clients — they may still be used by in-flight requests. Doing so causes `RuntimeError: Cannot send a request, as the client has been closed.` after the 1-hour TTL expires. Cleanup happens at shutdown via `close_litellm_async_clients()`.
-
-### Troubleshooting: DB schema out of sync after proxy restart
-`litellm-proxy-extras` runs `prisma migrate deploy` on startup using **its own** bundled migration files, which may lag behind schema changes in the current worktree. Symptoms: `Unknown column`, `Invalid prisma invocation`, or missing data on new fields.
-
-**Diagnose:** Run `\d "TableName"` in psql and compare against `schema.prisma` — missing columns confirm the issue.
-
-**Fix options:**
-1. **Create a Prisma migration** (permanent) — run `prisma migrate dev --name <description>` in the worktree. The generated file will be picked up by `prisma migrate deploy` on next startup.
-2. **Apply manually for local dev** — `psql -d litellm -c "ALTER TABLE ... ADD COLUMN IF NOT EXISTS ..."` after each proxy start. Fine for dev, not for production.
-3. **Update litellm-proxy-extras** — if the package is installed from PyPI, its migration directory must include the new file. Either update the package or run the migration manually until the next release ships it.
+### Database Migrations
+- Prisma handles schema migrations
+- Migration files auto-generated with `prisma migrate dev`
+- Always test migrations against both PostgreSQL and SQLite
diff --git a/litellm/caching/CLAUDE.md b/litellm/caching/CLAUDE.md
@@ -0,0 +1,6 @@
+# Caching (`litellm/caching/`)
+
+Multiple cache backends (Redis, in-memory, S3, disk) + HTTP client cache.
+
+## HTTP Client Cache Safety
+- **Never close HTTP/SDK clients on cache eviction.** `LLMClientCache._remove_key()` must not call `close()`/`aclose()` on evicted clients — they may still be used by in-flight requests. Doing so causes `RuntimeError: Cannot send a request, as the client has been closed.` after the 1-hour TTL expires. Cleanup happens at shutdown via `close_litellm_async_clients()`.
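
Reviewer note: a toy illustration of the eviction rule above. `TTLClientCache` and `FakeClient` are hypothetical stand-ins, not `LLMClientCache` or a real SDK client; the point is only that eviction drops the reference and leaves closing to shutdown.

```python
import time
from typing import Optional


class FakeClient:
    """Stand-in for an HTTP/SDK client; only records whether close() ran."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class TTLClientCache:
    """Toy TTL cache (hypothetical) that never closes clients on eviction."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self._items: dict[str, tuple[float, FakeClient]] = {}

    def set(self, key: str, client: FakeClient) -> None:
        self._items[key] = (time.monotonic() + self.ttl_seconds, client)

    def get(self, key: str) -> Optional[FakeClient]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, client = entry
        if time.monotonic() >= expires_at:
            self._remove_key(key)
            return None
        return client

    def _remove_key(self, key: str) -> None:
        # Drop the cache's reference only. Do NOT call client.close() here:
        # a request that fetched the client before eviction may still use it.
        self._items.pop(key, None)

    def close_all(self) -> None:
        """Shutdown-time cleanup for clients still held by the cache."""
        for _, client in self._items.values():
            client.close()
        self._items.clear()
```

In this toy version an evicted client is simply released to whoever still holds it; the real shutdown path named in the rule is `close_litellm_async_clients()`.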
diff --git a/litellm/proxy/CLAUDE.md b/litellm/proxy/CLAUDE.md
@@ -0,0 +1,29 @@
+# Proxy Server (`litellm/proxy/`)
+
+Rules that apply across the proxy server. Sub-domains (auth, guardrails, management endpoints, etc.) have their own nested CLAUDE.md files.
+
+## Proxy database access
+- **Do not write raw SQL** for proxy DB operations. Use Prisma model methods instead of `execute_raw` / `query_raw`.
+- Use the generated client: `prisma_client.db.<model>` (e.g. `litellm_tooltable`, `litellm_usertable`) with `.upsert()`, `.find_many()`, `.find_unique()`, `.update()`, `.update_many()` as appropriate. This avoids schema/client drift, keeps code testable with simple mocks, and matches patterns used in spend logs and other proxy code.
+- **No N+1 queries.** Never query the DB inside a loop. Batch-fetch with `{"in": ids}` and distribute in-memory.
+- **Batch writes.** Use `create_many`/`update_many`/`delete_many` instead of individual calls (these return counts only; `update_many`/`delete_many` no-op silently on missing rows). When multiple separate writes target the same table (e.g. in `batch_()`), order by primary key to avoid deadlocks.
+- **Push work to the DB.** Filter, sort, group, and aggregate in SQL, not Python. Verify Prisma generates the expected SQL — e.g. prefer `group_by` over `find_many(distinct=...)` which does client-side processing.
+- **Bound large result sets.** Prisma materializes full results in memory. For results over ~10 MB, paginate with `take`/`skip` or `cursor`/`take`, always with an explicit `order`. Prefer cursor-based pagination (`skip` is O(n)). Don't paginate naturally small result sets.
+- **Limit fetched columns on wide tables.** Use `select` to fetch only needed fields — returns a partial object, so downstream code must not access unselected fields.
+- **Check index coverage.** For new or modified queries, check `schema.prisma` for a supporting index. Prefer extending an existing index (e.g. `@@index([a])` → `@@index([a, b])`) over adding a new one, unless it's a `@@unique`. Only add indexes for large/frequent queries.
+- **Keep schema files in sync.** Apply schema changes to all `schema.prisma` copies (`schema.prisma`, `litellm/proxy/`, `litellm-proxy-extras/`, `litellm-js/spend-logs/` for SpendLogs) with a migration under `litellm-proxy-extras/litellm_proxy_extras/migrations/`.
+
+## Database Migrations
+- Prisma handles schema migrations
+- Migration files auto-generated with `prisma migrate dev`
+- Always test migrations against both PostgreSQL and SQLite
+
+## Troubleshooting: DB schema out of sync after proxy restart
+`litellm-proxy-extras` runs `prisma migrate deploy` on startup using **its own** bundled migration files, which may lag behind schema changes in the current worktree. Symptoms: `Unknown column`, `Invalid prisma invocation`, or missing data on new fields.
+
+**Diagnose:** Run `\d "TableName"` in psql and compare against `schema.prisma` — missing columns confirm the issue.
+
+**Fix options:**
+1. **Create a Prisma migration** (permanent) — run `prisma migrate dev --name <description>` in the worktree. The generated file will be picked up by `prisma migrate deploy` on next startup.
+2. **Apply manually for local dev** — `psql -d litellm -c "ALTER TABLE ... ADD COLUMN IF NOT EXISTS ..."` after each proxy start. Fine for dev, not for production.
+3. **Update litellm-proxy-extras** — if the package is installed from PyPI, its migration directory must include the new file. Either update the package or run the migration manually until the next release ships it.
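
Reviewer note: the "No N+1 queries" rule (batch-fetch with `{"in": ids}` and distribute in-memory) might look roughly like this. `fetch_rows` is a hypothetical synchronous stand-in for a single `find_many` call, and the `team_id` field is an assumed example; the real proxy code is async and uses the generated Prisma client.

```python
from collections import defaultdict
from typing import Callable, Iterable

Row = dict  # stand-in for a Prisma model instance


def load_rows_by_team(
    team_ids: Iterable[str],
    fetch_rows: Callable[[dict], list[Row]],
) -> dict[str, list[Row]]:
    """Issue one batched query, then distribute the rows in memory."""
    ids = sorted(set(team_ids))
    if not ids:
        return {}
    # Single round trip: filter with {"in": ids} instead of one query per id.
    rows = fetch_rows({"team_id": {"in": ids}})
    grouped: dict[str, list[Row]] = defaultdict(list)
    for row in rows:
        grouped[row["team_id"]].append(row)
    # Every requested id gets an entry, even when no rows matched.
    return {team_id: grouped.get(team_id, []) for team_id in ids}
```

The caller then reads `result[team_id]` inside its loop instead of querying the database there.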
diff --git a/litellm/proxy/auth/CLAUDE.md b/litellm/proxy/auth/CLAUDE.md
@@ -0,0 +1,5 @@
+# Auth (`litellm/proxy/auth/`)
+
+API key management, JWT, OAuth2, SSO, virtual keys, team/org access control.
+
+_No rules yet — add guidance here as it emerges from real issues._
diff --git a/litellm/proxy/guardrails/CLAUDE.md b/litellm/proxy/guardrails/CLAUDE.md
@@ -0,0 +1,5 @@
+# Guardrails (`litellm/proxy/guardrails/`)
+
+Safety and content-filtering hooks, guardrail registry, guardrail initializers.
+
+_No rules yet — add guidance here as it emerges from real issues._
diff --git a/litellm/proxy/hooks/CLAUDE.md b/litellm/proxy/hooks/CLAUDE.md
@@ -0,0 +1,5 @@
+# Proxy Hooks (`litellm/proxy/hooks/`)
+
+Pre-call / post-call hooks: alerting, parallel request limits, cache-control, etc.
+
+_No rules yet — add guidance here as it emerges from real issues._
diff --git a/litellm/proxy/management_endpoints/CLAUDE.md b/litellm/proxy/management_endpoints/CLAUDE.md
@@ -0,0 +1,13 @@
+# Management Endpoints (`litellm/proxy/management_endpoints/`)
+
+Admin CRUD endpoints: keys, teams, users, orgs, budgets, models, MCP servers, SCIM, SSO.
+
+## MCP OAuth / OpenAPI (backend)
+- When an MCP server already has `authorization_url` stored, skip OAuth discovery (`_discovery_metadata`) — the server URL for OpenAPI MCPs is the spec file, not the API base, and fetching it causes timeouts.
+- `client_id` should be optional in the `/authorize` endpoint — if the server has a stored `client_id` in credentials, use that. Never require callers to re-supply it.
+
+## MCP Credential Storage
+- OAuth credentials and BYOK credentials share the `litellm_mcpusercredentials` table, distinguished by a `"type"` field in the JSON payload (`"oauth2"` vs plain string).
+- When deleting OAuth credentials, check type before deleting to avoid accidentally deleting a BYOK credential for the same `(user_id, server_id)` pair.
+- Always pass the raw `expires_at` timestamp to the client — never set it to `None` for expired credentials. Let the frontend compute the "Expired" display state from the timestamp.
+- Use `RecordNotFoundError` (not bare `except Exception`) when catching "already deleted" in credential delete endpoints.
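
Reviewer note: the "check type before deleting" rule could be sketched as below. It assumes the stored credential is available as a string that is either a JSON object with `"type": "oauth2"` or a plain BYOK string; the function name and that storage shape are assumptions for illustration, not the actual endpoint code.

```python
import json


def is_oauth2_credential(stored_value: str) -> bool:
    """Hypothetical check: True only for a JSON object whose "type" is "oauth2"."""
    try:
        payload = json.loads(stored_value)
    except (TypeError, ValueError):
        # Not JSON at all, so treat it as a plain-string (BYOK) credential.
        return False
    return isinstance(payload, dict) and payload.get("type") == "oauth2"
```

An OAuth delete path would call this first and leave the row untouched when it returns `False`, so a BYOK credential for the same `(user_id, server_id)` pair survives.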
diff --git a/litellm/proxy/pass_through_endpoints/CLAUDE.md b/litellm/proxy/pass_through_endpoints/CLAUDE.md
@@ -0,0 +1,5 @@
+# Pass-Through Endpoints (`litellm/proxy/pass_through_endpoints/`)
+
+Provider-native request forwarding (not OpenAI-shaped).
+
+_No rules yet — add guidance here as it emerges from real issues._
diff --git a/litellm/proxy/spend_tracking/CLAUDE.md b/litellm/proxy/spend_tracking/CLAUDE.md
@@ -0,0 +1,5 @@
+# Spend Tracking (`litellm/proxy/spend_tracking/`)
+
+SpendLogs, daily activity aggregation, cost tracking.
+
+_No rules yet — add guidance here as it emerges from real issues._
diff --git a/ui/litellm-dashboard/CLAUDE.md b/ui/litellm-dashboard/CLAUDE.md
@@ -0,0 +1,17 @@
+# UI Dashboard (`ui/litellm-dashboard/`)
+
+Rules for the Next.js admin UI served by the proxy.
+
+## UI / Backend Consistency
+- When wiring a new UI entity type to an existing backend endpoint, verify the backend API contract (single value vs. array, required vs. optional params) and ensure the UI controls match — e.g., use a single-select dropdown when the backend accepts a single value, not a multi-select
+
+## UI Component Library
+- **Always use `antd` for new UI components** — we are migrating off of `@tremor/react`. Do not introduce new `Badge`, `Text`, `Card`, `Grid`, `Title`, or other imports from `@tremor/react` in any new or modified file. Use `antd` equivalents: `Tag` for labels, plain `<span>`/`<div>` with Tailwind classes (or `Typography.Text`) for text, `Card` from `antd`, etc. Note that `antd` has no `"yellow"` Tag color — use `"gold"` for amber/yellow.
+
+## Browser Storage Safety
+- Never write LiteLLM access tokens or API keys to `localStorage` — use `sessionStorage` only. `localStorage` survives browser close and is readable by any injected script (XSS).
+- Shared utility functions (e.g. `extractErrorMessage`) belong in `src/utils/` — never define them inline in hooks or duplicate them across files.
+
+## MCP OAuth / OpenAPI (UI-side)
+- `TRANSPORT.OPENAPI` is a UI-only concept. The backend only accepts `"http"`, `"sse"`, or `"stdio"`. Always map it to `"http"` before any API call (including pre-OAuth temp-session calls).
+- FastAPI validation errors return `detail` as an array of `{loc, msg, type}` objects. Error extractors must handle: array (map `.msg`), string, nested `{error: string}`, and fallback.
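
Reviewer note: the branches an error extractor must cover (array, string, nested `{error: string}`, fallback) are sketched here in Python for consistency with the other notes; the real helper lives in the UI's TypeScript. The function name, the fallback text, and the choice to look for `error` both under `detail` and at the top level are assumptions.

```python
from typing import Any

FALLBACK_MESSAGE = "Unknown error"  # hypothetical default text


def extract_error_message(body: Any) -> str:
    """Return a readable message from a parsed error response body."""
    if isinstance(body, dict):
        detail = body.get("detail")
        if isinstance(detail, list):
            # FastAPI validation errors: list of {loc, msg, type} objects.
            msgs = [
                item["msg"]
                for item in detail
                if isinstance(item, dict) and isinstance(item.get("msg"), str)
            ]
            if msgs:
                return "; ".join(msgs)
        elif isinstance(detail, str) and detail:
            return detail
        elif isinstance(detail, dict) and isinstance(detail.get("error"), str):
            return detail["error"]
        if isinstance(body.get("error"), str):
            return body["error"]
    return FALLBACK_MESSAGE
```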