feat: add ai recommend chat #2441
Conversation
Add useQuerySuggestions hook following the existing @queries/ pattern (TanStack Query) and connect ChatPopoverContainer to the backend GET /v1/aichat/suggestions endpoint. Suggestions are fetched when the popover opens and cached for 5 minutes.

- Create @queries/suggestions.ts with defensive params check
- Remove EMPTY_SUGGESTIONS from ChatWidget, fetch in container
- Memoize suggestionsParams to avoid unnecessary re-renders
Change nginx location from prefix /v1/aichat/ to exact match /v1/aichat/chat so that /v1/aichat/suggestions falls through to the /v1/ prefix location and reaches the gRPC Gateway.
Add exact path match for /v1/aichat/chat before the /v1/ prefix route so SSE chat requests go to the dashboard cluster while /v1/aichat/suggestions routes to the gRPC Gateway.
Keep both the pubSubRedisMode flag from main and the AI Chat configuration flags from this branch.
Add react-markdown to render Markdown formatting (headings, lists, bold, links, code blocks) in assistant responses. User messages remain plain text.
- Add suggestion translations to en/ja locale files keyed by suggestion ID
- Update SuggestionCard to use i18n translations with backend fallback
- Inject i18n.language into PageContext.metadata for backend language awareness
- Exclude metadata from suggestions query params to avoid unnecessary re-fetches
- Read language from PageContext.metadata instead of relying on LLM detection
- Add Japanese/English language section to system prompt based on metadata
- Add tests for edge cases: empty string, unsupported locale, injection attempt
Replace the OpenAI embedding-based RAG system with a token-free approach that fetches documentation from the public bucketeer-io/bucketeer-docs repository using the GitHub Trees API and scores documents locally. Key changes:

- Add GitHubSearcher using Trees API (no auth required) with 24h TTL cache (see the sketch after this list)
- Add MDX/JSX parser to strip markup and extract clean text
- Add CJK-aware tokenizer with katakana-to-English translation for cross-language search (e.g. "SDKについて" matches English SDK docs)
- Minimize system prompt to guardrails only, rely on RAG documents
- Pre-compute lowercase fields and path segments for scoring efficiency
- Add Searcher interface for swappable search implementations
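As a rough illustration of the Trees API approach (not the actual GitHubSearcher code — the function name, branch name, and filtering are assumptions):

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// treeResponse mirrors the relevant part of the Git Trees API response.
type treeResponse struct {
	Tree []struct {
		Path string `json:"path"`
		Type string `json:"type"`
	} `json:"tree"`
}

// listDocPaths fetches the full file tree in one request (recursive=1)
// and keeps only Markdown/MDX documentation files.
func listDocPaths() ([]string, error) {
	const url = "https://api.github.com/repos/bucketeer-io/bucketeer-docs/git/trees/main?recursive=1"
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var tr treeResponse
	if err := json.NewDecoder(resp.Body).Decode(&tr); err != nil {
		return nil, err
	}
	var paths []string
	for _, e := range tr.Tree {
		if e.Type == "blob" && (strings.HasSuffix(e.Path, ".mdx") || strings.HasSuffix(e.Path, ".md")) {
			paths = append(paths, e.Path)
		}
	}
	return paths, nil
}

func main() {
	paths, err := listDocPaths()
	if err != nil {
		panic(err)
	}
	fmt.Printf("indexed %d doc files\n", len(paths))
}
```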
- Fix goimports formatting in github_search.go (const alignment)
- Fix line length >120 chars in prompt.go system prompt
- Fix prettier formatting in chat-popover-container.tsx and chat-popover.tsx
- Add AI_CHAT_ENABLED to Helm env-js-configmap so the chat widget renders in production deployments
- Map SSE backend error strings to known CHAT_ERROR codes instead of passing raw text as i18n keys
- Fix suggestions API query param names from snake_case to camelCase to match gRPC-Gateway swagger spec (environmentId, pageContext.*)
…y default

- Sanitize user-controlled fields in feature context with %q quoting and control character removal to mitigate prompt injection
- Add untrusted data warning to Feature Flag Details section
- Convert RAG reference URLs from GitHub blob links to published docs.bucketeer.io URLs
- Comment out VITE_AI_CHAT_ENABLED in env.default so AI chat is disabled unless explicitly configured
The RAG system was replaced with GitHub Trees API + local keyword scoring, making the old embedding infrastructure dead code:

- Remove CreateEmbeddings from llm.Client interface and OpenAI impl
- Remove Service, CosineSimilarity, and embedded docs from rag package
- Remove aichat-embedding-model server flag and Helm config
- Remove bucketeer-docs.json embedded vector data
- Remove createTestRAGService test helper
- Regenerate llm mock
…ion for RAG search

Replace the hand-maintained katakanaToEnglish dictionary and CJK tokenization logic with query-time LLM keyword extraction, enabling cross-language RAG search without manual enumeration. Also consolidate rag.go types into github_search.go, extract a shared message conversion helper in openai.go, and add a 5s timeout to keyword extraction.
- Unify HTTP/gRPC auth via role.CheckEnvironmentRole (remove dead getEnvironmentRole)
- Move rate limit check before auth to avoid unnecessary RPC on hot path
- Add io.LimitReader to RAG fetchTree to prevent OOM on large responses (see the sketch after this list)
- Wrap feature context in XML tags to mitigate prompt injection
- Add SDK info to system prompt to prevent hallucination
- Embed auto-cleanup goroutine in ratelimit.NewLimiter (context-based lifecycle)
- Fix RAG search extension filter to include .md files (not just .mdx)
- Deduplicate error code checks with isChatErrorCode utility (frontend)
- Add requestAnimationFrame batching for SSE streaming chunks
- Replace magic number with LIST_PAGE_SIZE constant in flag-selector
- Add precise mock expectations (Times(1)) and body assertions in tests
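The io.LimitReader guard looks roughly like this (the cap value and `treeResponse` type are illustrative, not the actual fetchTree code):

```go
const maxBodyBytes = 10 << 20 // illustrative 10 MiB cap

// fetchTree decodes at most maxBodyBytes of the response body so an
// oversized (or malicious) response cannot exhaust memory.
// (imports: encoding/json, io, net/http)
func fetchTree(treeURL string) (*treeResponse, error) {
	resp, err := http.Get(treeURL)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var tr treeResponse
	if err := json.NewDecoder(io.LimitReader(resp.Body, maxBodyBytes)).Decode(&tr); err != nil {
		return nil, err
	}
	return &tr, nil
}
```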
- Fix gofmt alignment in github_search.go constants block
- Break long line in prompt.go to stay within 120-char limit
- Run prettier on flag-selector.tsx and use-sse-chat.ts
```go
openAIAPIKey      *string
openAIBaseURL     *string
aichatModel       *string
aichatGitHubToken *string
```
[ask] Is aichatGitHubToken never used? I could not find its usage.
Thanks! aichatGitHubToken was defined as a server flag but was never passed to NewGitHubSearcher. I fixed it!

fix: add GitHub token support and improve RAG search scoring
The token is used to set the Authorization: Bearer header on GitHub API requests, increasing the rate limit from 60 to 5,000 requests/hour.
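For illustration, attaching the token looks roughly like this (the receiver, field, and helper names are assumptions, not the actual GitHubSearcher internals):

```go
// newGitHubRequest attaches the optional token; unauthenticated calls
// get 60 req/hour, a token raises the limit to 5,000 req/hour.
// (imports: context, net/http)
func (s *GitHubSearcher) newGitHubRequest(ctx context.Context, url string) (*http.Request, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	if s.token != "" {
		req.Header.Set("Authorization", "Bearer "+s.token)
	}
	return req, nil
}
```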
```yaml
logLevel: info
# AI Chat configuration (optional — leave openaiApiKey empty to disable)
aichat:
  openaiApiKeySecret:
```
What happens if the key is empty but AI_CHAT_ENABLED is true?
Thanks!
- `BUCKETEER_WEB_OPENAI_API_KEY` — Backend gate. When empty, no AI Chat gRPC service, SSE handler, or routes are registered. This is the server-side kill switch.
- `AI_CHAT_ENABLED` — Frontend gate. Controls whether the ChatWidget is rendered in the browser. This is the client-side visibility toggle.
The backend cannot inject runtime state into the frontend directly — env.js is a static file generated at deploy time (via Helm ConfigMap or Docker Compose volume mount). This is the same pattern used by DEMO_SIGN_IN_ENABLED, which is also a deploy-time value injected into env.js via Helm values.
- Accept optional GitHub token in GitHubSearcher to increase API rate limits from 60/hr (unauthenticated) to 5,000/hr
- Pass configured aichat-github-token from server to GitHubSearcher
- Strip punctuation from search tokens to fix matching (e.g. "sdk?" → "sdk") — see the sketch after this list
- Increase path segment match weight (3→10) and use presence-only content scoring to prevent common words from outranking structural matches
- Use bidirectional HasSuffix for plural handling (e.g. "sdks" → "sdk")
- Skip single-char tokens instead of ≤2 chars to avoid dropping "go"
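A sketch of the token normalization rules from this list (function name and exact rune classes are illustrative):

```go
// normalizeTokens lowercases each token, trims surrounding punctuation
// ("sdk?" → "sdk"), and drops only single-rune tokens so short terms
// like "go" survive. (imports: strings, unicode, unicode/utf8)
func normalizeTokens(raw []string) []string {
	var out []string
	for _, t := range raw {
		t = strings.TrimFunc(strings.ToLower(t), unicode.IsPunct)
		if utf8.RuneCountInString(t) <= 1 {
			continue
		}
		out = append(out, t)
	}
	return out
}
```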
Tighten the system prompt restrictions so the LLM only states facts found in RAG reference documents. Previously the model fabricated SDK names and language support not present in the docs.
force-pushed from e37e09c to ea53232
This variable was commented out and never used by Docker Compose. AI_CHAT_ENABLED is controlled via the static env.js file (Docker Compose) or Helm ConfigMap (Kubernetes), not via Vite build variables.
The aichat model default was gpt-4o-mini, but since openaiBaseUrl supports any OpenAI-compatible API, the model name should not assume a specific provider. Operators must now explicitly configure the model name alongside the API key and base URL.
Refactor all test files under pkg/aichat/ to use the project's table-driven test conventions: `patterns` slice variable, `p` loop variable, `desc` field. Consolidate individual Test_* functions into grouped table-driven tests where practical. No test logic or assertions changed.
Use errgroup.SetLimit for concurrency control instead of manual semaphore channel + sync.WaitGroup, matching the errgroup usage in chat_stream.go.
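A minimal sketch of the pattern, under assumed names (`fetchRawDoc`, `paths`):

```go
// (import: golang.org/x/sync/errgroup)
g, ctx := errgroup.WithContext(ctx)
g.SetLimit(5) // replaces the manual semaphore channel + sync.WaitGroup

for _, p := range paths {
	p := p // redundant on Go 1.22+, harmless on older versions
	g.Go(func() error {
		_, err := fetchRawDoc(ctx, p)
		return err
	})
}
if err := g.Wait(); err != nil {
	return err
}
```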
…n format

- Replace context.Background()/context.TODO() with t.Context() in all aichat test files for proper test lifecycle management (see the sketch after this list)
- Convert TestGetSuggestions and TestChat to table-driven format matching the project's existing patterns (AccountService, FeatureService)
- Remove unused context imports
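For illustration, the convention looks like this — the function under test (`normalizeTokens`) is a stand-in, `assert` assumes testify, and `t.Context()` requires Go 1.24+:

```go
func TestNormalizeTokens(t *testing.T) {
	patterns := []struct {
		desc  string
		input []string
		want  []string
	}{
		{desc: "strips punctuation", input: []string{"sdk?"}, want: []string{"sdk"}},
		{desc: "keeps two-char tokens", input: []string{"go"}, want: []string{"go"}},
		{desc: "drops single-char tokens", input: []string{"a"}, want: nil},
	}
	for _, p := range patterns {
		t.Run(p.desc, func(t *testing.T) {
			// t.Context() is canceled automatically when the test finishes,
			// replacing context.Background()/context.TODO().
			ctx := t.Context()
			_ = ctx // would be passed to context-aware code under test
			assert.Equal(t, p.want, normalizeTokens(p.input))
		})
	}
}
```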
Replace fmt.Sprintf-based URL building with net/url.JoinPath for safer path construction in fetchTree and fetchRawDoc.
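A sketch of the change, with the raw-content host and `docPath` as assumptions:

```go
// url.JoinPath (Go 1.19+) joins and cleans path segments, avoiding the
// malformed URLs that fmt.Sprintf concatenation can produce.
// (import: net/url)
rawURL, err := url.JoinPath(
	"https://raw.githubusercontent.com",
	"bucketeer-io", "bucketeer-docs", "main", docPath,
)
if err != nil {
	return "", err
}
```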
Inline limitInputLength into normalizeInput since it was just an alias with no additional logic.
# Conflicts:
#	pkg/api/api/api_grpc_test.go
What about using the GitHub Search API instead (or using some library)? https://docs.github.com/en/rest/search/search?apiVersion=2026-03-10
It might be simpler.
Good suggestion! I considered the GitHub Search API but chose the Trees API + local scoring approach for a few reasons:
The Search API has a stricter rate limit — 10 requests/min for authenticated users, compared to 5,000/hr for the REST API. Since every chat message triggers a search, this could be hit quickly with multiple concurrent users.
With local scoring, we can control how results are ranked. The Search API uses GitHub's own relevance algorithm optimized for code search, not documentation retrieval. Our local scoring weights path segments heavily (e.g., a query for "sdk" prioritizes docs under docs/sdk/), which significantly improves result quality for this use case.
After the initial index build (cached for 24h), searches are purely in-memory with no network round-trip, so there's no added latency per chat message.
That said, the Searcher interface makes it easy to swap implementations later if we find a better approach.
```go
// Path segment match (highest weight — structural relevance)
// Use HasSuffix for reverse direction to handle plurals (e.g. "sdks" has suffix "sdk")
for _, seg := range doc.pathSegments {
	if seg == token {
		score += 10.0
	} else if strings.Contains(seg, token) || strings.HasSuffix(token, seg) {
		score += 5.0
	}
}
```
Thank you!
I fixed it!
fix: use HasPrefix instead of HasSuffix for plural token matching
Move system prompt and keyword extraction prompt from inline Go string constants to separate .txt files using go:embed, following the existing pattern used for SQL and Stan files in the codebase.
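A sketch of the pattern (package and file names are assumptions):

```go
package aichat

import _ "embed"

// The go:embed directive must immediately precede the variable it fills;
// the files are compiled into the binary at build time.

//go:embed prompts/system_prompt.txt
var systemPrompt string

//go:embed prompts/keyword_extraction_prompt.txt
var keywordExtractionPrompt string
```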
Remove pageTypeToString in favor of proto-generated String() method. Replace individual httpPageType constants with a single map for HTTP-to-proto page type conversion.
`HasSuffix("sdks", "sdk")` is false; `HasPrefix("sdks", "sdk")` is true.
To resolve #2150
## Summary
Add an interactive AI chat assistant to the Bucketeer dashboard. Users can ask natural-language questions about feature flags, A/B testing, progressive rollouts, and other Bucketeer capabilities. Responses are streamed in real time and grounded in Bucketeer's official documentation through Retrieval-Augmented Generation (RAG).
## Architecture Overview
### Request Flow
```mermaid
sequenceDiagram
    participant User as User
    participant UI as React UI<br/>(useSSEChat)
    participant SSE as SSE Handler<br/>(chat_http_service)
    participant Auth as Auth & Rate Limit
    participant Stream as streamChat()
    participant LLM as LLM Client<br/>(OpenAI)
    participant RAG as RAG Searcher<br/>(GitHub API)
    participant Feature as Feature Service
    User->>UI: Send message
    UI->>SSE: POST /v1/aichat/chat (Bearer token, SSE)
    SSE->>Auth: Token validation + Role check + Rate limit
    Auth-->>SSE: OK
    SSE->>Stream: toProtoRequest() → streamChat()
    par Concurrent processing
        Stream->>LLM: extractSearchQuery()<br/>(Multilingual → English keyword extraction)
        LLM-->>Stream: English keywords
        Stream->>RAG: Search(keywords, topK=3)
        RAG-->>Stream: DocChunks[]
    and
        Stream->>Feature: GetFeature(featureId)
        Feature-->>Stream: Flag metadata (sanitized)
    end
    Stream->>Stream: buildSystemPrompt()<br/>(base + page context + RAG docs + feature data)
    Stream->>LLM: StreamChat(system + messages)
    loop SSE Streaming
        LLM-->>Stream: chunk
        Stream-->>SSE: chunk
        SSE-->>UI: data: {"content":"...","done":false}
        UI-->>User: requestAnimationFrame batch render
    end
    SSE-->>UI: data: [DONE]
```

### Component Architecture
```mermaid
graph TB
    subgraph Frontend ["Frontend (React + TypeScript)"]
        ChatWidget["Chat Widget<br/>(index.tsx)"]
        PopoverContainer["Popover Container<br/>(chat-popover-container.tsx)"]
        SSEHook["useSSEChat Hook<br/>(use-sse-chat.ts)"]
        Streamer["chatStreamer<br/>(native fetch + ReadableStream)"]
        SugFetcher["suggestionsFetcher<br/>(axios)"]
    end
    subgraph Backend ["Backend (Go)"]
        subgraph API ["API Layer"]
            HTTPSvc["chatHTTPService<br/>(SSE Handler)"]
            GRPCSvc["AIChatService<br/>(gRPC)"]
            ChatStream["streamChat()<br/>(shared core logic)"]
            Prompt["buildSystemPrompt()"]
            FeatureCtx["buildFeatureContext()<br/>(privacy-filtered)"]
        end
        subgraph LLMLayer ["LLM Layer"]
            LLMClient["Client Interface"]
            OpenAI["OpenAI Client<br/>(go-openai)"]
        end
        subgraph RAGLayer ["RAG Layer"]
            Searcher["Searcher Interface"]
            GitHubSearch["GitHubSearcher<br/>(Trees + Search API)"]
            MDXParser["MDX Parser"]
            TFIDFScore["TF-IDF Scoring"]
        end
        subgraph Security ["Security"]
            RoleCheck["role.CheckEnvironmentRole()"]
            RateLimiter["Token Bucket<br/>(per-user, 20req/min)"]
            Sanitizer["Input Sanitizer<br/>(HTML escape, control char strip)"]
        end
    end
    subgraph External ["External Services"]
        OpenAIAPI["OpenAI API<br/>(or compatible)"]
        GitHubAPI["GitHub API<br/>(bucketeer-docs)"]
        FeatureSvc["Feature Service<br/>(gRPC)"]
    end
    ChatWidget --> PopoverContainer
    PopoverContainer --> SSEHook
    SSEHook --> Streamer
    PopoverContainer --> SugFetcher
    Streamer -->|"POST /v1/aichat/chat"| HTTPSvc
    SugFetcher -->|"GET /v1/aichat/suggestions"| GRPCSvc
    HTTPSvc --> RoleCheck
    HTTPSvc --> RateLimiter
    HTTPSvc --> ChatStream
    GRPCSvc --> RoleCheck
    GRPCSvc --> ChatStream
    ChatStream --> Sanitizer
    ChatStream --> Prompt
    Prompt --> FeatureCtx
    ChatStream --> LLMClient
    ChatStream --> Searcher
    LLMClient --> OpenAI
    OpenAI --> OpenAIAPI
    Searcher --> GitHubSearch
    GitHubSearch --> MDXParser
    GitHubSearch --> TFIDFScore
    GitHubSearch --> GitHubAPI
    FeatureCtx --> FeatureSvc
```

## Design Decisions
### SSE over gRPC streaming
gRPC-Gateway does not translate server-side streaming RPCs into SSE — it buffers the entire response. Since chat requires token-by-token streaming to the browser, we implement a dedicated HTTP handler (`chatHTTPService`) that writes SSE frames directly. The gRPC `Chat` RPC still exists in proto for internal and API consumers, and both paths share the same `streamChat` core logic to avoid divergence.
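For reference, the SSE write path is conceptually just the following (the handler wiring and the `chunks` channel are illustrative, not the actual chatHTTPService code; the frame format matches the `data:` frames in the sequence diagram above):

```go
w.Header().Set("Content-Type", "text/event-stream")
w.Header().Set("Cache-Control", "no-cache")
flusher, ok := w.(http.Flusher)
if !ok {
	http.Error(w, "streaming unsupported", http.StatusInternalServerError)
	return
}
for chunk := range chunks { // chunk is a pre-marshaled JSON payload
	fmt.Fprintf(w, "data: %s\n\n", chunk) // one SSE frame per chunk
	flusher.Flush()                       // push immediately, no buffering
}
fmt.Fprint(w, "data: [DONE]\n\n")
flusher.Flush()
```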
### RAG without a vector database

We chose a lightweight RAG approach using GitHub's public APIs instead of deploying a vector database: the documentation tree is fetched from the public `bucketeer-io/bucketeer-docs` repository via the Trees API (cached 24h in-memory), MDX markup is stripped, and documents are scored locally with keyword matching.

This keeps the infrastructure footprint zero — no embedding service, no vector store, no index rebuild pipeline. The trade-off is lower recall on semantic queries, but for a documentation assistant answering "how do I..." questions, keyword matching performs well enough. We can upgrade to embeddings later without changing the `Searcher` interface.

### LLM-based keyword extraction for cross-language search
Japanese user queries need to be translated into English keywords to search English documentation. Rather than maintaining a hand-curated katakana→English dictionary (which was the initial approach and quickly became incomplete), we use a cheap LLM call (`temperature=0`, 5-second timeout) to extract English search terms. On failure, the system falls back to the raw user input — this graceful degradation means a broken keyword extraction never blocks the chat flow.
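Sketched under assumed names (`Client` and `Complete` are stand-ins for the real LLM client interface):

```go
// Client is a stand-in for the project's LLM client interface.
type Client interface {
	Complete(ctx context.Context, system, user string) (string, error)
}

// extractSearchQuery asks the LLM (temperature=0 on the client side) for
// English keywords; any failure falls back to the raw user input so a
// broken extraction never blocks the chat flow.
// (imports: context, strings, time)
func extractSearchQuery(ctx context.Context, llm Client, input string) string {
	ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
	defer cancel()

	keywords, err := llm.Complete(ctx, keywordExtractionPrompt, input)
	if err != nil || strings.TrimSpace(keywords) == "" {
		return input // graceful degradation
	}
	return keywords
}
```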
### Feature context: privacy-first design

When a user is on a flag detail or targeting page, we fetch the flag's metadata from the Feature Service and inject it into the system prompt. However, we deliberately exclude sensitive data: only structural information is sent — flag name, description, variation names, tags, rule structure, and enabled/disabled state. This lets the LLM give contextual answers ("your flag has 3 variations...") without leaking business data into the LLM provider.
### Prompt injection mitigation
Feature flag data and retrieved documents are user-influenced content injected into the system prompt. To prevent prompt injection:

- Feature data is wrapped in `<feature_data>` XML delimiter tags
- User-controlled fields are sanitized: control characters stripped, values quoted with `%q` (see the sketch below)
- The Feature Flag Details section carries an explicit untrusted-data warning
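A sketch of the field-level sanitization (function name is illustrative; it mirrors the `%q` quoting and control-character removal described in the commits above):

```go
// sanitizeField strips control characters and %q-quotes the value so
// flag names/descriptions cannot smuggle instructions into the prompt.
// (imports: fmt, strings, unicode)
func sanitizeField(s string) string {
	clean := strings.Map(func(r rune) rune {
		if unicode.IsControl(r) {
			return -1 // drop control characters (incl. newlines)
		}
		return r
	}, s)
	return fmt.Sprintf("%q", clean)
}
```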
### Unified authorization across HTTP and gRPC

Both the SSE HTTP handler and the gRPC service use `role.CheckEnvironmentRole` with the same Viewer-minimum requirement. Earlier iterations had the HTTP path implementing its own role check, which risked diverging from the gRPC path. Unifying on the shared utility ensures a single source of truth for authorization logic.

### Rate limiter lifecycle
The rate limiter uses a token bucket per user email. Rather than requiring callers to remember to call `Cleanup()`, the limiter spawns an internal goroutine (10-minute tick) that evicts idle entries. The goroutine's lifecycle is tied to a `context.Context` passed at construction, so it automatically stops when the server shuts down — no leaked goroutines, no manual cleanup.
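A sketch of the context-tied cleanup goroutine (the `Limiter` fields, idle criterion, and thresholds are illustrative, not the actual ratelimit package):

```go
// (imports: context, sync, time)
type Limiter struct {
	mu      sync.Mutex
	buckets map[string]time.Time // email → last access (token state elided)
}

// NewLimiter ties the cleanup goroutine to ctx: when the server's root
// context is canceled at shutdown, the goroutine exits on its own.
func NewLimiter(ctx context.Context) *Limiter {
	l := &Limiter{buckets: make(map[string]time.Time)}
	go func() {
		ticker := time.NewTicker(10 * time.Minute)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return // server shutdown stops the goroutine
			case <-ticker.C:
				l.evictIdle(30 * time.Minute)
			}
		}
	}()
	return l
}

func (l *Limiter) evictIdle(maxIdle time.Duration) {
	l.mu.Lock()
	defer l.mu.Unlock()
	for email, last := range l.buckets {
		if time.Since(last) > maxIdle {
			delete(l.buckets, email)
		}
	}
}
```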
### Frontend streaming: requestAnimationFrame batching

SSE chunks arrive faster than React can re-render. Instead of calling `setMessages` on every chunk (causing layout thrashing), the `useSSEChat` hook accumulates chunks in a string buffer and flushes them on the next animation frame. This keeps the UI smooth at 60fps regardless of chunk frequency.

### Native fetch instead of axios for SSE
Axios does not support `ReadableStream` — it buffers the entire response body. Since SSE requires incremental reading, the chat streamer uses native `fetch` with `response.body.getReader()`. The suggestions endpoint (non-streaming) still uses axios via the existing `axiosClient` to stay consistent with the rest of the dashboard.

## Configuration
| Variable | Default | Description |
| --- | --- | --- |
| `BUCKETEER_WEB_OPENAI_API_KEY` | (empty) | Server-side gate; when empty, the AI Chat service, SSE handler, and routes are not registered |
| `BUCKETEER_WEB_OPENAI_BASE_URL` | (empty) | Endpoint of the OpenAI-compatible API |
| `BUCKETEER_WEB_AICHAT_MODEL` | `gpt-4o-mini` (default removed later in this PR; must be set explicitly) | Model name sent to the OpenAI-compatible API |
| `BUCKETEER_WEB_AICHAT_GITHUB_TOKEN` | (empty) | Optional token raising GitHub API rate limits from 60 to 5,000 requests/hour |
| `AI_CHAT_ENABLED` | `false` | Client-side toggle controlling whether the ChatWidget renders |

demo.mov