Skip to content

Latest commit

 

History

History
416 lines (368 loc) · 44.3 KB

File metadata and controls

416 lines (368 loc) · 44.3 KB

Repowise Computed Glossary

This glossary describes the data Repowise computes while indexing, analyzing, generating, serving, and exporting a repository. It is based on the code paths in packages/core, packages/server, and packages/cli, not only on README files.

Use this as the vocabulary map for wiki pages, graph records, risk signals, workspace overlays, MCP responses, and CLI output.

Quick Map

Area Main code paths What gets computed
Traversal and parsing packages/core/src/repowise/core/ingestion/traverser.py, packages/core/src/repowise/core/ingestion/parser.py, packages/core/src/repowise/core/ingestion/models.py Files, languages, entry points, symbols, imports, exports, calls, inheritance, parse errors, content hashes
Graph construction packages/core/src/repowise/core/ingestion/graph.py, call_resolver.py, heritage_resolver.py, framework_edges.py, dynamic_hints/ File and symbol nodes, import/call/heritage/framework/dynamic/co-change edges, centrality, SCCs, communities, execution flows
Git intelligence packages/core/src/repowise/core/ingestion/git_indexer.py Churn, ownership, hotspots, bus factor, co-change partners, significant commits, temporal scores, rename and merge signals
Analysis packages/core/src/repowise/core/analysis/ Dead-code findings, decision records, decision staleness, security findings, PR blast radius, execution flows, communities
Generation packages/core/src/repowise/core/generation/ Wiki page contexts, page types, source hashes, summaries, freshness, confidence decay, RAG context, job checkpoints, reports, costs
Workspace intelligence packages/core/src/repowise/core/workspace/ Workspace repo scan, cross-repo co-changes, package dependencies, API contracts, contract links, workspace CLAUDE.md data
Persistence and search packages/core/src/repowise/core/persistence/, Alembic migrations ORM rows, FTS rows, vector records, answer cache, cost rows, graph rows
API, MCP, CLI packages/server/src/repowise/server/, packages/cli/src/repowise/cli/ Dashboard schemas, MCP tool payloads, status tables, doctor checks, exports, costs, augment hook context

Traversal And Repository Structure

Term Definition Computed by Example
Includable source file A file that survives ignore rules, blocked patterns, size limit, binary detection, generated-file detection, and language detection. FileTraverser._build_file_info() packages/core/src/repowise/core/ingestion/parser.py
FileInfo Per-file metadata used by the parser and graph builder. FileTraverser.traverse() {path: "src/app.py", language: "python", is_test: false, is_entry_point: true}
Language tag Canonical language value from file extension, special filename, or shebang. ingestion/models.py, traverser.py, languages/registry.py python, typescript, go, terraform, openapi, unknown
Test file flag Whether a file looks like a test/spec/fixture file. FileTraverser._build_file_info() and community/test-gap helpers tests/test_auth.py -> is_test=true
Config file flag Whether a file is classified as configuration. FileTraverser._build_file_info() pyproject.toml -> is_config=true
API contract flag Whether a file is an API contract format. FileTraverser._build_file_info() openapi.yaml -> is_api_contract=true
Entry point flag Whether a filename or language-specific entry pattern marks a file as a starting point. FileTraverser._build_file_info() main.py, server.ts, Dockerfile depending on rules
Traversal stats Counts of included files and skip reasons. TraversalStats in traverser.py {included: 240, skipped_binary: 3, skipped_generated: 12}
Package info A package/workspace detected from manifests near the repo root. FileTraverser._detect_monorepo() {name: "core", path: "packages/core", manifest_file: "pyproject.toml"}
Repo structure High-level structure summary used by overview generation. FileTraverser.get_repo_structure() {is_monorepo: true, total_files: 820, entry_points: ["packages/cli/src/.../main.py"]}
Language distribution Fraction of included files by language. get_repo_structure() {"python": 0.72, "typescript": 0.18, "markdown": 0.10}
Estimated LOC Fast line-count estimate from file sizes, not exact source line counting. get_repo_structure() total_loc = sum(size_bytes // 40)
Content hash SHA-256 of raw file bytes. compute_content_hash() in ingestion/models.py 3f786850e387550fdab836ed7e6dc881de23001b...

Parsing, Symbols, Imports, Calls

Term Definition Computed by Example
ParsedFile Full parse result for one file: file metadata, symbols, imports, exports, calls, heritage, docstring, parse errors, content hash. ASTParser.parse_file() ParsedFile(symbols=[...], imports=[...], calls=[...])
Symbol A function, class, method, interface, enum, constant, type alias, module, macro, variable, etc. ASTParser._extract_symbols() src/app.py::create_app
Symbol ID Stable ID derived from path and name, including parent class for methods. ASTParser._extract_symbols() src/models.py::User::save
Qualified name Dot-form symbol name derived from path and parent. _build_qualified_name() src.models.User.save
Symbol kind Canonical symbol type. LanguageConfig.symbol_node_types plus refiners function, class, method, interface, struct, trait
Signature Compact declaration text. build_signature() via parser extractors def create_app(config: Config) -> FastAPI
Symbol docstring Human text attached to a symbol, when extractable. extract_symbol_docstring() "Create and configure the API app."
Module docstring File-level docstring. extract_module_docstring() "Command-line entry points."
Visibility Public/private/protected/internal classification. Language-specific visibility helpers _helper -> private, UserService -> public
Async flag Whether a symbol is async. _is_async_node() async def fetch() -> is_async=true
Complexity estimate Symbol complexity field, persisted to symbols. Parser/model pipeline; defaults to 1 unless language extraction enriches it complexity_estimate: 3
Decorators Decorator/modifier strings captured with a symbol. ASTParser._extract_symbols() ["@router.get('/users')"]
Import Raw import statement plus normalized module path and imported names. ASTParser._extract_imports() {raw_statement: "from .db import Session", module_path: ".db", imported_names: ["Session"]}
Named binding Alias-aware import binding. extract_import_bindings() {local_name: "np", exported_name: null, is_module_alias: true}
Resolved import Import whose module path was matched to a repo file. GraphBuilder.build() through resolve_import() from .models import User -> src/models.py
Export list Public top-level symbol names exported by a file. ASTParser._derive_exports() ["create_app", "Settings"]
Call site Raw function or method call extracted from the AST. ASTParser._extract_calls() {target_name: "save", receiver_name: "user", line: 42, argument_count: 1}
Enclosing caller symbol The symbol that contains a call site. _find_enclosing_symbol() src/app.py::main
Heritage relation Raw inheritance or implementation relationship. extract_heritage() OrderController extends BaseController
Parse error Non-fatal syntax/tree-sitter error description. _collect_error_nodes() Parse error at line 17

Graph Entities And Edges

Term Definition Computed by Example
Dependency graph Directed NetworkX graph containing file nodes, symbol nodes, and edge metadata. GraphBuilder nx.DiGraph with nodes src/app.py, src/app.py::main
File node Graph node for a source file. GraphBuilder.add_file() {node_type: "file", language: "python", symbol_count: 8}
Symbol node Graph node for an extracted symbol. GraphBuilder.add_file() {node_type: "symbol", kind: "function", name: "main"}
External node Node for third-party or unresolvable dependencies. Import resolution paths external:react
Synthetic module symbol Symbol node for top-level calls in a file. GraphBuilder.add_file() src/app.py::__module__
defines edge File-to-symbol containment. GraphBuilder.add_file() src/app.py -> src/app.py::main
imports edge File-to-file import relationship. GraphBuilder.build() src/app.py -> src/settings.py
imported_names edge payload Names imported along an import edge. GraphBuilder.build() ["Settings", "load_config"]
has_method edge Class-to-method containment. GraphBuilder.add_file() src/models.py::User -> src/models.py::User::save
calls edge Symbol-to-symbol call relationship. CallResolver, then GraphBuilder._resolve_calls() src/app.py::main -> src/db.py::connect
Call confidence Confidence that a call edge points to the right callee. CallResolver 0.95 same-file, 0.90 import binding, 0.50 global unique
extends edge Class/struct inheritance edge. HeritageResolver UserView -> BaseView
implements edge Interface/trait implementation edge. HeritageResolver UserRepository -> Repository
Heritage confidence Confidence that inheritance/implementation resolved correctly. HeritageResolver 0.95 same-file, 0.90 imported, 0.50 global unique
framework edge Synthetic edge from framework conventions. framework_edges.py urls.py -> views.py, app.py -> routers/users.py
Dynamic edge Edge inferred from runtime/dynamic patterns. dynamic_hints/* and GraphBuilder.add_dynamic_edges() {edge_type: "dynamic_imports", hint_source: "django", weight: 1.0}
co_changes edge File-to-file historical coupling edge. GraphBuilder.add_co_change_edges() from git metadata src/a.py -> src/b.py with weight: 4.2
Stem map Import-stem to candidate file path lookup used for import resolution. GraphBuilder._build_stem_map() {"models": ["src/models.py", "tests/models.py"]}
File subgraph File-only graph used for PageRank and betweenness. GraphBuilder.file_subgraph() All file/external nodes, excluding co_changes edges
PageRank File centrality in the import graph. GraphBuilder.pagerank() 0.01842
Betweenness How often a file sits on shortest paths. GraphBuilder.betweenness_centrality() 0.0067
SCC Strongly connected component, used to detect dependency cycles. GraphBuilder.strongly_connected_components() {"src/a.py", "src/b.py"}
SCC page group Non-singleton SCC that gets a cycle page. PageGenerator.generate_all() scc-3
Graph JSON Node-link serialization of the graph. GraphBuilder.to_json() {"directed": true, "nodes": [...], "links": [...]}

Communities And Execution Flows

Term Definition Computed by Example
File community Cluster of related production files, with tests assigned to their most-related production community. detect_file_communities() community_id: 2
Symbol community Cluster of symbol nodes based on call and heritage edges. detect_symbol_communities() symbol_community_id: 5
Community algorithm Partition algorithm used. communities._partition() leiden, louvain, none, failed
Oversized community split Second partition pass for communities larger than a graph fraction. _split_oversized() A 300-file cluster split into smaller clusters
Community label Human label derived from non-generic path segments or filename keywords. _heuristic_label() api/routes, auth, payments
Community cohesion Ratio of actual intra-community edges to possible edges. _cohesion_score() 0.2143
Dominant language Most common language among community members. _dominant_language() python
Neighboring community Adjacent community from graph edges, surfaced by MCP/API. tool_community.py, graph routers {community_id: 4, edge_count: 9}
Entry point score 0 to 1 score for a function/method as an execution start. _score_entry_point() 0.735 for main()
Entry point score signals Weighted fan-out, low in-degree, visibility, name pattern, and file entry flag. _score_entry_point() public main() with many calls scores high
Execution flow BFS trace following high-confidence call edges from an entry point. trace_execution_flows() main -> load_config -> connect_db
Cross-community flow Execution flow that visits more than one community. _bfs_trace() communities_visited: [0, 3]
Flow depth Number of call hops in a traced flow. _bfs_trace() depth: 4
Flow deduplication Keeps the longest flow per shared first-three-node prefix. _deduplicate_flows() Two main -> route -> handler traces collapse to one

Git Intelligence

Term Definition Computed by Example
Git metadata row Per-file history, ownership, churn, and coupling record. GitIndexer.index_repo() and _index_file() One git_metadata row for src/app.py
Commit counts Total, 90-day, and 30-day commit volumes. _index_file() {commit_count_total: 87, commit_count_90d: 12, commit_count_30d: 3}
Commit count capped Whether the history reached the configured commit limit. _index_file() true when len(commits) >= 500
First/last commit timestamps Oldest and newest commit timestamps for a file. _index_file() first_commit_at: 2024-05-03T10:00:00Z
File age days Days since first commit. _index_file() age_days: 455
Primary owner Dominant owner by blame when available, otherwise by commit count. _get_blame_ownership() and _index_file() {name: "Asha", email: "asha@example.com", pct: 0.64}
Top authors Top five authors by commit count. _index_file() [{name: "Asha", commit_count: 20}]
Recent owner Dominant committer in the last 90 days. _index_file() recent_owner_name: "Sam"
Contributor count Number of distinct authors. _index_file() contributor_count: 6
Bus factor Number of contributors needed to account for 80 percent of commits. _index_file() bus_factor: 2
Significant commits Filtered, non-noise commit messages useful for decisions and risk. _is_significant_commit() [{sha: "a1b2c3d4", message: "migrate auth to JWT"}]
PR number PR/MR number extracted from significant commit messages. _PR_NUMBER_RE in git_indexer.py pr_number: 128
Commit categories Message classification counts. _COMMIT_CATEGORIES in git_indexer.py {"feature": 4, "fix": 11, "refactor": 2}
Lines added/deleted 90d Recent churn by numstat. _index_file() {lines_added_90d: 340, lines_deleted_90d: 87}
Average commit size (lines_added_90d + lines_deleted_90d) / commit_count_90d. _index_file() 35.6
Merge commit count 90d Number of merge commits touching the file recently. _index_file() merge_commit_count_90d: 2
Original path Earliest path found through rename-follow history. _detect_original_path() legacy/auth/session.py
Temporal hotspot score Exponentially decayed churn score with 180-day half-life. _index_file() 2.43
Churn percentile Rank percentile among indexed files by temporal hotspot score, with 90-day commits as tiebreak. _compute_percentiles() 0.88
Hotspot flag Top churn file: percentile >= 0.75 and has recent commits. _compute_percentiles() is_hotspot: true
Stable file flag File with more than 10 total commits and no recent 90-day commits. _index_file() is_stable: true
Co-change partner File historically changed in the same commits, with temporal decay. _compute_co_changes() {file_path: "src/schema.py", co_change_count: 3.72, last_co_change: "2026-04-14"}
Git index summary Repo-level indexing result. GitIndexSummary {files_indexed: 420, hotspots: 38, stable_files: 71, duration_seconds: 12.4}

Generated Wiki Pages

Term Definition Computed by Example
Page type Kind of generated documentation page. PageType in generation/models.py file_page, module_page, repo_overview
Generation level Ordered generation tier for page dependencies. GENERATION_LEVELS api_contract: 0, file_page: 2, repo_overview: 6
Generated page Markdown wiki page plus metadata and token counts. GeneratedPage and PageGenerator._build_generated_page() {page_id: "file_page:src/app.py", title: "File: src/app.py"}
Page ID Deterministic natural key. compute_page_id() symbol_spotlight:src/app.py::create_app
Source hash SHA-256 of rendered prompt/source context for freshness comparisons. compute_source_hash() 64-character hex
Page summary Deterministic first prose paragraph or overview excerpt. PageGenerator._extract_summary() "This file wires the CLI command group and registers subcommands."
Freshness status Whether a page still matches current source and age thresholds. compute_freshness() fresh, stale, expired
Confidence decay Linear decay from 1.0 to 0.0 over expiry days. decay_confidence() 0.77 after part of the expiry window
Git-adjusted confidence decay Multiplier adjusted by hotspot/stable state and commit message intent. compute_confidence_decay_with_git() Direct refactor on hotspot decays faster
Prompt cache key SHA-256 of model, language, page type, and prompt. PageGenerator._compute_cache_key() 9e107d9d372bb6826bd81d3542a419d6...
Cached tokens Tokens served from provider cache. Provider response, persisted on pages and report cached_tokens: 12000
Hallucination warning LLM output mentions symbol-like backticks not found in parsed symbols. _validate_symbol_references() Unknown symbol: "run_worker"
Generation report Run summary by page type, tokens, stale pages, dead-code count, decision count, warnings, elapsed time. GenerationReport.from_pages() {pages_by_type: {"file_page": 45}, total_input_tokens: 980000}
Estimated generation cost Token estimate using USD per 1M-token rates. GenerationReport.estimated_cost_usd() and CLI cost_estimator.py $2.3400
Generation job checkpoint JSON state for resumable generation. JobSystem {status: "running", completed_pages: 12, current_level: 2}
Generation status Job lifecycle state. JobSystem and GenerationJob ORM pending, running, completed, failed, paused

Page Contexts

Term Definition Computed by Example
File page context Template data for one important source file. ContextAssembler.assemble_file_page() {file_path, symbols, imports, dependencies, pagerank_score}
Symbol spotlight context Template data for a top public symbol. assemble_symbol_spotlight() create_app with signature, source body, callers
Module page context Aggregate context for top-level directory/module. assemble_module_page() {module_path: "packages/core", total_symbols: 780}
SCC page context Context for a circular dependency cycle. assemble_scc_page() cycle_description: "Circular dependency cycle: a.py -> b.py"
Repo overview context Whole-repo summary context. assemble_repo_overview() language_distribution, top_files_by_pagerank, circular_dependency_count
Architecture diagram context Top PageRank nodes, selected edges, communities, SCC groups. assemble_architecture_diagram() Mermaid graph inputs for 50 nodes and 200 edges
API contract context Raw API contract plus endpoint/schema hints. assemble_api_contract() endpoints: ["GET /users"], schemas: ["User"]
Infra page context Raw infra file plus target names. assemble_infra_page() Dockerfile, Makefile, terraform files
Diff summary context Changed files, symbol diffs, affected pages, trigger commit/diff. assemble_diff_summary() {added_files: ["src/new.py"], affected_page_ids: [...]}
Cross-package context Monorepo boundary summary between packages. assemble_cross_package() {source_package: "cli", target_package: "core", coupling_strength: 5}
Dependency summaries Summaries of already-generated dependency pages. assemble_file_page() with page_summaries { "src/db.py": "Database access layer..." }
RAG context Snippets from vector search for related generated pages. _generate_file_page_from_ctx() ["[file_page:src/schema.py]\nDefines API schema..."]
Token estimate len(text) // 4 heuristic. ContextAssembler._estimate_tokens() 3200
Structural summary mode Large-file outline instead of raw source snippet. _build_structural_summary() [Large file - structural summary mode]
Significant file File selected for its own file_page. _is_significant_file() Entry point, top PageRank, bridge file, package __init__.py, or test with symbols
Top symbol selection Public symbols selected by their file PageRank and percentile budget. PageGenerator.generate_all() Top 10 percent of public symbols, capped by page budget
Page budget Hard cap max(50, int(num_files * max_pages_pct)). PageGenerator.generate_all() 800 files with 10 percent cap -> 80-page budget

Dead Code

Term Definition Computed by Example
Dead-code finding A graph/git finding persisted to dead_code_findings. DeadCodeAnalyzer {kind: "unused_export", file_path: "src/api.py", confidence: 0.7}
Unreachable file File with no incoming imports, not an entry point/test/config/contract/whitelisted file. _detect_unreachable_files() src/legacy_adapter.py
Unused export Public symbol in an imported file that no importer names. _detect_unused_exports() symbol_name: "OldClient"
Unused internal Private/internal symbol with no incoming calls edges. _detect_unused_internals() _parse_legacy_token
Zombie package Monorepo top-level package with no external package importers. _detect_zombie_packages() packages/old-sdk
Dead-code confidence Heuristic certainty based on age, recent commits, importers, dynamic imports, and deprecation hints. DeadCodeAnalyzer 1.0 for year-old unreachable file
Safe-to-delete flag Whether confidence passes delete threshold and dynamic patterns do not block deletion. _make_unreachable_finding() and other passes safe_to_delete: true
Dead-code evidence Human-readable reasons for the finding. DeadCodeAnalyzer ["in_degree=0 (no files import this)", "No commits in last 90 days"]
Estimated deletable lines Sum of line estimates for safe findings. DeadCodeAnalyzer.analyze() deletable_lines: 420
Confidence summary Counts of high, medium, low confidence findings. DeadCodeAnalyzer.analyze() {"high": 12, "medium": 8, "low": 0}
Finding status User triage status persisted in DB. DeadCodeFinding.status open, acknowledged, resolved, false_positive

Decisions And Governance

Term Definition Computed by Example
Decision record ADR-like row from code comments, git, docs, or CLI/manual entry. DecisionExtractor, CRUD, CLI {title: "Use Redis for sessions", status: "active"}
Inline marker decision Decision extracted from comments such as WHY:, DECISION:, TRADEOFF:, ADR:. scan_inline_markers() # DECISION: cache auth sessions in Redis
Git archaeology decision LLM-structured decision inferred from significant commit messages with decision keywords. mine_git_archaeology() migrate from REST client to generated OpenAPI client
README-mined decision Decision extracted from docs such as README, CLAUDE, ARCHITECTURE, DESIGN. mine_readme_docs() "We use SQLite by default because setup should be local-first."
Decision source Provenance of a record. DecisionRecord.source inline_marker, git_archaeology, readme_mining, cli
Decision confidence Source-specific extraction confidence. DecisionExtractor 0.95 inline LLM, 0.70 git signal, 0.60 README mining, 1.0 manual
Affected files Files linked to a decision from graph neighbors, commit files, or manual input. DecisionExtractor ["src/auth.py", "src/session.py"]
Affected modules Top-level modules inferred from affected files or text. _infer_modules() ["src", "packages"]
Decision tags Topic labels inferred from keywords or LLM output. _infer_tags() and prompts auth, database, api, security, testing
Decision status Lifecycle state. DecisionRecord.status proposed, active, deprecated, superseded
Decision staleness score 0 to 1 score indicating code has moved since a decision. DecisionExtractor.compute_staleness() and crud.recompute_decision_staleness() 0.63
Conflict boost Staleness increase when newer commit messages contain contradiction signals and overlap decision text. compute_staleness() +0.3 for "migrate away" touching the same concept
Decision health summary Counts and lists for stale, proposed, and ungoverned hotspots. get_decision_health_summary() and server/CLI routes {active: 10, stale: 2, proposed: 3}
Ungoverned hotspot Hot file without related architectural decision coverage. Decision health computation src/payments/processor.py

Security Findings

Term Definition Computed by Example
Security finding Regex or symbol-name signal persisted to security_findings. SecurityScanner.scan_file() {kind: "hardcoded_secret", severity: "high", line: 12}
High severity finding Dangerous execution, deserialization, shell, or hardcoded secret/password pattern. _PATTERNS in security_scan.py eval_call, pickle_loads, hardcoded_password
Medium severity finding SQL construction or TLS verification issue. _PATTERNS fstring_sql, concat_sql, tls_verify_false
Low severity finding Weak hash or security-sensitive symbol name. _PATTERNS and symbol scan weak_hash, security_sensitive_symbol
Security snippet Trimmed source line or symbol name for context. SecurityScanner.scan_file() password = "admin"

Risk And Blast Radius

Term Definition Computed by Example
File risk score Pagerank centrality multiplied by 1 + temporal_hotspot_score. PRBlastRadiusAnalyzer._score_file() 0.018 * (1 + 2.4) = 0.0612
Overall PR risk score 0 to 10 composite using average direct risk, max direct risk, and transitive breadth. _compute_overall_risk() 7.25
Transitive affected file Importer reached by reverse BFS from changed files. _transitive_affected() {path: "src/api.py", depth: 2}
Co-change warning Historical co-change partner missing from a PR/change set. _cochange_warnings() {changed: "src/a.py", missing_partner: "src/b.py", score: 4.2}
Recommended reviewer Owner aggregate over changed and affected files. _recommend_reviewers() {email: "asha@example.com", files: 7, ownership_pct: 0.63}
Test gap File lacking a matching test path by basename conventions. _find_test_gaps() and MCP _check_test_gap() src/auth.py -> true
Risk trend Velocity from 30-day vs prior 60-day commit rates. tool_risk._compute_trend() increasing, stable, decreasing
Risk type Human bucket for the kind of risk. tool_risk._classify_risk_type() bug-prone, churn-heavy, bus-factor-risk, high-coupling, stable
Change pattern Human label from dominant commit category. tool_risk._derive_change_pattern() feature-active, fix-heavy, dependency-churn, mixed-activity
Impact surface Top critical reverse dependencies within two hops. tool_risk._compute_impact_surface() [{file_path: "src/api.py", pagerank: 0.05}]
Risk summary One-line synthesized risk sentence for MCP. tool_risk._assess_one_target() src/auth.py - hotspot score 88% (increasing), 6 dependents...
Top hotspots Highest churn/hotspot files returned for context. get_risk() [{file_path: "src/db.py", hotspot_score: 0.94}]

Search, Answer Cache, And Retrieval

Term Definition Computed by Example
Search result Unified full-text or vector result. SearchResult in persistence/search.py {page_id, title, page_type, target_path, score, snippet, search_type}
FTS5 query Stop-word-stripped OR prefix query for SQLite. _build_fts5_query() "auth"* OR "session"*
FTS score Positive score from negated SQLite rank or Postgres ts_rank. FullTextSearch 0.734
Vector score Cosine similarity between query embedding and page embedding. InMemoryVectorStore.search() and other vector stores 0.812
Snippet First 200 chars of indexed content. _snippet() or vector metadata "This module handles..."
Answer cache row Cached MCP answer payload. tool_answer.py and AnswerCache ORM {question_hash, payload_json, provider_name, model_name}
Question hash SHA-256 of normalized question text. tool_answer._hash_question() Same hash for "How auth works?" with extra whitespace/case
Answer payload Cached get_answer result. get_answer() {answer, citations, confidence, fallback_targets, retrieval}
Retrieval hit Search hit hydrated with page metadata and summary. tool_answer.py retrieval pipeline {target_path: "src/auth.py", score: 3.2, summary: "..."}
Retrieval dominance Gating logic comparing top and second search scores. tool_answer.py Top score high enough to answer from dominant hit
Federated RRF score Reciprocal rank fusion score for workspace search across repos. tool_search.py rrf_score: 0.0164
Confidence score Normalized workspace search confidence. tool_search.py confidence_score: 0.87

Persistence Tables And Stored Entities

Table or store Computed content Example
repositories Repo identity plus current indexed head_commit and settings JSON. {name: "repowise", default_branch: "main"}
generation_jobs Long-running generation progress. {status: "running", total_pages: 120, completed_pages: 31}
wiki_pages Current generated markdown pages and freshness metadata. file_page:src/app.py
wiki_page_versions Archived historical snapshots on regeneration. version: 3
graph_nodes File and symbol nodes with graph metrics and community metadata. {node_id: "src/app.py", pagerank: 0.02}
graph_edges Typed relationships with imported names and confidence. {source: "src/app.py", target: "src/db.py", edge_type: "imports"}
wiki_symbols Parsed symbols projected into DB. {symbol_id: "src/app.py::main", kind: "function"}
git_metadata Per-file history, churn, ownership, hotspots, co-changes. {file_path: "src/app.py", is_hotspot: true}
decision_records Extracted/manual architectural decisions and staleness. {title: "Use Postgres for production", status: "active"}
dead_code_findings Dead-code analyzer findings and triage status. {kind: "unreachable_file", safe_to_delete: true}
security_findings Static security signals. {kind: "eval_call", severity: "high"}
llm_costs Per-call token and USD cost rows. {operation: "doc_generation", input_tokens: 2500, cost_usd: 0.012}
answer_cache Cached MCP answer payloads keyed by normalized question. {question: "How does auth work?", question_hash: "..."}
conversations and chat_messages Chat state and structured message JSON. {role: "assistant", content_json: {...}}
webhook_events Received external events and processing status. {provider: "github", event_type: "push", processed: false}
SQLite page_fts FTS5 mirror of page title/content. Used by full-text search
Postgres wiki_pages.embedding pgvector embedding column, conditionally added by migration. 1536-dim vector
LanceDB wiki_pages table Local vector index with page metadata. {page_id, vector, title, page_type, target_path}

LLM Cost And Provider Usage

Term Definition Computed by Example
Pricing table USD per million input/output tokens by model. generation/cost_tracker.py claude-sonnet-4-6: {input: 3.0, output: 15.0}
Fallback pricing Default pricing for unknown models. _get_pricing() {input: 3.0, output: 15.0}
Call cost (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000. CostTracker.record() 1000 in, 500 out on Sonnet -> $0.0105
Session cost Cumulative USD for one tracker instance. CostTracker.session_cost 2.37
Session tokens Cumulative input plus output tokens. CostTracker.session_tokens 845000
Cost totals DB aggregate grouped by operation, model, or day. CostTracker.totals() {group: "file_page", calls: 42, cost_usd: 1.12}
CLI cost estimate Pre-generation token/cost plan. packages/cli/src/repowise/cli/cost_estimator.py {estimated_pages: 82, estimated_cost_usd: 4.60}

Workspace Intelligence

Term Definition Computed by Example
Discovered repo Candidate git repo found under a workspace root. workspace/scanner.py {alias: "api", path: "services/api"}
Workspace config Parsed .repowise-workspace.yaml. workspace/config.py {repos: [{alias: "web", path: "apps/web"}]}
Repo update result Per-repo update outcome for workspace update/watch. workspace/update.py {alias: "core", updated: true, file_count: 420, symbol_count: 2100}
Cross-repo co-change File pair in different repos changed by same author within a time window, weighted by recency. detect_cross_repo_co_changes() {source_repo: "api", source_file: "routes/users.py", target_repo: "web", target_file: "users.tsx", strength: 1.34}
Cross-repo package dependency Manifest path dependency from one repo to another. detect_package_dependencies() {source_repo: "web", target_repo: "shared", kind: "npm_workspace"}
Cross-repo overlay JSON payload saved under workspace data dir. run_cross_repo_analysis() {co_changes: [...], package_deps: [...], repo_summaries: {...}}
Cross-repo edge count Per-repo count of co-change and package-dependency edges. _build_repo_summaries() {cross_repo_edge_count: 12}
Workspace CLAUDE.md data Per-repo summaries plus cross-repo overlays and contract links. generation/editor_files/data.py, claude_md.py {repos: [...], co_changes: [...], contract_links: [...]}

API Contracts

Term Definition Computed by Example
Contract Provider or consumer API endpoint/topic/service extracted from source. workspace/contracts.py and extractors {contract_id: "http::GET::/api/users/{param}", role: "provider"}
Contract type API surface kind. Contract extractors http, grpc, topic
Contract role Whether source provides or consumes the contract. Extractors provider, consumer
Contract confidence Extraction strategy confidence. Extractors and contract matching 0.8
Service boundary Monorepo service path assigned to contracts. workspace/extractors/service_boundary.py services/billing
Normalized contract ID Lowercase/canonical ID used for matching. normalize_contract_id() http::GET::/Api/Users/ -> http::GET::/api/users
Contract link Matched provider-consumer pair across repos/services. match_contracts() {provider_repo: "api", consumer_repo: "web", match_type: "exact"}
Manual contract link Workspace-configured provider/consumer link. _build_manual_links() {match_type: "manual", confidence: 1.0}
Contract store JSON payload saved as contracts.json. run_contract_extraction() {contracts: [...], contract_links: [...]}

Knowledge Map

Term Definition Computed by Example
Top owner Owner ranked by number of files primarily owned. server/services/knowledge_map.py {email: "asha@example.com", files_owned: 42, percentage: 18.6}
Knowledge silo File where one owner has more than 80 percent ownership. compute_knowledge_map() {file_path: "src/auth.py", owner_pct: 0.91}
Onboarding target High-PageRank file with few or no documentation words. compute_knowledge_map() {path: "src/core.py", pagerank: 0.04, doc_words: 0}
Documentation word count Word count of the generated file page content. compute_knowledge_map() doc_words: 640

CLI-Visible Computed Outputs

Command Computed output Example
repowise status Sync state, current HEAD, indexed commit, DB page counts, graph node counts, pages by type, token totals. file_page: 52, Status: 3 new commit(s)
repowise status --workspace Per-repo file/symbol counts, indexed age, HEAD short SHA, stale/up-to-date state. api 420 files 2,100 symbols 2h ago a1b2c3d stale
repowise doctor Health checks for DB, pages, vector store, FTS, graph, stale pages, store drift, coordinator state. SQL <-> Vector Store: 3 missing
repowise search Full-text/vector/wiki or symbol hits. score 0.83, file_page, src/auth.py
repowise dead-code Dead-code table or JSON report. unused_export src/api.py OldClient 0.70
repowise decision Decision list, detail view, health summary, stale records, proposed records, ungoverned hotspots. Stale decisions: 2
repowise costs Grouped LLM cost totals. group=file_page, calls=45, cost=$1.37
repowise export Markdown/HTML/JSON export entries, optionally decisions/dead-code/hotspots. wiki_pages.json with page metadata
repowise update File diffs, adaptive cascade budget, affected page plan, regenerated/decayed page counts, dead-code/decision refresh results. Adaptive cascade budget: 30
repowise reindex Embedding/indexing progress and page counts. Indexed 430 items -> .repowise/lancedb
repowise watch Debounced changed-path batches and forwarded update output. Detected 3 changed file(s), updating...
repowise workspace Workspace repo discovery, config entries, update status, cross-repo hook output. Found 2 new repo(s)
repowise generate-claude-md Editor-file data and rendered .claude/CLAUDE.md. hotspots, key_modules, decisions in markdown
repowise augment Hook-time graph/search enrichment for AI tool calls. Related files, symbols, importers, dependencies
repowise mcp FastMCP server exposing the computed graph/wiki/risk tools below. stdio or SSE transport

MCP And API-Visible Computed Payloads

Tool or endpoint concept Definition Example
get_answer RAG answer with citations, confidence, fallback targets, retrieval metadata, and answer-cache support. {answer: "...", confidence: "medium", citations: [...]}
search_codebase Wiki search using vector/FTS and federated workspace RRF when requested. {results: [{title, relevance_score, confidence_score}]}
get_context Compact page, symbol, freshness, dependency, git, and cross-repo context for targets. {targets: {"src/app.py": {docs, graph, freshness}}}
get_overview Repo or workspace overview, module map, entry points, git health, communities, and workspace footer. {summary, modules, git_health, community_summary}
get_why Decision/governance lookup, file origin story, alignment, and decision health modes. {decisions: [...], target_context: {...}}
get_risk Per-file risk, trend, risk type, owners, co-change partners, test gaps, security signals, top hotspots, optional PR blast radius. {results: [{risk_summary, hotspot_score}], top_hotspots: [...]}
get_dead_code Tiered, grouped, and summarized dead-code findings. {summary: {total_findings: 12}, tiers: {...}}
get_dependency_path Dependency-path or bridge context between files/symbols. {path: ["src/a.py", "src/b.py"]}
get_architecture_diagram Mermaid architecture diagram text. {mermaid_syntax: "graph TD\n..."}
update_decision_records Decision create/update/list/delete payloads. {status: "ok", decision: {...}}
get_symbol Exact symbol metadata and source slice. {name: "create_app", signature: "def create_app(...)"}
get_callers_callees Caller/callee neighborhood for a symbol. {callers: [...], callees: [...]}
get_graph_metrics Centrality percentiles, community, entry-point score, and graph metrics for a node. {pagerank_percentile: 92, community_label: "api"}
get_community Community details, cohesion, members, and neighboring communities. {label: "auth", cohesion: 0.21, members: [...]}
get_execution_flows Entry-point traces through call edges. {flows: [{entry_point, trace, crosses_community}]}
annotate_file Persistent human notes on a wiki page. {status: "ok", human_notes: "Watch migration path."}
Blast radius API Direct risks, transitive affected files, co-change warnings, reviewers, test gaps, overall score. {overall_risk_score: 7.25}
Knowledge map API Top owners, knowledge silos, onboarding targets. {top_owners: [...], knowledge_silos: [...]}
Cost summary API Grouped costs and totals. {groups: [...], total_cost_usd: 3.21}
Provider API Available provider/model configuration. {providers: [...], active_provider: "gemini"}

Statuses And Enumerations

Domain Values
Page freshness fresh, stale, expired, unknown in type definitions
Job status pending, running, completed, failed, paused
Decision status proposed, active, deprecated, superseded
Decision source git_archaeology, inline_marker, readme_mining, cli
Dead-code kind unreachable_file, unused_export, unused_internal, zombie_package
Dead-code status open, acknowledged, resolved, false_positive
Security severity high, med, low
Security kind eval_call, exec_call, pickle_loads, subprocess_shell_true, os_system, hardcoded_password, hardcoded_secret, fstring_sql, concat_sql, tls_verify_false, weak_hash, security_sensitive_symbol
Edge type imports, defines, calls, has_method, has_property, extends, implements, method_overrides, method_implements, co_changes, framework, dynamic, plus dynamic subtypes such as dynamic_uses, dynamic_imports, dynamic_url_route
Node type file, symbol, external
Search type vector, fulltext
Contract type http, grpc, topic
Contract role provider, consumer
Contract link match type exact, manual
Risk trend increasing, stable, decreasing, unknown
Risk type bug-prone, churn-heavy, bus-factor-risk, high-coupling, stable, unknown
Change pattern feature-active, primarily refactored, fix-heavy, dependency-churn, mixed-activity, uncategorized
Chat role user, assistant
Coordinator health ok, warning, critical

Example End-To-End Computation

For a file src/auth/session.py, a typical Repowise index can compute:

  1. FileInfo: language="python", is_test=false, is_entry_point=false.
  2. ParsedFile: symbols such as src/auth/session.py::SessionStore, imports such as from .redis import client, calls such as client.get().
  3. Graph records: a file node, symbol nodes, defines, imports, calls, and maybe framework or dynamic_* edges.
  4. Graph metrics: pagerank=0.013, betweenness=0.004, community_id=2, community_label="auth", cohesion=0.18.
  5. Git metadata: commit_count_90d=11, primary_owner_name="Asha", temporal_hotspot_score=2.1, churn_percentile=0.88, is_hotspot=true.
  6. Analysis rows: maybe a security finding hardcoded_secret, or a decision record from # DECISION: store sessions in Redis.
  7. Generated docs: file_page:src/auth/session.py, source hash, token counts, summary, freshness, and vector/FTS entries.
  8. Risk output: hotspot_score=0.88, trend increasing, risk type churn-heavy, co-change partners, test-gap flag, and an impact surface.