A minuscule terminology service, powered by LLMs, designed to extract, validate, and enrich user-defined topic definitions sourced from Wikipedia—perfectly packaged for developers and AI enthusiasts.
AI-powered web service (built with FastAPI) that creates and manages a dictionary of financial terms. It automatically fetches definitions (primarily from Wikipedia, using user-defined, topic-focused heuristics), uses Large Language Models (LLMs via `litellm` and `instructor`) to generate explanatory follow-up questions and validate definition accuracy, and can extract financial terms from text. New entries undergo a candidate review process before being added to the official terminus, ensuring quality control. The system uses SQLAlchemy for database persistence (defaulting to SQLite) and provides Docker support for easy deployment.
Quick Start Commands
- Run Locally (using Uvicorn): (Ensure you have created a .env file with necessary configurations, like LLM API keys)
  ```bash
  # Set up environment variables
  export $(cat .env | xargs)

  # Install dependencies (using uv is recommended)
  uv sync

  # Run the FastAPI application
  uvicorn terminus.app:app --host 0.0.0.0 --port 8000 --reload
  ```
NB: Alternatively, use a .env loader (like python-dotenv) or export variables manually depending on your shell.
- Run with Docker Compose: (Ensure you have created a .env file in the project root)
  ```bash
  # Build and start the service defined in docker-compose.yml
  docker-compose up --build -d
  ```
Access the API at http://localhost:8000/docs
TL;DR
The terminus project is an asynchronous web service designed to build, manage, and serve a curated dictionary of (financial and economic) terms. It leverages Large Language Models and Wikipedia to automatically generate, refine, and validate definitions and related concepts, ensuring a degree of quality control through a candidate review process.
Core Objectives:
- Provide clear, concise, and factually validated definitions for financial terms.
- Generate contextually relevant follow-up questions to deepen user understanding.
- Identify and extract financial terms from unstructured text.
- Implement a workflow for reviewing and approving automatically generated or externally sourced term definitions before they become part of the official terminus.
- Offer a robust API for programmatic access to the terminus.
Key Features:
- Automated Definition Generation: Sources initial definitions from Wikipedia and validates them using LLMs.
- LLM-Powered Follow-up Generation: Creates insightful follow-up questions based on the definition's content using `instructor` and `litellm`.
- LLM-Based Validation: Uses LLMs to critique the financial relevance of terms and validate the factual accuracy of definitions within a financial context.
- Candidate Workflow: Implements a two-stage system (`candidate_terminus` and `terminus` tables) where new entries are held for review before being promoted to the official, validated terminus.
- Financial Term Extraction: Identifies potential financial terms within a given text block using LLM-based Named Entity Recognition (NER) followed by a critique step.
- Asynchronous API: Built with FastAPI for high-performance, non-blocking I/O operations.
- Database Persistence: Uses SQLAlchemy for ORM and database interaction (defaulting to SQLite).
- Containerized Deployment: Provides a `Dockerfile` and `docker-compose.yml` for easy setup and deployment.
The service operates primarily through API endpoints, orchestrating interactions between the database, LLM services, and the Wikipedia service.
This is the primary user-facing endpoint for retrieving a term's definition (implemented in `routers/definition.py`). The logic follows a specific hierarchy to ensure quality and efficiency (a condensed sketch follows the list below):
- Check Official terminus: The system first queries the `terminus` table (via `TerminusService`) for an existing, validated entry matching the requested term (case-insensitive). If found, this validated entry (`terminusAnswer`) is returned directly.
- Check Candidate terminus: If the term is not in the official terminus, the system checks the `candidate_terminus` table (via `CandidateTerminusService`). If a candidate entry exists, its details (`CandidateterminusAnswer`, including status like "under_review" or "rejected") might be returned (the exact logic for returning candidates vs. generating new ones might need refinement based on desired UX).
- Generate New Candidate (if necessary): If the term is found in neither table, or if regeneration is triggered:
  - Fetch Definition: The `WikipediaService` is queried asynchronously to find the most relevant, user-defined topic-focused (e.g. finance, physics) summary for the term. This service employs specific strategies:
    - Searching for `"{term} (user-defined topic)"`.
    - Standard Wikipedia search, prioritizing results containing financial keywords.
    - Handling disambiguation pages by preferring user-defined-topic-related options.
    - Falling back to a search including a context hint (`finance economics...`).
  - Generate Follow-ups: The fetched definition (or potentially a user-provided one via `terminusEntryCreate`) is passed to the `FUService` (LLM). This service uses a specific prompt (`FOLLOWUP_SYSTEM_MESSAGE`, `FOLLOWUP_USER_MESSAGE_TEMPLATE`) and the `terminusAnswer` Pydantic model (via `instructor`) to generate a list of `FollowUp` questions based on sub-terms found within the definition.
  - Definition Validation: Includes a `DefinitionValidationService`. This service is intended to be called here or before saving to candidates, using its specific LLM prompt (`VALIDATION_SYSTEM_MESSAGE`, `VALIDATION_USER_MESSAGE_TEMPLATE`) and the `DefinitionValidationResult` Pydantic model to assess the fetched/generated definition's factual accuracy and assign a confidence score.
  - Save as Candidate: The term, fetched/generated definition, generated follow-ups, and initial status ("under_review") are saved to the `candidate_terminus` table using `CandidateTerminusService`.
  - Return Candidate: The newly created candidate entry details (`CandidateterminusAnswer`) are returned to the user.
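A condensed sketch of this hierarchy is shown below. Import paths and method names (`get`, `save`, `fetch_summary`, etc.) are assumptions for illustration, not the project's actual signatures:

```python
# Hypothetical sketch of the retrieval hierarchy; import paths and method
# signatures are assumptions based on the structure described above.
from fastapi import APIRouter, Depends
from sqlalchemy.ext.asyncio import AsyncSession

from terminus.database import get_session  # assumed module path
from terminus.services import (  # assumed module path
    CandidateTerminusService,
    DefinitionValidationService,
    FUService,
    TerminusService,
    WikipediaService,
)

router = APIRouter()

@router.get("/definition/{term}")
async def get_definition(term: str, session: AsyncSession = Depends(get_session)):
    # 1. Official terminus first: validated entries are returned directly.
    if (entry := await TerminusService(session).get(term)) is not None:
        return entry
    # 2. Candidate terminus next: may return an "under_review" entry.
    if (candidate := await CandidateTerminusService(session).get(term)) is not None:
        return candidate
    # 3. Neither table has the term: fetch, enrich, validate, and save a candidate.
    definition = await WikipediaService().fetch_summary(term)
    follow_ups = await FUService().generate(term, definition)
    await DefinitionValidationService().validate(term, definition)  # could gate saving
    return await CandidateTerminusService(session).save(
        term, definition, follow_ups, status="under_review"
    )
```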
This endpoint (in `routers/terms.py`) identifies financial terms within a given block of text:
- Initial Extraction: The input text is passed to the `FinancialTermExtractionService`. An LLM call is made using a prompt focused on extracting potential financial/economic terms, structured according to the `ExtractedTerms` Pydantic model.
- Critique/Validation: Each extracted term is then individually subjected to a second LLM call within the same service (the `_critique_term` method). This step uses a different prompt (`critique_system_message`, `critique_user_message_template`) and the `TermCritique` Pydantic model. The LLM acts as a domain expert to determine whether the term is genuinely relevant to the user-defined topic.
- Return Validated Terms: Only the terms that pass the critique step (i.e., `is_relevant` is true in the `TermCritique` response) are returned to the user as a list of strings.
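A minimal sketch of this two-step extract-then-critique flow, assuming the Pydantic model shapes implied above (field names other than `is_relevant` are illustrative) and the default `gemini/gemini-2.0-flash` model:

```python
# Illustrative two-step extraction: broad NER-style extraction, then a
# per-term relevance critique. Prompts here are placeholders, not the
# project's actual prompts.py templates.
import instructor
from litellm import acompletion
from pydantic import BaseModel

class ExtractedTerms(BaseModel):
    terms: list[str]

class TermCritique(BaseModel):
    is_relevant: bool
    reasoning: str

client = instructor.from_litellm(acompletion)

async def extract_financial_terms(text: str) -> list[str]:
    # Step 1: broad extraction of candidate terms.
    extracted = await client.chat.completions.create(
        model="gemini/gemini-2.0-flash",
        response_model=ExtractedTerms,
        messages=[
            {"role": "system", "content": "Extract potential financial/economic terms."},
            {"role": "user", "content": text},
        ],
    )
    # Step 2: critique each term individually; keep only relevant ones.
    validated: list[str] = []
    for term in extracted.terms:
        critique = await client.chat.completions.create(
            model="gemini/gemini-2.0-flash",
            response_model=TermCritique,
            messages=[
                {"role": "system", "content": "You are a finance domain expert."},
                {"role": "user", "content": f"Is '{term}' genuinely a financial term?"},
            ],
        )
        if critique.is_relevant:
            validated.append(term)
    return validated
```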
These endpoints facilitate the review workflow:
- Get Candidate (`/candidate/{term}`): Retrieves the details of a specific candidate entry (`CandidateterminusAnswer`) from the `candidate_terminus` table.
- Validate Candidate (`/candidate/validate`): This is the crucial human-in-the-loop or automated approval step.
  - Input: Takes a `CandidateValidation` payload (term, approve flag, reason).
  - Logic:
    - If `approve` is `True`:
      - Retrieve the candidate entry data (`get_dict` from `CandidateTerminusService` is used here, likely to detach the object from the session before manipulating it across services).
      - Save the data (term, definition, follow-ups) to the official `terminus` table using `TerminusService`.
      - Delete the entry from the `candidate_terminus` table using `CandidateTerminusService`.
    - If `approve` is `False`:
      - Update the status of the entry in the `candidate_terminus` table to "rejected" along with the provided `reason` using `CandidateTerminusService.reject`.
  - Return: Confirmation message.
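For illustration, a hypothetical client-side review call using `httpx`; the payload fields mirror the `CandidateValidation` description above (term, approve flag, reason), though the exact field names are assumptions:

```python
# Hypothetical reviewer-side calls to the validation endpoint; JSON field
# names are assumed from the CandidateValidation description.
import httpx

def approve_candidate(term: str) -> None:
    response = httpx.post(
        "http://localhost:8000/candidate/validate",
        json={"term": term, "approve": True, "reason": "Definition verified"},
    )
    response.raise_for_status()
    print(response.json())  # confirmation message

def reject_candidate(term: str, reason: str) -> None:
    response = httpx.post(
        "http://localhost:8000/candidate/validate",
        json={"term": term, "approve": False, "reason": reason},
    )
    response.raise_for_status()
```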
- Web Framework: FastAPI (for asynchronous API development)
- Data Validation/Serialization: Pydantic (used extensively for API models, LLM response structures, and settings)
- Database ORM: SQLAlchemy (for defining models and interacting with the database)
- Database Driver (Default): `aiosqlite` (for async SQLite access)
- LLM Interaction:
  - `instructor`: for reliable structured output (Pydantic models) from LLMs.
  - `litellm`: to interact with various LLM providers (e.g., Gemini via `gemini/gemini-2.0-flash`) through a unified interface.
- Wikipedia Access: `wikipedia` library (wrapped for asynchronous execution).
- Configuration: `pydantic-settings` (for managing settings via environment variables and `.env` files).
- Dependency Management: `uv` (or `pip`) with `pyproject.toml` and `uv.lock`.
- Logging: `loguru` (configured in `app.py`).
- Containerization: Docker, Docker Compose.
- ORM: SQLAlchemy Core and ORM features are used.
- Engine/Session: `database.py` configures the asynchronous SQLAlchemy engine (`create_async_engine`) and session factory (`async_sessionmaker`). The `get_session` dependency provider ensures each API request gets a dedicated session that is closed afterward.
- Models: Defined in `models.py`:
  - `terminusEntry`: Represents validated entries in the `terminus` table (term [PK], definition, follow_ups [JSON Text]).
  - `CandidateterminusEntry`: Represents entries awaiting review in the `candidate_terminus` table (term [PK], definition, follow_ups [JSON], status [String]).
- Storage: Defaults to a persistent SQLite database (`./volumes/sqlite_data/terminus.db`) managed via Docker volumes. `DATABASE_URL` in `.env` can configure other SQLAlchemy-compatible databases.
- Schema Management: `Base.metadata.create_all(bind=engine)` in `database.py` provides a basic mechanism for table creation during development. Note: for production, a dedicated migration tool like Alembic is strongly recommended but not currently implemented.
- Serialization: Follow-up questions (`FollowUp` Pydantic models) are serialized to a JSON string for storage in the database (`_serialize_follow_ups`) and deserialized back into Pydantic objects upon retrieval (`_deserialize_follow_ups`) within the `TerminusService` and `CandidateTerminusService`.
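A minimal sketch of what the models and session wiring could look like, following the tables and columns listed above (exact declarations are assumptions):

```python
# Minimal sketch of the async SQLAlchemy setup described above; exact
# column declarations are assumptions based on the README's field lists.
from sqlalchemy import String, Text
from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class terminusEntry(Base):
    __tablename__ = "terminus"
    term: Mapped[str] = mapped_column(String, primary_key=True)
    definition: Mapped[str] = mapped_column(Text)
    follow_ups: Mapped[str] = mapped_column(Text)  # JSON-serialized FollowUp list

class CandidateterminusEntry(Base):
    __tablename__ = "candidate_terminus"
    term: Mapped[str] = mapped_column(String, primary_key=True)
    definition: Mapped[str] = mapped_column(Text)
    follow_ups: Mapped[str] = mapped_column(Text)
    status: Mapped[str] = mapped_column(String, default="under_review")

engine = create_async_engine("sqlite+aiosqlite:///./volumes/sqlite_data/terminus.db")
SessionLocal = async_sessionmaker(engine, expire_on_commit=False)

async def get_session():
    # One dedicated session per request, closed when the request completes.
    async with SessionLocal() as session:
        yield session
```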
- Framework: FastAPI.
- Structure: Endpoints are organized into routers (`routers/candidate.py`, `routers/definition.py`, `routers/terms.py`) which are included in the main `app.py`.
- Asynchronicity: Uses `async def` extensively for non-blocking request handling, essential for waiting on database, Wikipedia, and LLM I/O.
- Validation: Pydantic models defined in `schemas.py` are used for automatic request body validation and response serialization. Type hints are used throughout for clarity and static analysis.
- Dependency Injection: FastAPI's dependency injection system is used, notably for providing database sessions (`Depends(get_session)`). Services (`TerminusService`, `WikipediaService`, LLM services) are instantiated within endpoint functions, often receiving the injected session.
- Documentation: Automatic interactive API documentation is available at `/docs` (Swagger UI) and `/redoc` (ReDoc), provided by FastAPI.
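For example, the router wiring in `app.py` plausibly looks like the following sketch (module paths are assumptions based on the structure above):

```python
# Hypothetical sketch of app.py router wiring; import paths are assumptions.
from fastapi import FastAPI

from terminus.routers import candidate, definition, terms

app = FastAPI(title="terminus")
app.include_router(definition.router)
app.include_router(candidate.router)
app.include_router(terms.router)
```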
- Abstraction: `litellm` provides a common interface (`acompletion`) to different LLM APIs (defaulting to `gemini/gemini-2.0-flash`).
- Structured Output: `instructor.from_litellm(acompletion)` patches the LiteLLM client to enforce responses conforming to specified Pydantic models (the `response_model` parameter in services). This significantly improves reliability.
- Service Layer: Logic for interacting with LLMs is encapsulated in dedicated service classes (`services/llm_service.py`):
  - `BaseLLMService`: Abstract base class handling client initialization, message formatting (`build_messages`), and basic error handling during the LLM call (`generate_response`).
  - `FUService`: Generates `terminusAnswer` (specifically the `follow_ups` part) based on a term and definition.
  - `DefinitionValidationService`: Generates `DefinitionValidationResult` to assess definition quality.
  - `FinancialTermExtractionService`: Performs two-step extraction and critique using the `ExtractedTerms` and `TermCritique` models.
- Prompt Engineering: System and user message templates are stored centrally (`prompts.py`) and formatted within the respective services, clearly defining the LLM's task and context.
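A minimal sketch of the `instructor` + `litellm` pattern described above, using the `DefinitionValidationResult` model as an example (field definitions and prompt text are illustrative):

```python
# Minimal sketch: structured LLM output via an instructor-patched litellm
# client. The model fields mirror those named in the README; the prompts are
# placeholders, not the project's actual prompts.py templates.
import instructor
from litellm import acompletion
from pydantic import BaseModel

class DefinitionValidationResult(BaseModel):
    is_valid: bool
    confidence: float
    reasoning: str

client = instructor.from_litellm(acompletion)

async def validate_definition(term: str, definition: str) -> DefinitionValidationResult:
    # instructor parses (and can retry) the response into the Pydantic model.
    return await client.chat.completions.create(
        model="gemini/gemini-2.0-flash",
        response_model=DefinitionValidationResult,
        messages=[
            {"role": "system", "content": "You are a financial fact-checker."},
            {"role": "user", "content": f"Validate this definition of '{term}': {definition}"},
        ],
    )
```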
- Service: `WikipediaService` encapsulates all logic for fetching summaries.
- Asynchronicity: The blocking `wikipedia` library calls (`wikipedia.summary`, `wikipedia.search`, `wikipedia.page`) are wrapped using `asyncio.to_thread` to avoid blocking the FastAPI event loop.
- Topic Focus: Implements heuristics to prioritize user-defined topic-related articles:
  - Checks for an explicit `(user-defined topic)` suffix.
  - Scans search results and disambiguation options for financial keywords using regex (`topic_pattern`).
  - Uses a context hint in fallback searches.
- Error Handling: Explicitly handles `wikipedia.exceptions.DisambiguationError` and `wikipedia.exceptions.PageError`.
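A minimal sketch of the thread-wrapping and disambiguation heuristic, assuming an illustrative keyword list (the real one comes from `TOPIC_KEYWORDS` in the configuration):

```python
# Illustrative wrapper around the blocking `wikipedia` library using
# asyncio.to_thread, with the keyword-based disambiguation preference
# described above. TOPIC_KEYWORDS is a stand-in for the configured list.
import asyncio
import re
import wikipedia

TOPIC_KEYWORDS = ["finance", "financial", "banking", "investment"]
topic_pattern = re.compile("|".join(TOPIC_KEYWORDS), re.IGNORECASE)

async def fetch_summary(term: str, sentences: int = 2) -> str | None:
    def _blocking() -> str | None:
        try:
            return wikipedia.summary(term, sentences=sentences)
        except wikipedia.exceptions.DisambiguationError as e:
            # Prefer the first disambiguation option matching a topic keyword.
            for option in e.options:
                if topic_pattern.search(option):
                    return wikipedia.summary(option, sentences=sentences)
            return None
        except wikipedia.exceptions.PageError:
            return None

    # Run in a worker thread so the event loop stays responsive.
    return await asyncio.to_thread(_blocking)
```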
- Mechanism: Uses `pydantic-settings`. The `Settings` class in `config.py` defines the expected configuration variables.
- Source: Settings are loaded from environment variables or a `.env` file.
- Variables:
  - `DATABASE_URL`: SQLAlchemy database connection string (default: `sqlite+aiosqlite:///./volumes/sqlite_data/terminus.db`).
  - `LOG_LEVEL`: Logging level for the application (default: `INFO`).
  - `litellm` might require provider-specific API keys (e.g., `GEMINI_API_KEY`) set as environment variables depending on the chosen model.
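A minimal `Settings` sketch matching the variables above (field names beyond these are assumptions):

```python
# Illustrative pydantic-settings class; only the documented variables are
# grounded in the README, the rest mirror the .env example later on.
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    database_url: str = "sqlite+aiosqlite:///./volumes/sqlite_data/terminus.db"
    log_level: str = "INFO"
    topic_domain: str = "finance"
    topic_keywords: list[str] = ["finance", "financial", "banking"]

settings = Settings()  # reads environment variables and .env
```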
- Dockerfile: Defines the image for the Python application, including installing dependencies using `uv` and setting the entry point to run `uvicorn`.
- docker-compose.yml: Orchestrates the application service (`terminus_app`) and potentially related services (though only the app is defined here). It maps ports (`8000:8000`), mounts the source code (`./:/app`), and defines a named volume (`sqlite_data`) to persist the SQLite database file outside the container filesystem. It also specifies the `.env` file for configuration.
The application follows a standard layered architecture pattern:
- Presentation Layer (API): Handles HTTP requests, routes them to appropriate handlers, performs data validation (via Pydantic), and serializes responses. This is implemented using FastAPI routers (`routers/`).
- Service Layer (Business Logic): Contains the core application logic, orchestrating tasks like database interaction, calling external services (LLM, Wikipedia), and implementing workflows (e.g., candidate validation). This is implemented in the `services/` directory (`TerminusService`, `CandidateTerminusService`, `WikipediaService`, LLM services).
- Data Access Layer: Responsible for interacting with the database. This includes the SQLAlchemy models (`models.py`), database session management (`database.py`), and the ORM queries performed within the Service Layer.
- External Services: Integrations with third-party APIs (LLM providers via `litellm`, the Wikipedia API via the `wikipedia` library).
This separation promotes modularity, testability, and maintainability.
```mermaid
graph TD
    subgraph Frontend
        User[User]
    end
    subgraph API Layer
        Router[Definition Router FastAPI]
    end
    subgraph Services
        LS[TerminusService]
        CLS[CandidateTerminusService]
        WS[WikipediaService]
        FS[FollowUpService LLM]
        DVS[DefValidationService LLM]
    end
    subgraph External APIs
        WIKI[Wikipedia API]
        LLM[LLM API]
    end
    subgraph Databases
        ODB[(Official terminus DB)]
        CDB[(Candidate terminus DB)]
    end

    User --> Router
    Router --> LS
    LS --> ODB
    Router --> CLS
    CLS --> CDB
    Router --> WS
    WS --> WIKI
    WS --> Router
    Router --> FS
    FS --> LLM
    Router --> DVS
    DVS --> LLM
    Router --> User
```

Conceptual diagram (a sequence diagram wasn't readable enough to include).
The entire application is built around Python's `asyncio` framework, facilitated by FastAPI:
- API endpoints are defined with `async def`.
- Database interactions use an asynchronous SQLAlchemy driver (`aiosqlite`) and `await`.
- LLM calls via `litellm` (`acompletion`) are asynchronous.
- Blocking Wikipedia calls are executed in separate threads using `asyncio.to_thread` to prevent blocking the main event loop.
This ensures the service can handle concurrent requests efficiently, especially when waiting for external I/O operations.
Several mechanisms are implemented to ensure the quality, relevance, and accuracy of the terminus data:
- Candidate Review Workflow: The most significant guard rail. New or automatically generated entries must pass through the `candidate_terminus` table and require explicit approval (`/candidate/validate`) before being promoted to the official `terminus`. This allows for human oversight or more sophisticated automated checks.
- LLM-Powered Term Relevance Critique: The `FinancialTermExtractionService` doesn't just extract terms; it uses a secondary LLM call (`_critique_term`) specifically to validate whether an extracted term is genuinely related to the user-defined topic, reducing noise.
- LLM-Powered Definition Validation: The `DefinitionValidationService` uses an LLM prompt focused on factual accuracy within the financial domain, providing a structured assessment (`DefinitionValidationResult` including `is_valid`, `confidence`, `reasoning`) of generated or fetched definitions.
- Structured LLM Output: Using `instructor` forces LLM responses into predefined Pydantic models. This prevents malformed or unexpected free-form text, ensuring downstream code receives data in the expected format. If the LLM fails to conform, `instructor` typically raises an error or allows for retries (depending on configuration, though basic retry isn't explicitly shown here).
- Wikipedia User-Defined Topic Prioritization: The `WikipediaService` actively tries to find user-defined topic-specific articles, reducing the chance of retrieving definitions for unrelated concepts with the same name (e.g., "bond" the chemical vs. "bond" the financial instrument).
- API Input/Output Validation: Pydantic models used in FastAPI endpoints automatically validate incoming request data and ensure outgoing responses adhere to the defined schema.
- Type Hinting: Extensive use of Python type hints improves code clarity and allows static analysis tools (like MyPy) to catch potential type errors early.
- Logging: Detailed logging (`loguru`) provides visibility into the system's operations, helping diagnose errors and understand decision-making processes (e.g., why a specific Wikipedia page was chosen).
While functional, the current implementation has areas for improvement and inherent limitations:
- LLM Reliability:
  - Hallucination/Accuracy: LLMs can still generate plausible but incorrect information (hallucinations). The `DefinitionValidationService` mitigates but doesn't eliminate this risk. Confidence scores reflect the LLM's own assessment and are therefore subjective.
  - Prompt Sensitivity: The quality of LLM outputs (extraction, follow-ups, validation) is highly dependent on the specific prompts used and the chosen LLM model. Changing models might require prompt adjustments.
  - Bias: LLMs can inherit biases from their training data, potentially affecting definitions or follow-up questions.
- Wikipedia Service Limitations:
  - Summarization Quality: Wikipedia summaries (`sentences=2`) can sometimes be too brief, too complex, or miss crucial nuances.
  - Disambiguation Imperfection: The user-defined topic keyword heuristic might fail for terms where the financial meaning isn't obvious from the title, or for genuinely ambiguous cases.
  - Vandalism/Accuracy: Wikipedia content itself can occasionally be inaccurate or subject to vandalism, although popular articles are generally well-maintained.
- Scalability:
  - Database: SQLite is simple for development but has limitations under high concurrent write loads. Migrating to PostgreSQL or another production-grade database would be necessary for scaling.
  - External API Dependencies: Heavy reliance on external LLM and Wikipedia APIs introduces potential bottlenecks related to rate limits, latency, cost, and availability. Caching strategies could help.
- Validation Robustness:
  - The LLM-based validation is a good step but could be enhanced (e.g., cross-referencing with multiple sources, more sophisticated fact-checking techniques, multi-agent debate).
  - The current candidate approval is binary. A more granular review process might be needed.
- Cold Start Problem: An empty terminus requires significant initial effort (manual or automated runs) to populate candidate terms and get them reviewed.
- Lack of UI: The review process currently relies on direct API calls. A simple web interface for reviewers would significantly improve usability.
- Testing Coverage: While the structure supports testing, comprehensive unit, integration, and end-to-end tests are crucial but not explicitly provided. Testing LLM interactions effectively requires specific strategies (mocking, snapshot testing, evaluation sets).
- Migration Management: No database migration tool (like Alembic) is included, making schema changes in production environments risky; `create_all_tables` is unsuitable for production.
While not fully implemented, the system could incorporate automated evaluation mechanisms:
- Candidate Approval Rate: Track the percentage of candidate terms that are approved versus rejected. A high rejection rate might indicate issues with the generation (Wikipedia fetch) or validation (LLM) steps.
- LLM Validation Confidence Monitoring: Analyze the average confidence scores provided by the `DefinitionValidationService`. Consistently low scores might signal problems with the definitions being generated or with the validator LLM itself.
- Semantic Similarity to Golden Set: Maintain a "golden set" of high-quality, human-verified terms and definitions. Periodically compare newly approved terminus entries against this set using semantic similarity metrics (e.g., sentence-transformer embeddings and cosine similarity) to detect semantic drift or quality degradation (a sketch follows this list).
- Consistency Checks:
  - Periodically re-run the `DefinitionValidationService` on existing official terminus entries to catch potential regressions or identify definitions that have become outdated.
  - Check for contradictions between a term's definition and the definitions of its follow-up terms.
- A/B Testing Prompts/Models: Implement infrastructure to test different LLM prompts or models for generation, extraction, or validation tasks, comparing their performance based on metrics like approval rates, confidence scores, or semantic similarity scores.
- User Feedback Loop: If user interaction is added, incorporate feedback mechanisms (e.g., rating definitions, reporting errors) as a direct measure of quality.
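A minimal sketch of the golden-set comparison mentioned above, using `sentence-transformers` (not currently a project dependency; shown only as one possible implementation):

```python
# Illustrative golden-set check: embed definitions and compare via cosine
# similarity; sentence-transformers is an assumed, optional dependency.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def similarity_to_golden(new_definition: str, golden_definition: str) -> float:
    # Embed both definitions and return their cosine similarity.
    embeddings = model.encode([new_definition, golden_definition])
    return float(util.cos_sim(embeddings[0], embeddings[1]))

# Example: flag newly approved entries that drift from the verified reference.
score = similarity_to_golden(
    "A bond is a fixed-income instrument representing a loan made by an investor.",
    "A bond is a debt security under which the issuer owes the holder a debt.",
)
print(f"similarity: {score:.2f}")  # low scores may indicate semantic drift
```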
- Python 3.13+
- `uv` (recommended; a high-performance Python package installer and resolver) or `pip`
- Docker and Docker Compose (for containerized execution)
- Access to an LLM API compatible with `litellm` (e.g., a Google AI Studio Gemini API key; the free tier is fine for testing)
- Clone the Repository:
  ```bash
  git clone <your-repository-url>
  cd terminus
  ```
- Create `.env` File: Create a file named `.env` in the project root directory and add the following, adjusting as needed:

  ```bash
  # .env
  DATABASE_URL=sqlite+aiosqlite:///./volumes/sqlite_data/terminus.db
  LOG_LEVEL=INFO
  # Add LLM API key if required by litellm for your chosen provider
  # Example for Gemini:
  GEMINI_API_KEY=your_gemini_api_key_here
  # User-defined topic and anchor list of keywords
  TOPIC_DOMAIN=finance
  TOPIC_KEYWORDS=["finance", "financial", "banking", "investment", "economic", "stock", "market", "derivative"]
  ```

  Ensure `litellm` knows how to pick up the key, or configure it according to the `litellm` documentation if necessary.
Using `uv` (recommended):

```bash
uv sync
```
Note: For production, implement and use a database migration tool like Alembic.
```bash
uvicorn terminus.app:app --host 0.0.0.0 --port 8000 --reload
```

(The `--reload` flag is useful for development.)

The API documentation will be available at `http://localhost:8000/docs`.
This is the recommended way to run the application, especially for consistency across environments.
- Build and Start Containers:

  ```bash
  docker-compose up --build
  ```

  (Use `docker-compose up --build -d` to run in detached mode.)
- Accessing the Service: The API will be available at `http://localhost:8000`. Documentation at `http://localhost:8000/docs`.
- Stopping Containers:

  ```bash
  docker-compose down
  ```

  (Add `-v` to remove the named volume `sqlite_data` if you want to clear the database.)