This document outlines the design for incorporating document references into the chat history system. When users reference documents via wikilinks or when the assistant retrieves documents through search, these references should be tracked, displayed to users, and made available to the LLM in a structured way.
Currently, the chat system doesn't explicitly track or display which documents are being referenced in conversations. This creates several issues:
- Users can't easily see which documents influenced an assistant's response
- Wikilinks in user messages are not processed or tracked
- Documents found via search tools are not explicitly linked to the responses they inform
- There's no persistent record of document-to-conversation relationships
- The LLM doesn't receive explicit prompting to use referenced documents
This design aims to:
- Extract and track wikilinks from user messages
- Track documents used by the assistant via search tools
- Display references to users in a clear, consistent format
- Provide references to the LLM in a way that encourages their use
- Maintain backward compatibility with existing chat sessions
- Support document summarization for enhanced reference display
Instead of wrapping the mojentic library's `ChatSession`, we'll manage our own rich message structure within zk-chat. This gives us full control over the chat session data and allows us to maintain document references directly on messages for local UI purposes, while still converting to the simpler `LLMMessage` format when communicating with the LLM.
```python
from typing import List, Optional

from pydantic import BaseModel

from mojentic.llm.gateways.models import LLMMessage


class DocumentReference(BaseModel):
    """
    Represents a reference to a document in the vault.

    This can be either a direct reference (from a wikilink) or an indirect
    reference (from a search result that informed the response).
    """
    wikilink: str  # The original wikilink, e.g., "[[parameters reference]]"
    relative_path: Optional[str]  # Resolved path, None if broken link
    reference_type: str  # "direct" (from wikilink) or "search" (from tool result)
    summary: Optional[str] = None  # Optional AI-generated summary
    context_snippet: Optional[str] = None  # Optional context from the source


class ZkChatMessage(BaseModel):
    """
    Rich message structure for zk-chat that includes document references.

    This is our internal representation that maintains all the context we need
    for UI display and reference tracking. We convert this to LLMMessage when
    sending to the LLM.
    """
    role: str  # "user", "assistant", or "system"
    content: str  # The message content
    references: List[DocumentReference] = []  # Documents referenced in this message
    timestamp: Optional[str] = None  # When the message was created
    tool_calls: Optional[List] = None  # Tool calls made (for assistant messages)

    # Token accounting for context management
    token_count: Optional[int] = None  # Tokens in the original message content
    augmented_token_count: Optional[int] = None  # Tokens after adding references/formatting
    tool_tokens: Optional[int] = None  # Tokens used by tool calls (for assistant messages)

    def to_llm_message(self) -> LLMMessage:
        """
        Convert our rich message to mojentic's simpler LLMMessage format.

        When converting, we may augment the content with reference information
        to provide context to the LLM.
        """
        from mojentic.llm.gateways.models import MessageRole

        content = self.content

        # For user messages with references, augment with reference list
        if self.role == "user" and self.references:
            ref_lines = ["\n\nReferenced Documents:"]
            for ref in self.references:
                if ref.relative_path:
                    summary = ref.summary or ""
                    ref_lines.append(f"- {ref.wikilink} ({ref.relative_path}): {summary}")
                else:
                    ref_lines.append(f"- {ref.wikilink}: Warning - Document not found")
            ref_lines.append("\nPlease read these documents using the read_zk_document tool if you need more information.")
            content = content + "\n".join(ref_lines)

        return LLMMessage(
            role=MessageRole[self.role.capitalize()],
            content=content,
            tool_calls=self.tool_calls
        )
```

Rather than wrapping `mojentic.llm.ChatSession`, we'll create our own chat session manager that maintains the rich message structure and handles conversion to/from the LLM:
```python
import json
import re
from datetime import datetime
from typing import List

# Import paths below are indicative and may vary by mojentic version;
# MarkdownFilesystemGateway and Zettelkasten are zk-chat's own types.
from mojentic.llm import ChatSession, LLMBroker
from mojentic.llm.gateways.tokenizer_gateway import TokenizerGateway
from mojentic.llm.tools.llm_tool import LLMTool


class ZkChatSession:
    """
    Manages a chat session with rich message structures and document reference tracking.

    This class maintains two parallel structures:
    - messages: List[ZkChatMessage] - Our rich internal messages with references
    - llm_session: ChatSession - The underlying mojentic chat session for LLM communication

    The rich messages are used for local UI display and reference tracking,
    while the LLM session manages the actual conversation with the model.

    Token Accounting:
    The session tracks token usage for all components to enable context optimization:
    - Individual message token counts
    - Tool description token overhead
    - Tool request/response token usage
    - Total context budget management
    """

    def __init__(
        self,
        llm: LLMBroker,
        system_prompt: str,
        tools: List[LLMTool],
        filesystem_gateway: MarkdownFilesystemGateway,
        zettelkasten: Zettelkasten,
        tokenizer_gateway: TokenizerGateway,
        max_context: int = 32768,
        temperature: float = 1.0,
        # Context budget allocation (as fractions of max_context)
        message_budget_ratio: float = 0.6,  # 60% for conversation messages
        tool_budget_ratio: float = 0.3,  # 30% for tool descriptions/calls
        system_budget_ratio: float = 0.1  # 10% for system prompt
    ):
        # Our rich message history
        self.messages: List[ZkChatMessage] = []

        # Underlying LLM session (from mojentic)
        self.llm_session = ChatSession(
            llm=llm,
            system_prompt=system_prompt,
            tools=tools,
            max_context=max_context,
            temperature=temperature
        )

        # Dependencies for reference tracking
        self.filesystem_gateway = filesystem_gateway
        self.zettelkasten = zettelkasten
        self.tokenizer_gateway = tokenizer_gateway
        self.wikilink_pattern = re.compile(r'\[\[([^\]]+)\]\]')

        # Context management
        self.max_context = max_context
        self.message_budget = int(max_context * message_budget_ratio)
        self.tool_budget = int(max_context * tool_budget_ratio)
        self.system_budget = int(max_context * system_budget_ratio)

        # Token tracking
        self.system_prompt_tokens = self._count_tokens(system_prompt)
        self.tool_description_tokens = self._calculate_tool_tokens(tools)
        self.current_message_tokens = 0

    def send(self, user_message: str) -> tuple[str, ZkChatMessage, ZkChatMessage]:
        """
        Send a user message and return the assistant's response.

        Returns:
            tuple[str, ZkChatMessage, ZkChatMessage]:
                - The response text (for backward compatibility)
                - The user message (with references)
                - The assistant message (with references)
        """
        # Extract references from user message
        user_refs = self._extract_wikilinks_from_message(user_message)

        # Calculate token counts for user message
        user_token_count = self._count_tokens(user_message)

        # Create rich user message
        user_msg = ZkChatMessage(
            role="user",
            content=user_message,
            references=user_refs,
            timestamp=datetime.now().isoformat(),
            token_count=user_token_count
        )

        # Convert to LLM message and calculate augmented token count
        llm_message = user_msg.to_llm_message()
        user_msg.augmented_token_count = self._count_tokens(llm_message.content)

        # Add to our message history
        self.messages.append(user_msg)

        # Send to LLM
        response_text = self.llm_session.send(llm_message.content)

        # Extract references from assistant's tool usage
        assistant_refs = self._extract_assistant_references()

        # Calculate token counts for assistant message
        assistant_token_count = self._count_tokens(response_text)
        tool_calls = self._get_recent_tool_calls()
        tool_token_count = self._calculate_tool_call_tokens(tool_calls) if tool_calls else 0

        # Create rich assistant message
        assistant_msg = ZkChatMessage(
            role="assistant",
            content=response_text,
            references=assistant_refs,
            timestamp=datetime.now().isoformat(),
            tool_calls=tool_calls,
            token_count=assistant_token_count,
            tool_tokens=tool_token_count
        )

        # Add to our message history
        self.messages.append(assistant_msg)

        return response_text, user_msg, assistant_msg

    def get_messages(self) -> List[ZkChatMessage]:
        """Get all messages with their references for UI display."""
        return self.messages

    def get_llm_messages(self) -> List[LLMMessage]:
        """Get simplified messages for LLM context (if needed for debugging)."""
        return [msg.to_llm_message() for msg in self.messages]

    def _count_tokens(self, text: str) -> int:
        """Count tokens in a text string using the tokenizer gateway."""
        return self.tokenizer_gateway.count_tokens(text)

    def _calculate_tool_tokens(self, tools: List[LLMTool]) -> int:
        """Calculate total tokens used by tool descriptions."""
        total = 0
        for tool in tools:
            descriptor_text = json.dumps(tool.descriptor)
            total += self._count_tokens(descriptor_text)
        return total

    def get_context_usage(self) -> dict:
        """
        Get a detailed breakdown of context token usage.

        Returns:
            dict with keys:
                - system_tokens: Tokens used by system prompt
                - tool_tokens: Tokens used by tool descriptions
                - message_tokens: Tokens used by conversation messages
                - total_tokens: Total tokens in context
                - available_tokens: Remaining tokens before max_context
                - budget_status: Usage vs allocated budgets
        """
        message_tokens = sum(
            (msg.augmented_token_count or msg.token_count or 0)
            for msg in self.messages
        )
        total_tokens = (
            self.system_prompt_tokens +
            self.tool_description_tokens +
            message_tokens
        )
        return {
            "system_tokens": self.system_prompt_tokens,
            "tool_tokens": self.tool_description_tokens,
            "message_tokens": message_tokens,
            "total_tokens": total_tokens,
            "available_tokens": self.max_context - total_tokens,
            "budget_status": {
                "system": {
                    "used": self.system_prompt_tokens,
                    "budget": self.system_budget,
                    "percentage": (self.system_prompt_tokens / self.system_budget * 100) if self.system_budget > 0 else 0
                },
                "tools": {
                    "used": self.tool_description_tokens,
                    "budget": self.tool_budget,
                    "percentage": (self.tool_description_tokens / self.tool_budget * 100) if self.tool_budget > 0 else 0
                },
                "messages": {
                    "used": message_tokens,
                    "budget": self.message_budget,
                    "percentage": (message_tokens / self.message_budget * 100) if self.message_budget > 0 else 0
                }
            }
        }

    def _calculate_tool_call_tokens(self, tool_calls: List) -> int:
        """Calculate tokens used by tool calls (requests and responses)."""
        if not tool_calls:
            return 0
        total = 0
        for call in tool_calls:
            # Tool call request
            if hasattr(call, 'function'):
                total += self._count_tokens(call.function.name)
                total += self._count_tokens(call.function.arguments)
            # Tool call response tokens would need to be tracked from the
            # actual tool execution; they are not counted here
        return total

    def should_compact_context(self) -> bool:
        """
        Determine if context should be compacted based on budget usage.

        Returns True if the message budget is exceeded or total usage is near max_context.
        """
        usage = self.get_context_usage()

        # Compact if message budget exceeded
        if usage["message_tokens"] > self.message_budget:
            return True

        # Compact if total usage is above 90% of max_context
        if usage["total_tokens"] > (self.max_context * 0.9):
            return True

        return False
```

When a user sends a message with wikilinks, the `ZkChatSession` extracts and resolves them:
```python
def _extract_wikilinks_from_message(self, message: str) -> List[DocumentReference]:
    """
    Extract wikilinks from user message and resolve them.

    Example message: "Can you explain [[parameters reference]] and [[tool usage]]?"
    Returns: [
        DocumentReference(wikilink="[[parameters reference]]", relative_path="docs/params.md", ...),
        DocumentReference(wikilink="[[tool usage]]", relative_path="docs/tools.md", ...)
    ]
    """
    references = []

    # Find all wikilinks
    matches = self.wikilink_pattern.finditer(message)

    for match in matches:
        wikilink_text = match.group(0)  # Full [[...]]
        link_title = match.group(1)  # Just the title

        # Resolve the wikilink
        try:
            relative_path = self.filesystem_gateway.resolve_wikilink(wikilink_text)

            # Create reference
            ref = DocumentReference(
                wikilink=wikilink_text,
                relative_path=relative_path,
                reference_type="direct",
                context_snippet=None  # Could extract surrounding text
            )
            references.append(ref)
        except ValueError:
            # Broken link
            ref = DocumentReference(
                wikilink=wikilink_text,
                relative_path=None,
                reference_type="direct",
                context_snippet="Warning: Document not found"
            )
            references.append(ref)

    return references
```

The references are stored directly on the `ZkChatMessage` object and are automatically formatted when converting to an LLM message via the `to_llm_message()` method shown in section 1.1.
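To make the flow concrete, here is a usage sketch. It assumes a configured `ZkChatSession` named `session` and a vault in which the wikilink resolves to `docs/params.md`; both names are illustrative, not part of the design:

```python
# Illustrative only: walks a message through extraction and augmentation.
# The resolved path "docs/params.md" is an assumption for this example.
text = "Can you explain [[parameters reference]]?"
refs = session._extract_wikilinks_from_message(text)

msg = ZkChatMessage(role="user", content=text, references=refs)

print(msg.to_llm_message().content)
# Can you explain [[parameters reference]]?
#
# Referenced Documents:
# - [[parameters reference]] (docs/params.md):
#
# Please read these documents using the read_zk_document tool if you need more information.
```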
Track when the assistant uses document-retrieval tools. This method is part of `ZkChatSession`:
```python
def _extract_assistant_references(self) -> List[DocumentReference]:
    """
    Extract document references from the assistant's recent tool calls.

    This monitors tool calls from the underlying LLM session like:
    - read_zk_document
    - find_excerpts
    - find_zk_documents

    And creates DocumentReferences for documents that were accessed.
    """
    references = []

    # Get the most recent messages from the LLM session
    if not self.llm_session.messages:
        return references

    last_message = self.llm_session.messages[-1]

    # Check if message has tool calls
    if not last_message.tool_calls:
        return references

    for tool_call in last_message.tool_calls:
        tool_name = tool_call.function.name

        # Handle read_zk_document
        if tool_name == "read_zk_document":
            args = json.loads(tool_call.function.arguments)
            relative_path = args.get("relative_path")
            if relative_path:
                # Create wikilink from path
                wikilink = self._path_to_wikilink(relative_path)
                references.append(DocumentReference(
                    wikilink=wikilink,
                    relative_path=relative_path,
                    reference_type="search",
                    context_snippet="Retrieved via read_zk_document"
                ))

        # Handle find_excerpts
        elif tool_name == "find_excerpts":
            # Parse tool response to get document IDs
            # This requires access to tool results
            excerpts = self._parse_excerpt_results(tool_call)
            for excerpt in excerpts:
                doc_id = excerpt.get("document_id")
                if doc_id:
                    wikilink = self._path_to_wikilink(doc_id)
                    references.append(DocumentReference(
                        wikilink=wikilink,
                        relative_path=doc_id,
                        reference_type="search",
                        context_snippet=excerpt.get("text", "")[:100]
                    ))

        # Handle find_zk_documents
        elif tool_name == "find_zk_documents":
            documents = self._parse_document_results(tool_call)
            for doc in documents:
                relative_path = doc.get("relative_path")
                if relative_path:
                    wikilink = self._path_to_wikilink(relative_path)
                    references.append(DocumentReference(
                        wikilink=wikilink,
                        relative_path=relative_path,
                        reference_type="search",
                        context_snippet="Found via semantic search"
                    ))

    # Deduplicate references by resolved path
    unique_refs = {}
    for ref in references:
        if ref.relative_path not in unique_refs:
            unique_refs[ref.relative_path] = ref

    return list(unique_refs.values())
```

Create a service to generate concise document summaries:
```python
class DocumentSummaryService:
    """
    Generates concise summaries of documents for reference display.
    """

    def __init__(self, llm: LLMBroker, zettelkasten: Zettelkasten):
        self.llm = llm
        self.zk = zettelkasten
        self.cache = {}  # Simple in-memory cache

    def get_summary(self, relative_path: str, max_length: int = 100) -> str:
        """
        Get or generate a summary for a document.

        Uses a simple strategy:
        1. Check cache
        2. Try to use first paragraph of document
        3. If needed, use LLM to generate summary
        """
        # Check cache
        if relative_path in self.cache:
            return self.cache[relative_path]

        # Read document
        try:
            document = self.zk.read_document(relative_path)
        except Exception:
            return "Unable to access document"

        # Try first paragraph
        paragraphs = document.content.split("\n\n")
        if paragraphs and len(paragraphs[0]) < max_length:
            summary = paragraphs[0].strip()
            self.cache[relative_path] = summary
            return summary

        # Generate with LLM (assumes a simple completion helper on the broker;
        # adapt to the actual LLMBroker API)
        prompt = f"Summarize this document in one sentence:\n\n{document.content[:1000]}"
        summary = self.llm.complete(prompt, max_tokens=50)
        self.cache[relative_path] = summary
        return summary
```

Managing context token budgets is critical for optimal LLM performance. The `ZkChatSession` provides mechanisms for tracking and adapting to context constraints.
The session divides the total context window into budgets for different components:
```python
# Example budget allocation for a 32K context window:
# - System prompt: 10% (~3.2K tokens) - Instructions for the LLM
# - Tool descriptions: 30% (~9.6K tokens) - Available tools and their schemas
# - Messages: 60% (~19.2K tokens) - Conversation history

# These ratios are configurable per session
chat_session = ZkChatSession(
    # ... other params ...
    max_context=32768,
    message_budget_ratio=0.6,
    tool_budget_ratio=0.3,
    system_budget_ratio=0.1
)
```

Track real-time context usage:
```python
# Get detailed context breakdown
usage = chat_session.get_context_usage()

print(f"System prompt: {usage['system_tokens']} tokens")
print(f"Tool descriptions: {usage['tool_tokens']} tokens")
print(f"Messages: {usage['message_tokens']} tokens")
print(f"Total: {usage['total_tokens']} / {chat_session.max_context}")
print(f"Available: {usage['available_tokens']} tokens")

# Check budget status for each component
for component, status in usage['budget_status'].items():
    print(f"{component}: {status['used']}/{status['budget']} ({status['percentage']:.1f}%)")
```

When context budgets are exceeded, several strategies can be employed:
Strategy 1: Message Compaction via Summarization
```python
def compact_message_history(self, target_token_count: int) -> None:
    """
    Compact older messages by summarizing them to fit within target token count.

    Keeps recent messages intact for context continuity while summarizing
    older exchanges to reduce token usage.

    NOTE: target_token_count is not yet used; compaction currently keeps a
    fixed window of recent messages.
    """
    if not self.should_compact_context():
        return

    # Keep the most recent N messages intact
    recent_message_count = 3  # Configurable
    messages_to_compact = self.messages[:-recent_message_count]
    recent_messages = self.messages[-recent_message_count:]

    # Summarize older messages
    summary_prompt = "Summarize the following conversation, preserving key information:\n\n"
    for msg in messages_to_compact:
        summary_prompt += f"{msg.role}: {msg.content}\n"

    # Assumes a simple completion helper on the broker; adapt to the actual API
    summary = self.llm_session.llm.complete(summary_prompt, max_tokens=500)

    # Replace old messages with summary
    summary_msg = ZkChatMessage(
        role="system",
        content=f"[Previous conversation summary]: {summary}",
        timestamp=datetime.now().isoformat(),
        token_count=self._count_tokens(summary)
    )
    self.messages = [summary_msg] + recent_messages
```

Strategy 2: Tool-Use vs Context Injection
Adapt between including document content in context vs prompting tool usage:
```python
def should_inject_document_content(self, doc_size_tokens: int) -> bool:
    """
    Decide whether to inject document content into context or prompt tool usage.

    If the message budget has room, inject content directly for faster access.
    If the budget is tight, prompt the LLM to use the read_zk_document tool instead.
    """
    usage = self.get_context_usage()
    available_budget = self.message_budget - usage['message_tokens']

    # Use 20% of available budget as threshold
    threshold = available_budget * 0.2

    if doc_size_tokens < threshold:
        return True  # Inject content directly
    else:
        return False  # Prompt tool usage
```

Strategy 3: Dynamic Tool Filtering
Reduce tool description overhead by filtering to relevant tools:
```python
def filter_tools_by_relevance(self, user_message: str, tools: List[LLMTool]) -> List[LLMTool]:
    """
    Filter tools to only include those relevant to the current query.

    Reduces tool_tokens overhead by excluding tools unlikely to be used.
    """
    # Use simple keyword matching or embeddings for tool relevance
    relevant_tools = []

    # Always include core navigation tools
    core_tool_names = {"read_zk_document", "find_excerpts", "resolve_wikilink"}

    for tool in tools:
        tool_name = tool.descriptor["function"]["name"]

        # Include core tools
        if tool_name in core_tool_names:
            relevant_tools.append(tool)
            continue

        # Include if keywords match
        tool_desc = tool.descriptor["function"]["description"].lower()
        if any(keyword in user_message.lower() for keyword in tool_desc.split()[:5]):
            relevant_tools.append(tool)

    return relevant_tools if relevant_tools else tools  # Fall back to all tools
```

Strategy 4: Reference Storage in Smart Memory
For frequently referenced documents, store references in smart memory:
```python
def store_document_reference_in_memory(self, doc_ref: DocumentReference) -> None:
    """
    Store document reference in smart memory for future retrieval.

    Instead of re-injecting full document content, the LLM can retrieve
    the reference from memory on demand.
    """
    memory_entry = {
        "type": "document_reference",
        "wikilink": doc_ref.wikilink,
        "path": doc_ref.relative_path,
        "summary": doc_ref.summary,
        "last_accessed": datetime.now().isoformat()
    }

    # Store in smart memory with appropriate metadata.
    # Assumes a SmartMemory instance was provided to the session (not shown
    # in the __init__ above); its store() signature is indicative.
    self.smart_memory.store(
        content=json.dumps(memory_entry),
        metadata={"type": "reference", "path": doc_ref.relative_path}
    )
```

Display references to users in the console:
```python
class ConsoleReferenceRenderer:
    """
    Renders document references in the console output.
    """

    def __init__(self, console_service: RichConsoleService):
        self.console = console_service

    def render_references(self, references: List[DocumentReference]) -> None:
        """
        Display references in a readable format.

        Output example:
            📚 Referenced Documents:
              • [[parameters reference]] (docs/params.md)
                A detailed reference for all parameters you can send to the tool
              • [[tool usage]] (docs/tools.md)
                Guide for using available tools in the chat interface
        """
        if not references:
            return

        self.console.print("\n[bold cyan]📚 Referenced Documents:[/bold cyan]")
        for ref in references:
            if ref.relative_path:
                self.console.print(f"  • [link]{ref.wikilink}[/link] ({ref.relative_path})")
                if ref.summary:
                    self.console.print(f"    [dim]{ref.summary}[/dim]")
            else:
                self.console.print(f"  • [yellow]{ref.wikilink}[/yellow] [dim](not found)[/dim]")
        self.console.print("")
```

For the Qt GUI, references could be displayed in a side panel or as expandable sections:
```python
from PySide6.QtCore import Qt, Signal  # assuming PySide6; adjust for the project's Qt binding
from PySide6.QtWidgets import QLabel, QListWidget, QListWidgetItem, QVBoxLayout, QWidget


class QtReferenceWidget(QWidget):
    """
    Widget to display document references in the GUI.
    """

    # Emitted with the document's relative path when a reference is clicked
    document_opened = Signal(str)

    def __init__(self):
        super().__init__()
        self.setup_ui()

    def setup_ui(self):
        layout = QVBoxLayout()

        # Title
        title = QLabel("Referenced Documents")
        title.setStyleSheet("font-weight: bold; font-size: 14px;")
        layout.addWidget(title)

        # List widget for references
        self.ref_list = QListWidget()
        self.ref_list.itemClicked.connect(self.on_reference_clicked)
        layout.addWidget(self.ref_list)

        self.setLayout(layout)

    def update_references(self, references: List[DocumentReference]):
        """Update the displayed references."""
        self.ref_list.clear()
        for ref in references:
            if ref.relative_path:
                item_text = f"{ref.wikilink}\n{ref.summary or ref.relative_path}"
            else:
                item_text = f"{ref.wikilink}\n(Document not found)"
            item = QListWidgetItem(item_text)
            item.setData(Qt.UserRole, ref)
            self.ref_list.addItem(item)

    def on_reference_clicked(self, item):
        """Handle reference click - could open document in viewer."""
        ref = item.data(Qt.UserRole)
        if ref.relative_path:
            # Emit signal so a document viewer can open the file
            self.document_opened.emit(ref.relative_path)
```

Modify `zk_chat/chat.py` to use our new `ZkChatSession`:
```python
def chat(config: Config, unsafe: bool = False, use_git: bool = False, store_prompt: bool = False):
    # ... existing setup code for llm, zk, filesystem_gateway, tools ...

    # Create our rich chat session instead of mojentic's ChatSession directly
    chat_session = ZkChatSession(
        llm=llm,
        system_prompt=system_prompt,
        tools=tools,
        filesystem_gateway=filesystem_gateway,
        zettelkasten=zk,
        tokenizer_gateway=tokenizer_gateway,  # created in the elided setup above
        max_context=32768,
        temperature=1.0
    )

    # Reference renderer for displaying references
    ref_renderer = ConsoleReferenceRenderer(console_service)

    while True:
        query = console_service.input("[chat.user]Query:[/] ")
        if not query:
            console_service.print("[chat.system]Exiting...[/]")
            break
        else:
            # Send message - returns response text and both messages with references
            response, user_msg, assistant_msg = chat_session.send(query)

            # Display response
            console_service.print(f"[chat.assistant]{response}[/]")

            # Display references if any were used by the assistant
            if assistant_msg.references:
                ref_renderer.render_references(assistant_msg.references)

            # Optionally, also show user references if they were resolved
            # (could be shown inline or as a separate section)
```

Similar integration for `zk_chat/agent.py`:
```python
def agent(config: Config):
    # ... existing setup code ...

    # Create ZkChatSession for the agent instead of a basic ChatSession
    chat_session = ZkChatSession(
        llm=llm,
        system_prompt=system_prompt,
        tools=tools,
        filesystem_gateway=filesystem_gateway,
        zettelkasten=zk,
        tokenizer_gateway=tokenizer_gateway  # created in the elided setup above
    )

    # The agent can now access rich message history with references
    problem_solver = IterativeProblemSolvingAgent(
        chat_session=chat_session,  # Pass our rich session
        # ... other parameters ...
    )

    # ... rest of agent code ...
```

For the Qt GUI, we can access the full message history with references:
```python
class ChatWindow(QMainWindow):
    def __init__(self):
        super().__init__()
        self.chat_session = None  # Will be a ZkChatSession
        self.ref_widget = QtReferenceWidget()
        # self.chat_display is assumed to be created in elided UI setup

    def send_message(self, user_text: str):
        """Send a message and update the UI."""
        # Send and get rich messages back
        response, user_msg, assistant_msg = self.chat_session.send(user_text)

        # Display messages in chat history
        self.add_message_to_ui(user_msg)
        self.add_message_to_ui(assistant_msg)

        # Update reference widget with assistant references
        if assistant_msg.references:
            self.ref_widget.update_references(assistant_msg.references)

    def add_message_to_ui(self, message: ZkChatMessage):
        """Add a message with its references to the UI."""
        # Display message content
        self.chat_display.append_message(message.role, message.content)

        # Optionally display inline references
        if message.references:
            ref_summary = ", ".join([ref.wikilink for ref in message.references])
            self.chat_display.append_metadata(f"📚 References: {ref_summary}")
```

For advanced use cases, we can persist entire chat sessions with references:
```python
import json
import os
from datetime import datetime
from typing import List, Optional


class ChatSessionStore:
    """
    Optional: Store chat sessions with their rich message history.
    """

    def __init__(self, vault_path: str):
        self.db_path = os.path.join(vault_path, ".zk_chat_db", "chat_sessions.json")

    def save_session(self, session_id: str, chat_session: ZkChatSession):
        """Save a chat session with all its messages and references."""
        # Load existing data
        data = self._load_data()

        # Convert messages to serializable format
        messages_data = [msg.model_dump() for msg in chat_session.messages]

        # Update with new session
        data[session_id] = {
            "messages": messages_data,
            "created_at": datetime.now().isoformat(),
            "last_updated": datetime.now().isoformat()
        }

        # Save
        self._save_data(data)

    def load_session(self, session_id: str) -> Optional[List[ZkChatMessage]]:
        """Retrieve a chat session's message history."""
        data = self._load_data()
        session_data = data.get(session_id)
        if session_data:
            messages = [ZkChatMessage.model_validate(msg) for msg in session_data["messages"]]
            return messages
        return None

    def _load_data(self) -> dict:
        """Load session data from disk."""
        if os.path.exists(self.db_path):
            with open(self.db_path, 'r') as f:
                return json.load(f)
        return {}

    def _save_data(self, data: dict):
        """Save session data to disk."""
        os.makedirs(os.path.dirname(self.db_path), exist_ok=True)
        with open(self.db_path, 'w') as f:
            json.dump(data, f, indent=2)
```

- Create `DocumentReference` model in `zk_chat/models.py`
- Create `ZkChatMessage` model with references array and token accounting fields
- Create `ZkChatSession` class to manage rich message history
- Add `TokenizerGateway` integration for token counting
- Implement budget allocation system (message/tool/system ratios)
- Implement wikilink extraction from user messages
- Add basic tests for reference extraction
- Add tests for token counting and budget tracking
- Implement tool call monitoring for `read_zk_document`
- Implement tool call monitoring for `find_excerpts`
- Implement tool call monitoring for `find_zk_documents_related_to`
- Add tool call token counting
- Add tests for reference detection from tool calls
- Create `DocumentSummaryService`
- Implement first-paragraph summary extraction
- Implement LLM-based summary generation (optional)
- Add caching for summaries
- Add tests for summary service
- Implement `ZkChatMessage.to_llm_message()` conversion with reference formatting
- Calculate and store token counts for original and augmented messages
- Test that reference lists are properly appended to LLM messages
- Test that LLM responds appropriately to reference prompts
- Validate token count accuracy
- Implement `get_context_usage()` for real-time monitoring
- Implement `should_compact_context()` decision logic
- Add context compaction via message summarization
- Implement adaptive tool filtering
- Implement content injection vs tool-use decision making
- Add tests for all context management strategies
- Create `ConsoleReferenceRenderer`
- Integrate with `chat.py` main loop
- Integrate with `agent.py` main loop
- Add optional context usage display in UI
- Add tests for rendering
- Create `QtReferenceWidget`
- Integrate with Qt GUI
- Add reference click handling
- Add context usage visualization
- Add tests for GUI components
- Create `ChatSessionStore` for persistent storage of sessions
- Implement save/load for `ZkChatSession` with token data
- Add tests for persistence
- Add comprehensive documentation
- Create usage examples for token management
- Update README with reference tracking and context management features
- Add troubleshooting guide for context issues
- Summary Caching: Cache document summaries to avoid repeated LLM calls
- Lazy Loading: Only generate summaries when needed for display (see the sketch after this list)
- Reference Deduplication: Avoid tracking the same document multiple times in one message
- Token Counting: Use efficient tokenizer for accurate token accounting without performance penalties
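A minimal sketch of the lazy-loading idea above; the helper name `ensure_summaries` is an assumption, not part of the core design:

```python
from typing import List

def ensure_summaries(references: List[DocumentReference],
                     summary_service: DocumentSummaryService) -> None:
    """Fill in missing summaries only when references are about to be displayed.

    Defers summary generation (and any LLM call) until render time; cached
    summaries in DocumentSummaryService make repeat calls cheap.
    """
    for ref in references:
        if ref.relative_path and ref.summary is None:
            ref.summary = summary_service.get_summary(ref.relative_path)
```

A renderer would call this immediately before displaying references, so documents that are never shown never incur a summary call.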
- Real-time Tracking: Track token usage for every message and tool call
- Budget Allocation: Configurable budgets for system prompt, tools, and messages
- Context Monitoring: Real-time visibility into context usage and budget status
- Automatic Compaction: Trigger context compaction when budgets are exceeded
- Adaptive Strategies:
  - Message summarization for older conversations
  - Tool filtering based on relevance
  - Dynamic choice between content injection vs tool usage
  - Smart memory integration for frequently referenced documents
- Broken Links: Gracefully handle wikilinks that fail to resolve
- Tool Call Parsing: Robust parsing of tool results to extract document references
- Missing Documents: Handle cases where referenced documents are deleted
- Token Calculation Failures: Fall back to character-based estimation if the tokenizer fails (see the sketch after this list)
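A possible shape for that fallback; the 4-characters-per-token heuristic is an assumption, not a measured constant:

```python
def _count_tokens(self, text: str) -> int:
    """Count tokens, falling back to a character-based estimate on failure."""
    try:
        return self.tokenizer_gateway.count_tokens(text)
    except Exception:
        # Rough heuristic: ~4 characters per token for English prose
        return max(1, len(text) // 4)
```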
- Optional Feature: Reference tracking should be optional and not break existing code
- Graceful Degradation: If reference tracking fails, chat should continue normally (a sketch follows this list)
- Existing Sessions: Support for sessions without reference metadata
- Token Fields Optional: Token accounting fields are optional to support legacy messages
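For the graceful-degradation rule, one hypothetical wrapper (the method name and logging destination are assumptions):

```python
import logging

def _safe_extract_references(self, message: str) -> List[DocumentReference]:
    """Never let reference tracking abort the chat loop."""
    try:
        return self._extract_wikilinks_from_message(message)
    except Exception as exc:
        logging.warning("Reference tracking failed; continuing without: %s", exc)
        return []
```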
- Unobtrusive Display: References should enhance, not clutter the interface
- Actionable References: Users should be able to click/navigate to referenced documents
- Clear Feedback: Make it clear when a wikilink is broken or a reference is unavailable
- Context Visibility: Optionally display context usage to help users understand token limits
- Test wikilink extraction from various message formats (see the example after this list)
- Test reference deduplication logic
- Test summary generation and caching
- Test tool call parsing for different tool types
- Test token counting for messages, tools, and references
- Test budget allocation and tracking
- Test context usage calculation
- Test should_compact_context() decision logic
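As an example, the wikilink-extraction test might look like this pytest-style sketch; the stub gateway and the `chat_session_factory` fixture are assumptions:

```python
class StubFilesystemGateway:
    """Minimal stand-in for MarkdownFilesystemGateway in tests."""

    def resolve_wikilink(self, wikilink: str) -> str:
        if wikilink == "[[params]]":
            return "docs/params.md"
        raise ValueError("broken link")


def test_extracts_and_resolves_wikilinks(chat_session_factory):
    # chat_session_factory is an assumed fixture that builds a ZkChatSession
    # wired to the stub gateway above
    session = chat_session_factory(filesystem_gateway=StubFilesystemGateway())

    refs = session._extract_wikilinks_from_message("See [[params]] and [[missing]]")

    assert refs[0].relative_path == "docs/params.md"
    assert refs[0].reference_type == "direct"
    assert refs[1].relative_path is None  # broken link kept, flagged unresolved
```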
- Test full flow: user query with wikilinks → augmented prompt → LLM response (see the sketch after this list)
- Test assistant reference detection from tool usage
- Test reference display in console
- Test reference display in GUI
- Test token accounting across multi-turn conversations
- Test context compaction when budgets are exceeded
- Test adaptive tool filtering based on relevance
- Test content injection vs tool-use decision making
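A sketch of the full-flow integration test; `FakeLLMSession` and the `session_with_fake_llm` fixture are assumptions for illustration:

```python
class FakeLLMSession:
    """Stands in for mojentic's ChatSession and records what the LLM receives."""

    def __init__(self):
        self.received = []
        self.messages = []  # mirrors the attribute _extract_assistant_references reads

    def send(self, content: str) -> str:
        self.received.append(content)
        return "Here is an explanation..."


def test_wikilink_flow_augments_prompt(session_with_fake_llm):
    session = session_with_fake_llm(FakeLLMSession())  # assumed fixture

    response, user_msg, _ = session.send("Explain [[parameters reference]]")

    sent_to_llm = session.llm_session.received[0]
    assert "Referenced Documents:" in sent_to_llm  # reference list was appended
    assert user_msg.augmented_token_count >= user_msg.token_count
```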
- Empty messages
- Messages with only wikilinks
- Broken wikilinks
- Multiple references to same document
- Tool calls that return no results
- Large numbers of references (>10)
- Messages that exceed token budgets
- Sessions approaching max_context limit
- Tool descriptions that change dynamically
- Reference Analytics: Track which documents are most frequently referenced
- Smart Suggestions: Suggest related documents based on conversation context
- Reference Graphs: Visualize document relationships discovered through chat
- Export with References: Export conversations with embedded reference metadata
- Cross-Session References: Track document usage across multiple chat sessions
- Reference Highlighting: Highlight referenced text in source documents
- Advanced Context Management:
  - Machine learning-based context prioritization
  - Automatic identification of key messages for preservation
  - Dynamic budget reallocation based on conversation patterns
  - Predictive tool loading based on conversation context
- Performance Optimization:
  - Incremental token counting (cache and update deltas)
  - Parallel tokenization for large messages
  - Token estimation fallbacks for speed-critical paths
- Multi-Model Support:
  - Per-model token counting strategies
  - Model-specific context window optimization
  - Adaptive strategies based on model capabilities
- Should we automatically read documents when they're wikilinked, or just prompt the LLM to do so?
- How many references should we show before truncating the display?
- Should document summaries be stored permanently or regenerated each time?
- How should we handle references in multi-turn conversations?
- Should we track negative references (documents that were searched but not used)?
- Token Budget Allocation: What are the optimal default ratios for system/tool/message budgets across different use cases?
- Compaction Triggers: At what budget usage percentage should we trigger automatic compaction?
- Compaction Strategy: Should we use summarization, pruning, or a hybrid approach for context compaction?
- Tool Filtering: How aggressive should tool filtering be? Should it be user-configurable?
- Token Estimation: For models without native tokenizers, what estimation heuristics work best?
- `mojentic.llm.ChatSession` - Underlying LLM communication (we'll use internally)
- `mojentic.llm.gateways.models.LLMMessage` - Simple message format for LLM
- `mojentic.llm.gateways.tokenizer_gateway.TokenizerGateway` - Token counting for context management
- `MarkdownFilesystemGateway` - Wikilink resolution
- `Zettelkasten` - Document access
- `RichConsoleService` - Console output
- `LLMBroker` - Summary generation and LLM interaction
- `SmartMemory` - Reference storage and retrieval
- `DocumentReference` model - Represents a document reference with metadata
- `ZkChatMessage` model - Rich message with references array and token accounting
- `ZkChatSession` class - Manages chat with rich message history and context budgets
- `DocumentSummaryService` - Generates document summaries
- `ConsoleReferenceRenderer` - Displays references in console
- `QtReferenceWidget` (optional) - GUI reference display
- `ChatSessionStore` (optional) - Persistent storage for chat sessions
- Token counting integrated into every message
- Budget allocation system for context components
- Context usage monitoring and reporting
- Adaptive strategies for context management
Since this is a new feature with no existing data to migrate:
- Phase 1: Implement core `ZkChatSession` alongside existing `ChatSession` usage
- Phase 2: Migrate `chat.py` to use `ZkChatSession`
- Phase 3: Migrate `agent.py` to use `ZkChatSession`
- Phase 4: Migrate GUI to use `ZkChatSession`
- Phase 5: Remove old `ChatSession` direct usage once all modules migrated
- Phase 6: Consider persistence for advanced users
- Users can see which documents are referenced in their queries
- Users can see which documents informed assistant responses
- Document summaries provide helpful context without requiring clicks
- Reference display doesn't clutter the chat interface
- Feature works seamlessly in both CLI and GUI modes
- All tests pass with >90% code coverage
- Documentation is clear and includes examples
- Performance impact is minimal (<100ms overhead per message)
- Token accounting accurately tracks context usage across all components
- Budget allocation prevents context overflow and maintains optimal LLM performance
- Context compaction strategies effectively manage long conversations without losing critical information
- Adaptive strategies (tool filtering, content injection decisions) improve context efficiency
- Users can monitor context usage and understand when compaction occurs
- Token counting overhead is negligible (<10ms per message)