This document describes the context-aware multimodal processing feature in RAGAnything, which supplies surrounding document content to the LLM when it analyzes images, tables, equations, and other multimodal content, so that the analysis is more accurate and relevant.
The context-aware feature enables RAGAnything to automatically extract and provide surrounding text content as context when processing multimodal content. This leads to more accurate and contextually relevant analysis by giving AI models additional information about where the content appears in the document structure.
- Enhanced Accuracy: Context helps AI understand the purpose and meaning of multimodal content
- Semantic Coherence: Generated descriptions align with document context and terminology
- Automated Integration: Context extraction is automatically enabled during document processing
- Flexible Configuration: Multiple extraction modes and filtering options
- Integrated Configuration: Complete context options in RAGAnythingConfig
- Environment Variables: Configure all context parameters via environment variables
- Dynamic Updates: Runtime configuration updates supported
- Content Format Control: Configurable content source format detection
- Auto-Initialization: Modal processors automatically receive tokenizer and context configuration
- Content Source Setup: Document processing automatically sets content sources for context extraction
- Position Information: Automatic position info (page_idx, index) passed to processors
- Batch Processing: Context-aware batch processing for efficient document handling
- Accurate Token Counting: Uses LightRAG's tokenizer for precise token calculation
- Smart Boundary Preservation: Truncates at sentence/paragraph boundaries
- Backward Compatibility: Fallback to character truncation when tokenizer unavailable
- Multiple Formats: Support for MinerU, plain text, custom formats
- Flexible Modes: Page-based and chunk-based context extraction
- Content Filtering: Configurable content type filtering
- Header Support: Optional inclusion of document headers and structure
# Context Extraction Configuration
context_window: int = 1 # Context window size (pages/chunks)
context_mode: str = "page" # Context mode ("page" or "chunk")
max_context_tokens: int = 2000 # Maximum context tokens
include_headers: bool = True # Include document headers
include_captions: bool = True # Include image/table captions
context_filter_content_types: List[str] = ["text"] # Content types to include
content_format: str = "minerU" # Default content format for context extraction

# Context extraction settings
CONTEXT_WINDOW=2
CONTEXT_MODE=page
MAX_CONTEXT_TOKENS=3000
INCLUDE_HEADERS=true
INCLUDE_CAPTIONS=true
CONTEXT_FILTER_CONTENT_TYPES=text,image
CONTENT_FORMAT=minerU

from raganything import RAGAnything, RAGAnythingConfig
# Create configuration with context settings
config = RAGAnythingConfig(
    context_window=2,
    context_mode="page",
    max_context_tokens=3000,
    include_headers=True,
    include_captions=True,
    context_filter_content_types=["text", "image"],
    content_format="minerU"
)
# Create RAGAnything instance
rag_anything = RAGAnything(
    config=config,
    llm_model_func=your_llm_function,
    embedding_func=your_embedding_function
)

# Context is automatically enabled during document processing
await rag_anything.process_document_complete("document.pdf")

# Set content source for specific content lists
rag_anything.set_content_source_for_context(content_list, "minerU")
# Update context configuration at runtime
rag_anything.update_context_config(
    context_window=1,
    max_context_tokens=1500,
    include_captions=False
)

from raganything.modalprocessors import (
    ContextExtractor,
    ContextConfig,
    ImageModalProcessor
)
# Configure context extraction
config = ContextConfig(
    context_window=1,
    context_mode="page",
    max_context_tokens=2000,
    include_headers=True,
    include_captions=True,
    filter_content_types=["text"]
)
# Initialize context extractor
context_extractor = ContextExtractor(config)
# Initialize modal processor with context support
processor = ImageModalProcessor(lightrag, caption_func, context_extractor)
# Set content source
processor.set_content_source(content_list, "minerU")
# Process with context
item_info = {
    "page_idx": 2,
    "index": 5,
    "type": "image"
}
result = await processor.process_multimodal_content(
    modal_content=image_data,
    content_type="image",
    file_path="document.pdf",
    entity_name="Architecture Diagram",
    item_info=item_info
)

- Extracts context based on page boundaries
- Uses the page_idx field from content items
- Suitable for document-structured content
- Example: Include text from 2 pages before and after current image
- Extracts context based on content item positions
- Uses sequential position in content list
- Suitable for fine-grained control
- Example: Include 5 content items before and after current table
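
The difference between the two modes can be illustrated with a small sketch. This is not RAGAnything's implementation: `select_context_items` is a hypothetical helper, and it assumes a MinerU-style content list in which each item carries a `type` and (for page mode) a `page_idx`, while `item_info["index"]` is the item's position in that list.

```python
from typing import Any, Dict, List


def select_context_items(
    content_list: List[Dict[str, Any]],
    item_info: Dict[str, Any],
    context_window: int = 1,
    context_mode: str = "page",
) -> List[Dict[str, Any]]:
    """Hypothetical helper: pick the text items inside the context window."""
    if context_mode == "page":
        # Page mode: keep text items whose page lies within +/- context_window pages.
        page = item_info["page_idx"]
        lo, hi = page - context_window, page + context_window
        return [
            item for item in content_list
            if item.get("type") == "text"
            and "page_idx" in item
            and lo <= item["page_idx"] <= hi
        ]
    if context_mode == "chunk":
        # Chunk mode: keep text items within +/- context_window list positions.
        index = item_info["index"]
        start = max(0, index - context_window)
        end = min(len(content_list), index + context_window + 1)
        return [
            item
            for pos, item in enumerate(content_list[start:end], start=start)
            if pos != index and item.get("type") == "text"
        ]
    raise ValueError(f"unknown context_mode: {context_mode!r}")


content_list = [
    {"type": "text", "text": "Intro", "page_idx": 0},
    {"type": "text", "text": "Method", "page_idx": 1},
    {"type": "image", "img_path": "images/figure1.jpg", "page_idx": 1},
    {"type": "text", "text": "Results", "page_idx": 2},
    {"type": "text", "text": "Appendix", "page_idx": 4},
]
item_info = {"page_idx": 1, "index": 2, "type": "image"}

page_context = select_context_items(content_list, item_info, 1, "page")
chunk_context = select_context_items(content_list, item_info, 1, "chunk")
```

In this example page mode collects every text item on pages 0 to 2, while chunk mode collects only the immediate neighbours of the image in the list, which is why chunk mode suits finer-grained control.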
Document Input → MinerU Parsing → content_list Generation
content_list → Set as Context Source → All Modal Processors Gain Context Capability
Multimodal Content → Extract Surrounding Context → Enhanced LLM Analysis → More Accurate Results
[
    {
        "type": "text",
        "text": "Document content here...",
        "text_level": 1,
        "page_idx": 0
    },
    {
        "type": "image",
        "img_path": "images/figure1.jpg",
        "img_caption": ["Figure 1: Architecture"],
        "page_idx": 1
    }
]

text_chunks = [
    "First chunk of text content...",
    "Second chunk of text content...",
    "Third chunk of text content..."
]

full_document = "Complete document text with all content..."

For focused analysis with minimal context:
config = RAGAnythingConfig(
    context_window=1,
    context_mode="page",
    max_context_tokens=1000,
    include_headers=True,
    include_captions=False,
    context_filter_content_types=["text"]
)

For broad analysis with rich context:
config = RAGAnythingConfig(
    context_window=2,
    context_mode="page",
    max_context_tokens=3000,
    include_headers=True,
    include_captions=True,
    context_filter_content_types=["text", "image", "table"]
)

For fine-grained sequential context:
config = RAGAnythingConfig(
    context_window=5,
    context_mode="chunk",
    max_context_tokens=2000,
    include_headers=False,
    include_captions=False,
    context_filter_content_types=["text"]
)

- Uses real tokenizer for precise token counting
- Avoids exceeding LLM token limits
- Provides consistent performance
- Truncates at sentence boundaries
- Maintains semantic integrity
- Adds truncation indicators
- Context extraction results can be reused
- Reduces redundant computation overhead
The system automatically truncates context to fit within token limits:
- Uses actual tokenizer for accurate token counting
- Attempts to end at sentence boundaries (periods)
- Falls back to line boundaries if needed
- Adds "..." indicator for truncated content
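
The truncation steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `truncate_context` is a hypothetical function, a whitespace split stands in for the real tokenizer, and the choice to append the indicator as a separate ` ...` suffix is an assumption of this sketch.

```python
from typing import Callable, List


def whitespace_encode(text: str) -> List[str]:
    # Stand-in tokenizer (assumption): one token per space-separated piece.
    return text.split(" ")


def whitespace_decode(tokens: List[str]) -> str:
    return " ".join(tokens)


def truncate_context(
    text: str,
    max_tokens: int,
    encode: Callable[[str], List[str]] = whitespace_encode,
    decode: Callable[[List[str]], str] = whitespace_decode,
) -> str:
    """Hypothetical sketch: cut to max_tokens, then back up to a clean boundary."""
    tokens = encode(text)
    if len(tokens) <= max_tokens:
        return text  # Already within the limit, nothing to truncate.

    truncated = decode(tokens[:max_tokens])
    sentence_end = truncated.rfind(".")
    line_end = truncated.rfind("\n")
    if sentence_end > 0:
        # Prefer ending on the last complete sentence.
        truncated = truncated[: sentence_end + 1]
    elif line_end > 0:
        # Otherwise fall back to the last complete line.
        truncated = truncated[:line_end]
    # Mark the result as truncated.
    return truncated.rstrip() + " ..."


by_sentence = truncate_context(
    "First sentence. Second sentence is longer. Third one", max_tokens=5
)
by_line = truncate_context("alpha beta\ngamma delta epsilon", max_tokens=2)
```

Passing a real tokenizer's `encode` and `decode` functions in place of the whitespace stand-ins would make the token count match the model's actual limit.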
When include_headers=True, headers are formatted with markdown-style prefixes:
# Level 1 Header
## Level 2 Header
### Level 3 Header
When include_captions=True, image and table captions are included as:
[Image: Figure 1 caption text]
[Table: Table 1 caption text]
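
As a rough sketch of these two formatting rules, the hypothetical function below renders a single content item. It assumes MinerU-style keys (`text_level` for header depth, `img_caption` and `table_caption` for caption lists); the function name and the decision to join multiple captions with a space are assumptions of this example.

```python
from typing import Any, Dict


def format_context_item(
    item: Dict[str, Any],
    include_headers: bool = True,
    include_captions: bool = True,
) -> str:
    """Hypothetical sketch: render one content item as a line of context text."""
    kind = item.get("type")
    if kind == "text":
        text = item.get("text", "").strip()
        level = item.get("text_level", 0)
        if level > 0:
            # Headers get one '#' per level, or are dropped when disabled.
            return f"{'#' * level} {text}" if include_headers else ""
        return text
    if kind in ("image", "table") and include_captions:
        caption_key = "img_caption" if kind == "image" else "table_caption"
        captions = item.get(caption_key, [])
        if captions:
            label = "Image" if kind == "image" else "Table"
            return f"[{label}: {' '.join(captions)}]"
    # Unsupported types, or captions disabled or missing, contribute nothing.
    return ""


header_line = format_context_item({"type": "text", "text": "Overview", "text_level": 2})
caption_line = format_context_item(
    {"type": "image", "img_caption": ["Figure 1: Architecture"]}
)
```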
The context-aware feature is seamlessly integrated into RAGAnything's workflow:
- Automatic Setup: Context extractors are automatically created and configured
- Content Source Management: Document processing automatically sets content sources
- Processor Integration: All modal processors receive context capabilities
- Configuration Consistency: Single configuration system for all context settings
The system includes robust error handling:
- Gracefully handles missing or invalid content sources
- Returns empty context for unsupported formats
- Logs warnings for configuration issues
- Continues processing even if context extraction fails
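
The behaviour described above amounts to treating context as optional: a failure yields an empty string and a warning rather than aborting the document. A minimal sketch of that pattern, using a hypothetical wrapper `safe_extract_context` and a hypothetical `extract` callable, might look like this:

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("context_demo")


def safe_extract_context(
    extract: Callable[[Any, Dict[str, Any]], str],
    content_source: Any,
    item_info: Dict[str, Any],
) -> str:
    """Hypothetical wrapper: never let context extraction stop processing."""
    if not content_source:
        logger.warning("No content source set; continuing without context")
        return ""
    try:
        return extract(content_source, item_info)
    except Exception as exc:  # Broad on purpose: context is best-effort.
        logger.warning("Context extraction failed: %s", exc)
        return ""


def broken_extractor(source: Any, info: Dict[str, Any]) -> str:
    # Simulates an invalid source: item_info lacks the field the extractor needs.
    return source[info["index"]]["text"]


context = safe_extract_context(broken_extractor, [{"text": "a"}], {"page_idx": 0})
```

Here the missing `index` key raises inside the extractor, the wrapper logs a warning, and the caller receives an empty context and can proceed with the multimodal analysis.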
- Backward Compatible: Existing code works without modification
- Optional Feature: Context can be selectively enabled/disabled
- Flexible Configuration: Supports multiple configuration combinations
- Token Limits: Ensure max_context_tokens doesn't exceed LLM context limits
- Performance Impact: Larger context windows increase processing time
- Content Quality: Context quality directly affects analysis accuracy
- Window Size: Match window size to content structure (documents vs articles)
- Content Filtering: Use context_filter_content_types to reduce noise
Context Not Extracted
- Check if set_content_source_for_context() was called
- Verify item_info contains required fields (page_idx, index)
- Confirm content source format is correct
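
A quick way to run the second check is to test `item_info` before handing it to a processor. The helper below is hypothetical and simply looks for the two fields named above:

```python
from typing import Any, Dict, List

REQUIRED_ITEM_INFO_FIELDS = ("page_idx", "index")


def missing_item_info_fields(item_info: Dict[str, Any]) -> List[str]:
    """Hypothetical check: list required position fields absent from item_info."""
    return [name for name in REQUIRED_ITEM_INFO_FIELDS if name not in item_info]


complete = missing_item_info_fields({"page_idx": 2, "index": 5, "type": "image"})
incomplete = missing_item_info_fields({"type": "image", "page_idx": 2})
```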
Context Too Long/Short
- Adjust max_context_tokens setting
- Modify context_window size
- Check context_filter_content_types configuration
Irrelevant Context
- Refine context_filter_content_types to exclude noise
- Reduce context_window size
- Set include_captions=False if captions are not helpful
Configuration Issues
- Verify environment variables are set correctly
- Check RAGAnythingConfig parameter names
- Ensure content_format matches your data source
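
When checking environment variables, it can help to print the values as they would be interpreted. The sketch below is a hypothetical diagnostic helper, not RAGAnything's own loader: it reads the variable names listed earlier, applies the documented defaults, and assumes that booleans are written as `true`/`false` and that content types are comma-separated.

```python
import os
from typing import Any, Dict, Mapping


def _as_bool(value: str) -> bool:
    # Assumption: common truthy spellings count as True, everything else as False.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_context_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Hypothetical helper: parse the context variables with documented defaults."""
    types = env.get("CONTEXT_FILTER_CONTENT_TYPES", "text")
    return {
        "context_window": int(env.get("CONTEXT_WINDOW", "1")),
        "context_mode": env.get("CONTEXT_MODE", "page"),
        "max_context_tokens": int(env.get("MAX_CONTEXT_TOKENS", "2000")),
        "include_headers": _as_bool(env.get("INCLUDE_HEADERS", "true")),
        "include_captions": _as_bool(env.get("INCLUDE_CAPTIONS", "true")),
        "context_filter_content_types": [
            t.strip() for t in types.split(",") if t.strip()
        ],
        "content_format": env.get("CONTENT_FORMAT", "minerU"),
    }


settings = load_context_settings(
    {
        "CONTEXT_WINDOW": "2",
        "INCLUDE_CAPTIONS": "false",
        "CONTEXT_FILTER_CONTENT_TYPES": "text, image",
    }
)
```

Calling `load_context_settings()` with no argument reads the current process environment, so printing its result shows which values are set and which fall back to defaults.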
Check out these example files for complete usage demonstrations:
- Configuration Examples: See how to set up different context configurations
- Integration Examples: Learn how to integrate context-aware processing into your workflow
- Custom Processors: Examples of creating custom modal processors with context support
For detailed API documentation, see the docstrings in:
- raganything/modalprocessors.py - Context extraction and modal processors
- raganything/config.py - Configuration options
- raganything/raganything.py - Main RAGAnything class integration