Name	Name	Last commit message	Last commit date
parent directory ..
src	src
README.md	README.md

Document Processing

Overview

The Document Processing use case automates the classification, data extraction, and compliance validation of financial documents. It handles loan applications, KYC documents, financial statements, regulatory filings, and contracts -- classifying each by type and jurisdiction, extracting structured fields, and validating against regulatory rules to surface compliance issues.

Business Value

Accelerated intake -- documents classified and fields extracted in seconds, reducing manual processing backlogs
Compliance assurance -- automated validation against regulatory rules catches missing fields and inconsistencies before human review
Structured output -- unstructured documents transformed into standardized key-value pairs for downstream systems
Jurisdiction awareness -- classification includes regulatory framework mapping for multi-jurisdictional operations
Reduced error rates -- parallel agent cross-checks improve extraction accuracy over single-pass approaches

Architecture

graph TB
    Request["Client Request"] --> Runtime["AgentCore Runtime"]
    Runtime --> Orchestrator["Orchestrator"]
    Orchestrator --> Classifier["Document Classifier<br/><small>Type, jurisdiction & regulatory classification</small>"]
    Orchestrator --> Extractor["Data Extractor<br/><small>Structured data extraction</small>"]
    Orchestrator --> Validator["Validation Agent<br/><small>Compliance rule validation</small>"]
    Classifier --> Bedrock["Amazon Bedrock<br/>(Claude)"]
    Extractor --> Bedrock
    Validator --> Bedrock
    Classifier --> S3["S3 Sample Data"]
    Extractor --> S3
    Validator --> S3
    Classifier --> Synthesis["Result Synthesis"]
    Extractor --> Synthesis
    Validator --> Synthesis
    Synthesis --> Response["Response"]

Directory Structure

use_cases/document_processing/
├── README.md
└── src/
    └── strands/
        ├── __init__.py
        ├── config.py          # DocumentProcessingSettings
        ├── models.py          # Pydantic request/response models
        ├── orchestrator.py    # DocumentProcessingOrchestrator + run_document_processing()
        └── agents/
            ├── __init__.py
            ├── document_classifier.py
            ├── data_extractor.py
            └── validation_agent.py

Agentic Design

The orchestrator uses a parallel fan-out pattern. In full mode, all three agents run concurrently via asyncio.gather. Targeted modes (classification_only, extraction_only, validation_only) invoke a single agent. The orchestrator synthesizes combined results into a structured JSON summary with classification, extraction completeness, and validation status.

Agents

Agent	Role	Data Used	Output
Document Classifier	Classifies documents by type (loan application, KYC, financial statement, regulatory filing, contract), identifies jurisdiction and regulatory relevance	Document profile via `s3_retriever_tool`	Document type with confidence score, jurisdiction, applicable regulations
Data Extractor	Extracts structured data from unstructured financial documents including entities, amounts, dates, and reference numbers	Document profile via `s3_retriever_tool`	Key-value fields, named entities, financial amounts, dates, completeness assessment
Validation Agent	Validates extracted data against regulatory rules and business logic, cross-references with existing records	Document profile via `s3_retriever_tool`	Validation status (VALID/INVALID/REVIEW_REQUIRED), passed/failed checks, notes

Data and Tools

Tool: s3_retriever_tool -- retrieves document profiles and content from S3
S3 data prefix: samples/document_processing/
Model: Claude Sonnet (via Amazon Bedrock), temperature 0.1, max 8192 tokens
Config thresholds: classification_confidence_threshold=0.85, extraction_accuracy_threshold=0.90, max_document_size_mb=50

Request / Response

Request -- ProcessingRequest:

Field	Type	Description
`document_id`	`str`	Document identifier (e.g., `DOC001`)
`processing_type`	`ProcessingType`	`full`, `classification_only`, `extraction_only`, `validation_only`
`additional_context`	`str \| None`	Optional context

Response -- ProcessingResponse:

Field	Type	Description
`document_id`	`str`	Document identifier
`processing_id`	`str`	Unique processing UUID
`timestamp`	`datetime`	Processing timestamp
`classification`	`DocumentClassification \| None`	Document type, confidence, jurisdiction, regulatory relevance
`extracted_data`	`ExtractedData \| None`	Fields dict, entities, amounts, dates
`validation_result`	`ValidationResult \| None`	Status, checks passed/failed, notes
`summary`	`str`	Processing summary
`raw_analysis`	`dict`	Raw agent output

Quick Start

# Deploy to AgentCore
USE_CASE_ID=document_processing ./scripts/deploy/full/deploy_agentcore.sh

# Test the deployment
./scripts/use_cases/document_processing/test/test_agentcore.sh

Sample Data

Located at data/samples/document_processing/

Document ID	Type	Description
DOC001	Loan Application	Commercial loan application for Acme Corporation Ltd -- 12-page PDF, $5M business expansion loan, 5-year term, US jurisdiction

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Document Processing

Overview

Business Value

Architecture

Directory Structure

Agentic Design

Agents

Data and Tools

Request / Response

Quick Start

Sample Data

Related Documentation

FilesExpand file tree

document_processing

Directory actions

More options

Directory actions

More options

Latest commit

History

document_processing

Folders and files

parent directory

README.md

Document Processing

Overview

Business Value

Architecture

Directory Structure

Agentic Design

Agents

Data and Tools

Request / Response

Quick Start

Sample Data

Related Documentation