
A2A Protocol Technology Compatibility Kit (TCK)

A comprehensive test suite for validating compliance with the A2A (Agent2Agent) JSON-RPC protocol specification, offering progressive validation and detailed compliance reporting.

Overview

The A2A Protocol TCK is a sophisticated validation framework that provides:

  • 📋 Categorized Testing: Clear separation of mandatory vs. optional requirements
  • 🎯 Capability-Based Validation: Smart test execution based on Agent Card declarations
  • 📊 Compliance Reporting: Detailed assessment with actionable recommendations
  • 🚀 Progressive Enhancement: Four-tier compliance levels for informed deployment decisions

The TCK transforms A2A specification compliance from guesswork into a clear, structured validation process.

🔄 Two Main Workflows

1. Testing Your A2A Implementation (You're likely here for this)

Use the TCK to validate your A2A implementation:

./run_tck.py --sut-url http://localhost:9999 --category all --compliance-report report.json

2. Managing A2A Specification Updates (Advanced/Maintainer workflow)

Use the TCK to keep the test suite aligned with new versions of the A2A specification:

  • 📖 Complete Specification Update Workflow

  • 🔍 Check spec changes: util_scripts/check_spec_changes.py

  • 📥 Update baseline: util_scripts/update_current_spec.py --version "v1.x"


✨ Key Features

πŸ” Intelligent Test Categorization

  • πŸ”΄ MANDATORY: Must pass for A2A compliance (JSON-RPC 2.0 + A2A core)
  • πŸ”„ CAPABILITIES: Conditional mandatory based on Agent Card declarations
  • πŸ›‘οΈ QUALITY: Production readiness indicators (optional)
  • 🎨 FEATURES: Optional implementation completeness (informational)

🧠 Capability-Based Test Logic

  • Smart Execution: Tests skip when a capability is not declared and become mandatory when it is
  • False Advertising Detection: Catches capabilities that are declared but not implemented
  • Honest Validation: Only tests what the implementation actually claims to support

📈 Compliance Levels & Scoring

  • 🔴 NON_COMPLIANT: Any mandatory failure (Not A2A Compliant)
  • 🟡 MANDATORY: Basic compliance (A2A Core Compliant)
  • 🟢 RECOMMENDED: Production-ready (A2A Recommended Compliant)
  • 🏆 FULL_FEATURED: Complete implementation (A2A Fully Compliant)

📋 Comprehensive Reporting

  • Weighted compliance scoring
  • Specification reference citations
  • Actionable fix recommendations
  • Deployment readiness guidance

Requirements

  • Python: 3.8+
  • uv: Recommended for environment management
  • SUT: Running A2A implementation with accessible HTTP/HTTPS endpoint

Installation

  1. Install uv:

    # Install uv (see https://github.com/astral-sh/uv#installation)
    curl -LsSf https://astral.sh/uv/install.sh | sh
    # Or: pipx install uv
    # Or: brew install uv
  2. Clone and setup:

    git clone https://github.com/a2aproject/a2a-tck.git
    cd a2a-tck
    
    # Create virtual environment
    uv venv
    source .venv/bin/activate  # Linux/macOS
    # .venv\Scripts\activate   # Windows
    
    # Install dependencies
    uv pip install -e .
  3. Configure environment (optional):

    # Copy example environment file and customize
    cp .env.example .env
    # Edit .env to set timeout values and other configuration
  4. Start your A2A implementation (System Under Test):

    # Example using the included Python SUT
    cd python-sut/tck_core_agent
    uv run .

Preparing and Running Your SUT (System Under Test) with run_sut.py

Note: The run_sut.py script requires the PyYAML package. You can install it using uv pip install pyyaml or pip install pyyaml.

To simplify the process of testing various A2A implementations, this TCK includes a utility script run_sut.py. This Python script automates the download (or update), build, and execution of a System Under Test (SUT) based on a configuration file.

SUTs will be cloned or updated into a directory named SUT/ created in the root of this TCK repository.

Configuration (sut_config.yaml)

You need to create a YAML configuration file (e.g., my_sut_config.yaml) to define how your SUT should be handled. A template is available at sut_config_template.yaml.

The configuration file supports the following fields:

  • sut_name (string, mandatory): A descriptive name for your SUT. This name will also be used as the directory name for the SUT within the SUT/ folder (e.g., SUT/my_agent).
  • github_repo (string, mandatory): The HTTPS or SSH URL of the git repository where the SUT source code is hosted.
  • git_ref (string, optional): A specific git branch, tag, or commit hash to checkout after cloning/fetching. If omitted, the repository's default branch will be used.
  • prerequisites_script (string, mandatory): Path to the script that handles prerequisite installation and building the SUT. This path is relative to the root of the SUT's cloned repository (e.g., scripts/build.sh or setup/prepare_env.py).
  • prerequisites_interpreter (string, optional): The interpreter to use for the prerequisites_script (e.g., bash, python3, powershell.exe). If omitted, the script will be executed directly (e.g., ./scripts/build.sh). Ensure the script is executable and has a valid shebang in this case.
  • prerequisites_args (string, optional): A string of arguments to pass to the prerequisites_script (e.g., "--version 1.2 --no-cache").
  • run_script (string, mandatory): Path to the script that starts the SUT. This path is relative to the root of the SUT's cloned repository (e.g., scripts/run.sh or app/start_server.py).
  • run_interpreter (string, optional): The interpreter to use for the run_script.
  • run_args (string, optional): A string of arguments to pass to the run_script (e.g., "--port 8080 --debug").

Example sut_config.yaml:

sut_name: "example_agent"
github_repo: "https://github.com/your_org/example_agent_repo.git"
git_ref: "v1.0.0" # Optional: checkout tag v1.0.0
prerequisites_script: "bin/setup.sh"
prerequisites_interpreter: "bash"
prerequisites_args: "--fast"
run_script: "bin/start.py"
run_interpreter: "python3"
run_args: "--host 0.0.0.0 --port 9000"

SUT Script Requirements

  • Prerequisites Script: This script is responsible for all steps required to build your SUT and install its dependencies. It should exit with a status code of 0 on success and any non-zero status code on failure. If it fails, run_sut.py will terminate.
  • Run Script: This script should start your SUT. Typically, it will launch a server or application that runs in the foreground; a minimal sketch follows this list. The run_sut.py script will wait for this script to terminate (e.g., by Ctrl+C or if the SUT exits itself).
  • Directly Executable Scripts: If you omit the *_interpreter for a script, ensure the script file has execute permissions (e.g., chmod +x your_script.sh) and, for shell scripts on Unix-like systems, includes a valid shebang (e.g., #!/bin/bash).
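
For illustration, here is a minimal run script satisfying that contract, written in Python. The file name, port, and handler are hypothetical; a real SUT would serve the full A2A endpoint:

#!/usr/bin/env python3
# Hypothetical bin/start.py: runs in the foreground so run_sut.py can wait
# on it, and exits non-zero if the server fails to start.
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"agent up")

try:
    HTTPServer(("0.0.0.0", 9000), Handler).serve_forever()
except Exception as exc:
    print(f"SUT failed to start: {exc}", file=sys.stderr)
    sys.exit(1)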

Usage

Once you have your SUT configuration file ready, you can run your SUT using:

python run_sut.py path/to/your_sut_config.yaml

For example:

python run_sut.py sut_configs/my_python_agent_config.yaml

This will:

  1. Clone the SUT from github_repo into SUT/<sut_name>/ (or update if it already exists).
  2. Checkout the specified git_ref (if any).
  3. Execute the prerequisites_script within the SUT's directory.
  4. Execute the run_script within the SUT's directory to start the SUT.

You can then proceed to run the TCK tests against your SUT.

📋 SUT Requirements

Before running tests, ensure your A2A implementation meets the SUT Requirements. This includes:

  • Streaming Duration: Tasks with message IDs starting with "test-resubscribe-message-id" must run for ≥ 2 × TCK_STREAMING_TIMEOUT seconds (a sketch follows this list)
  • Environment Variables: Optional support for TCK_STREAMING_TIMEOUT configuration
  • Test Patterns: Proper handling of TCK-specific message ID patterns
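
As a rough sketch of how a SUT might satisfy the streaming-duration rule (the function and task loop are hypothetical, not prescribed by the spec):

import asyncio
import os

# Mirror the TCK's base timeout (defaults to 2.0 seconds).
STREAMING_TIMEOUT = float(os.getenv("TCK_STREAMING_TIMEOUT", "2.0"))

async def process_task(message_id: str) -> None:
    # Tasks triggered by resubscribe tests must stay running long enough
    # for the TCK to disconnect and reattach mid-task.
    if message_id.startswith("test-resubscribe-message-id"):
        await asyncio.sleep(2 * STREAMING_TIMEOUT)
    # ... normal task processing continues here ...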

📖 Read Full SUT Requirements →

🚀 Quick Start

1. Check A2A Compliance (Start Here!)

./run_tck.py --sut-url http://localhost:9999 --category mandatory

Result: ✅ Pass = A2A compliant, ❌ Fail = NOT A2A compliant

2. Validate Capability Honesty

./run_tck.py --sut-url http://localhost:9999 --category capabilities

Result: Ensures declared capabilities actually work (prevents false advertising)

3. Assess Production Readiness

./run_tck.py --sut-url http://localhost:9999 --category quality

Result: Identifies issues that may affect production deployment

4. Generate Comprehensive Report

./run_tck.py --sut-url http://localhost:9999 --category all --compliance-report compliance.json

Result: Complete assessment with compliance level and recommendations

📖 Command Reference

Core Commands

# Get help and understand test categories
./run_tck.py --explain

# Test specific category
./run_tck.py --sut-url URL --category CATEGORY

# Available categories:
#   mandatory    - A2A compliance validation (MUST pass)  
#   capabilities - Capability honesty check (conditional mandatory)
#   quality      - Production readiness assessment
#   features     - Optional feature completeness
#   all          - Complete validation workflow

Advanced Options

# Generate detailed compliance report
./run_tck.py --sut-url URL --category all --compliance-report report.json

# Verbose output with detailed logging
./run_tck.py --sut-url URL --category mandatory --verbose

# Generate HTML report (additional)
./run_tck.py --sut-url URL --category all --report

# Skip Agent Card fetching (for non-standard implementations)  
./run_tck.py --sut-url URL --category mandatory --skip-agent-card

βš™οΈ Environment Configuration

Using Environment Variables

The TCK supports configuration via environment variables and .env files for flexible timeout and behavior customization.

Setting up environment configuration:

# Copy the example file
cp .env.example .env

# Edit the file to customize settings
nano .env  # or your preferred editor

Available environment variables:

Variable                 Description                                       Default   Examples
TCK_STREAMING_TIMEOUT    Base timeout for SSE streaming tests (seconds)    2.0       1.0 (fast), 5.0 (slow), 10.0 (debug)

Timeout behavior (see the sketch below):

  • Short timeout: TCK_STREAMING_TIMEOUT * 0.5 - Used for basic streaming operations
  • Normal timeout: TCK_STREAMING_TIMEOUT * 1.0 - Used for standard SSE client operations
  • Async timeout: TCK_STREAMING_TIMEOUT * 1.0 - Used for asyncio.wait_for operations
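
These multipliers are trivial to compute; a minimal sketch (variable names are illustrative, not the TCK's internals):

import os

base = float(os.getenv("TCK_STREAMING_TIMEOUT", "2.0"))
short_timeout = base * 0.5   # basic streaming operations
normal_timeout = base * 1.0  # standard SSE client operations
async_timeout = base * 1.0   # asyncio.wait_for() deadlines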

Usage examples:

# Use .env file (recommended)
echo "TCK_STREAMING_TIMEOUT=5.0" > .env
./run_tck.py --sut-url URL --category capabilities

# Set directly for single run
TCK_STREAMING_TIMEOUT=1.0 ./run_tck.py --sut-url URL --category capabilities

# Debug with very slow timeouts
TCK_STREAMING_TIMEOUT=30.0 ./run_tck.py --sut-url URL --category capabilities --verbose

When to adjust timeouts:

  • Decrease (1.0): Fast CI/CD pipelines, local development
  • Increase (5.0+): Slow networks, debugging, resource-constrained environments
  • Debug (10.0+): Detailed troubleshooting, step-through debugging

🎯 Understanding Test Categories

🔴 MANDATORY Tests - Core A2A Compliance

Purpose: Validate core A2A specification requirements
Impact: Failure = NOT A2A compliant
Location: tests/mandatory/

Includes:

  • JSON-RPC 2.0 compliance (tests/mandatory/jsonrpc/)
  • A2A protocol core methods (tests/mandatory/protocol/)
  • Agent Card required fields
  • Core message/send functionality
  • Task management (get/cancel)

Example Failures:

  • test_task_history_length → SDK doesn't implement the historyLength parameter (see the request sketch below)
  • test_mandatory_fields_present → Agent Card missing required fields
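
For example, the historyLength check reduces to a tasks/get call like this sketch (using httpx; the SUT URL and task ID are placeholders):

import httpx

# Ask the SUT to return a task with history truncated to the last 2 messages.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tasks/get",
    "params": {"id": "task-123", "historyLength": 2},
}
response = httpx.post("http://localhost:9999", json=request).json()
history = response.get("result", {}).get("history", [])
assert len(history) <= 2, "SUT ignored the historyLength parameter"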

🔄 CAPABILITY Tests - Conditional Mandatory

Purpose: Validate declared capabilities work correctly
Impact: Failure = False advertising
Logic: Skip if not declared, mandatory if declared
Location: tests/optional/capabilities/

Capability Validation:

{
  "capabilities": {
    "streaming": true,         ← Must pass streaming tests
    "pushNotifications": false ← Streaming tests will skip
  }
}
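
A minimal sketch of how a client can read these declarations (the URL is a placeholder; this is not the TCK's actual skip logic):

import httpx

# Fetch the Agent Card from the well-known location and inspect capabilities.
card = httpx.get("http://localhost:9999/.well-known/agent.json").json()
if card.get("capabilities", {}).get("streaming"):
    print("streaming declared: streaming tests become mandatory")
else:
    print("streaming not declared: streaming tests will skip")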

Includes:

  • Streaming support (message/stream, tasks/resubscribe; a client sketch follows this list)
  • Push notification configuration
  • File/data modality support
  • Authentication methods
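
To see what the streaming checks exercise, here is a bare-bones message/stream client sketch. It assumes standard SSE framing over a JSON-RPC POST; the payload is abbreviated and hypothetical:

import json
import httpx

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "message/stream",
    "params": {
        "message": {
            "role": "user",
            "parts": [{"kind": "text", "text": "hello"}],
            "messageId": "msg-1",
        }
    },
}
with httpx.stream("POST", "http://localhost:9999", json=request,
                  headers={"Accept": "text/event-stream"}) as resp:
    for line in resp.iter_lines():
        if line.startswith("data: "):
            print(json.loads(line[len("data: "):]))  # status/artifact events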

πŸ›‘οΈ QUALITY Tests - Production Readiness

Purpose: Assess implementation robustness
Impact: Never blocks compliance, indicates production issues
Location: tests/optional/quality/

Quality Areas:

  • Concurrent request handling
  • Edge case robustness
  • Unicode/special character support
  • Boundary value handling
  • Error recovery and resilience

🎨 FEATURE Tests - Optional Implementation

Purpose: Measure optional feature completeness
Impact: Purely informational
Location: tests/optional/features/

Includes:

  • Convenience features
  • Enhanced error messages
  • SDK-specific capabilities
  • Optional protocol extensions

📊 Compliance Levels

🔴 NON_COMPLIANT - Not A2A Compliant

  • Criteria: Any mandatory test failure
  • Business Impact: Cannot be used for A2A integrations
  • Action: Fix mandatory failures immediately

🟡 MANDATORY - A2A Core Compliant

  • Criteria: 100% mandatory test pass rate
  • Business Impact: Basic A2A integration support
  • Suitable For: Development and testing environments
  • Next Step: Address capability validation

🟢 RECOMMENDED - A2A Recommended Compliant

  • Criteria: Mandatory (100%) + Capability (≥85%) + Quality (≥75%)
  • Business Impact: Production-ready with confidence
  • Suitable For: Staging and careful production deployment
  • Next Step: Enhance feature completeness

πŸ† FULL_FEATURED - A2A Fully Compliant

  • Criteria: Capability (β‰₯95%) + Quality (β‰₯90%) + Feature (β‰₯80%)
  • Business Impact: Complete A2A implementation
  • Suitable For: Full production deployment with confidence

📋 Compliance Report

When you run with --compliance-report, you get a JSON report containing:

{
  "summary": {
    "compliance_level": "RECOMMENDED",
    "overall_score": 87.5,
    "mandatory_score": 100.0,
    "capability_score": 90.0,
    "quality_score": 75.0,
    "feature_score": 60.0
  },
  "recommendations": [
    "βœ… Ready for staging deployment",
    "⚠️ Address 2 quality issues before production",
    "πŸ’‘ Consider implementing 3 additional features"
  ],
  "next_steps": [
    "Fix Unicode handling in task storage",
    "Improve concurrent request performance",
    "Consider implementing authentication capability"
  ]
}

🔄 CI/CD Integration

Basic CI Pipeline (Compliance Gate)

#!/bin/bash
# Block deployment if not A2A compliant
./run_tck.py --sut-url $SUT_URL --category mandatory
if [ $? -ne 0 ]; then
    echo "❌ NOT A2A compliant - blocking deployment"
    exit 1
fi
echo "βœ… A2A compliant - deployment approved"

Advanced CI Pipeline (Environment-Aware)

#!/bin/bash
# Generate compliance report and make environment-specific decisions
./run_tck.py --sut-url $SUT_URL --category all --compliance-report compliance.json

COMPLIANCE_LEVEL=$(jq -r '.summary.compliance_level' compliance.json)

case $COMPLIANCE_LEVEL in
    "NON_COMPLIANT")
        echo "❌ Not A2A compliant - blocking all deployments"
        exit 1
        ;;
    "MANDATORY")
        echo "🟑 Basic compliance - dev/test only"
        [[ "$ENVIRONMENT" == "production" ]] && exit 1
        ;;
    "RECOMMENDED")
        echo "🟒 Recommended - staging approved"
        ;;
    "FULL_FEATURED")
        echo "πŸ† Full compliance - production approved"
        ;;
esac

πŸ› οΈ Troubleshooting

Common Issues

Streaming tests skipping:

# Check Agent Card capabilities
curl $SUT_URL/.well-known/agent.json | jq .capabilities
# If streaming: false, tests will skip (this is correct!)

Quality tests failing but compliance achieved:

# This is expected - quality tests don't block compliance
# Address quality issues for production readiness

Tests not being discovered:

# Ensure proper installation
pip install -e .

# Check test discovery
pytest --collect-only tests/mandatory/

Running Individual Tests for Debugging

When debugging specific test failures, you can run individual tests with detailed output:

Run a single test with verbose output and debug information:

# Using run_tck.py with verbose mode (shows print() and logger.info() messages)
python run_tck.py --sut-url http://localhost:9999 --category capabilities --verbose-log

# Run specific test directly with pytest
python -m pytest tests/optional/capabilities/test_streaming_methods.py::test_message_stream_basic \
    --sut-url http://localhost:9999 -s -v --log-cli-level=INFO

Run all tests in a specific file:

python -m pytest tests/optional/capabilities/test_streaming_methods.py \
    --sut-url http://localhost:9999 -s -v --log-cli-level=INFO

Debug options explained:

  • -s: Shows print() statements during test execution
  • -v: Verbose test output with detailed test names and outcomes
  • --log-cli-level=INFO: Shows logger.info() and other log messages
  • --tb=short: Shorter traceback format (default in run_tck.py)

Run with different log levels:

# Show DEBUG level logs (very detailed)
python -m pytest tests/path/to/test.py --sut-url URL -s -v --log-cli-level=DEBUG

# Show only WARNING and ERROR logs
python -m pytest tests/path/to/test.py --sut-url URL -s -v --log-cli-level=WARNING

🤝 Contributing

  1. Fork the repository
  2. Follow Test Documentation Standards
  3. Add tests with proper categorization and specification references
  4. Submit pull request with clear specification citations

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.


🎯 Quick Decision Guide

Just want A2A compliance?

./run_tck.py --sut-url URL --category mandatory

Planning production deployment?

./run_tck.py --sut-url URL --category all --compliance-report report.json

Debugging capability issues?

./run_tck.py --sut-url URL --category capabilities --verbose

Want comprehensive assessment?

./run_tck.py --sut-url URL --explain  # Learn about categories first
./run_tck.py --sut-url URL --category all --compliance-report full_report.json

The A2A TCK transforms specification compliance from confusion into clarity. 🚀
