A comprehensive test suite for validating A2A (Agent2Agent) JSON-RPC protocol specification compliance with progressive validation and detailed compliance reporting.
The A2A Protocol TCK is a sophisticated validation framework that provides:
- 📋 Categorized Testing: Clear separation of mandatory vs. optional requirements
- 🎯 Capability-Based Validation: Smart test execution based on Agent Card declarations
- 📊 Compliance Reporting: Detailed assessment with actionable recommendations
- 📈 Progressive Enhancement: Four-tier compliance levels for informed deployment decisions
The TCK transforms A2A specification compliance from guesswork into a clear, structured validation process.
Use the TCK to validate your A2A implementation:
./run_tck.py --sut-url http://localhost:9999 --category all --compliance-report report.json
Keep the TCK current with A2A specification changes:
- 🔍 Check spec changes: util_scripts/check_spec_changes.py
- 📥 Update baseline: util_scripts/update_current_spec.py --version "v1.x"
- 🔴 MANDATORY: Must pass for A2A compliance (JSON-RPC 2.0 + A2A core)
- 🔄 CAPABILITIES: Conditionally mandatory based on Agent Card declarations
- 🛡️ QUALITY: Production readiness indicators (optional)
- 🎨 FEATURES: Optional implementation completeness (informational)
- Smart Execution: Tests skip when capabilities are not declared and become mandatory when they are
- False Advertising Detection: Catches capabilities declared but not implemented
- Honest Validation: Only tests what's actually claimed to be supported
- 🔴 NON_COMPLIANT: Any mandatory failure (Not A2A Compliant)
- 🟡 MANDATORY: Basic compliance (A2A Core Compliant)
- 🟢 RECOMMENDED: Production-ready (A2A Recommended Compliant)
- 🏆 FULL_FEATURED: Complete implementation (A2A Fully Compliant)
- Weighted compliance scoring
- Specification reference citations
- Actionable fix recommendations
- Deployment readiness guidance
- Python: 3.8+
- uv: Recommended for environment management
- SUT: Running A2A implementation with accessible HTTP/HTTPS endpoint
- Install uv:
# Install uv (see https://github.com/astral-sh/uv#installation)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Or: pipx install uv
# Or: brew install uv
- Clone and set up:
git clone https://github.com/maeste/a2a-tck.git
cd a2a-tck
# Create virtual environment
uv venv
source .venv/bin/activate  # Linux/macOS
# .venv\Scripts\activate   # Windows
# Install dependencies
uv pip install -e .
- Configure environment (optional):
# Copy example environment file and customize
cp .env.example .env
# Edit .env to set timeout values and other configuration
- Start your A2A implementation (System Under Test):
# Example using the included Python SUT
cd python-sut/tck_core_agent
uv run .
Note: The run_sut.py script requires the PyYAML package. You can install it with uv pip install pyyaml or pip install pyyaml.
To simplify testing various A2A implementations, this TCK includes a utility script, run_sut.py. This Python script automates the download (or update), build, and execution of a System Under Test (SUT) based on a configuration file.
SUTs are cloned or updated into a directory named SUT/ created in the root of this TCK repository.
You need to create a YAML configuration file (e.g., my_sut_config.yaml) to define how your SUT should be handled. A template is available at sut_config_template.yaml.
The configuration file supports the following fields:
- sut_name (string, mandatory): A descriptive name for your SUT. This name is also used as the SUT's directory name within the SUT/ folder (e.g., SUT/my_agent).
- github_repo (string, mandatory): The HTTPS or SSH URL of the git repository where the SUT source code is hosted.
- git_ref (string, optional): A specific git branch, tag, or commit hash to check out after cloning/fetching. If omitted, the repository's default branch is used.
- prerequisites_script (string, mandatory): Path to the script that handles prerequisite installation and builds the SUT, relative to the root of the SUT's cloned repository (e.g., scripts/build.sh or setup/prepare_env.py).
- prerequisites_interpreter (string, optional): The interpreter to use for the prerequisites_script (e.g., bash, python3, powershell.exe). If omitted, the script is executed directly (e.g., ./scripts/build.sh); in that case, ensure the script is executable and has a valid shebang.
- prerequisites_args (string, optional): A string of arguments to pass to the prerequisites_script (e.g., "--version 1.2 --no-cache").
- run_script (string, mandatory): Path to the script that starts the SUT, relative to the root of the SUT's cloned repository (e.g., scripts/run.sh or app/start_server.py).
- run_interpreter (string, optional): The interpreter to use for the run_script.
- run_args (string, optional): A string of arguments to pass to the run_script (e.g., "--port 8080 --debug").
Example sut_config.yaml:
sut_name: "example_agent"
github_repo: "https://github.com/your_org/example_agent_repo.git"
git_ref: "v1.0.0" # Optional: checkout tag v1.0.0
prerequisites_script: "bin/setup.sh"
prerequisites_interpreter: "bash"
prerequisites_args: "--fast"
run_script: "bin/start.py"
run_interpreter: "python3"
run_args: "--host 0.0.0.0 --port 9000"
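Before handing a new config to run_sut.py, you can confirm it parses as valid YAML. This one-liner is illustrative rather than part of the TCK; it reuses the PyYAML dependency mentioned above:
# Illustrative sanity check: parse the config and print the resulting dict
python -c 'import sys, yaml; print(yaml.safe_load(open(sys.argv[1])))' my_sut_config.yaml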
- Prerequisites Script: This script is responsible for all steps required to build your SUT and install its dependencies. It should exit with status code 0 on success and any non-zero status code on failure. If it fails, run_sut.py will terminate. A minimal sketch follows this list.
- Run Script: This script should start your SUT. Typically, it launches a server or application that runs in the foreground. The run_sut.py script waits for this script to terminate (e.g., via Ctrl+C, or when the SUT exits on its own).
- Directly Executable Scripts: If you omit the *_interpreter for a script, ensure the script file has execute permissions (e.g., chmod +x your_script.sh) and, for shell scripts on Unix-like systems, a valid shebang (e.g., #!/bin/bash).
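For illustration only, here is what a minimal prerequisites script might look like. The path bin/setup.sh matches the example config above, and the install command is a placeholder for whatever your SUT actually needs:
#!/bin/bash
# Hypothetical bin/setup.sh: install dependencies and build the SUT.
# run_sut.py treats exit code 0 as success and anything else as failure.
set -euo pipefail   # abort on the first failing command

echo "Installing dependencies..."
pip install -r requirements.txt   # placeholder: replace with your SUT's setup

echo "Build complete."
exit 0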
Once you have your SUT configuration file ready, you can run your SUT using:
python run_sut.py path/to/your_sut_config.yaml
For example:
python run_sut.py sut_configs/my_python_agent_config.yaml
This will:
- Clone the SUT from github_repo into SUT/<sut_name>/ (or update it if it already exists).
- Check out the specified git_ref (if any).
- Execute the prerequisites_script within the SUT's directory.
- Execute the run_script within the SUT's directory to start the SUT.
You can then proceed to run the TCK tests against your SUT.
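As a rough sketch (not a TCK feature), you can background the SUT from one shell and run the TCK against it, assuming your SUT listens on the local port used elsewhere in this README:
# Hypothetical end-to-end run: start the SUT in the background, test it, then stop it
python run_sut.py sut_configs/my_python_agent_config.yaml &
SUT_PID=$!
sleep 10   # crude startup wait; adjust to your SUT's boot time
./run_tck.py --sut-url http://localhost:9999 --category all --compliance-report report.json
kill "$SUT_PID"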
Before running tests, ensure your A2A implementation meets the SUT Requirements. This includes:
- Streaming Duration: Tasks with message IDs starting with "test-resubscribe-message-id" must run for at least 2 × TCK_STREAMING_TIMEOUT seconds
- Environment Variables: Optional support for TCK_STREAMING_TIMEOUT configuration
- Test Patterns: Proper handling of TCK-specific message ID patterns
📖 Read Full SUT Requirements →
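To make the streaming-duration requirement concrete, a SUT can derive the minimum lifetime for those long-running test tasks from the environment, as in this illustrative snippet:
# Illustrative: compute the minimum lifetime for test-resubscribe-* tasks
: "${TCK_STREAMING_TIMEOUT:=2.0}"   # default matches the TCK's own default
MIN_TASK_SECONDS=$(awk -v t="$TCK_STREAMING_TIMEOUT" 'BEGIN { print 2 * t }')
echo "Tasks with test-resubscribe-message-id* must live >= ${MIN_TASK_SECONDS}s"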
./run_tck.py --sut-url http://localhost:9999 --category mandatory
Result: ✅ Pass = A2A compliant, ❌ Fail = NOT A2A compliant
./run_tck.py --sut-url http://localhost:9999 --category capabilities
Result: Ensures declared capabilities actually work (prevents false advertising)
./run_tck.py --sut-url http://localhost:9999 --category quality
Result: Identifies issues that may affect production deployment
./run_tck.py --sut-url http://localhost:9999 --category all --compliance-report compliance.json
Result: Complete assessment with compliance level and recommendations
# Get help and understand test categories
./run_tck.py --explain
# Test specific category
./run_tck.py --sut-url URL --category CATEGORY
# Available categories:
# mandatory - A2A compliance validation (MUST pass)
# capabilities - Capability honesty check (conditional mandatory)
# quality - Production readiness assessment
# features - Optional feature completeness
# all - Complete validation workflow
# Generate detailed compliance report
./run_tck.py --sut-url URL --category all --compliance-report report.json
# Verbose output with detailed logging
./run_tck.py --sut-url URL --category mandatory --verbose
# Generate HTML report (additional)
./run_tck.py --sut-url URL --category all --report
# Skip Agent Card fetching (for non-standard implementations)
./run_tck.py --sut-url URL --category mandatory --skip-agent-card
The TCK supports configuration via environment variables and .env files for flexible timeout and behavior customization.
Setting up environment configuration:
# Copy the example file
cp .env.example .env
# Edit the file to customize settings
nano .env # or your preferred editor
Available environment variables:
| Variable | Description | Default | Examples |
|---|---|---|---|
| TCK_STREAMING_TIMEOUT | Base timeout for SSE streaming tests (seconds) | 2.0 | 1.0 (fast), 5.0 (slow), 10.0 (debug) |
Timeout behavior:
- Short timeout: TCK_STREAMING_TIMEOUT * 0.5, used for basic streaming operations
- Normal timeout: TCK_STREAMING_TIMEOUT * 1.0, used for standard SSE client operations
- Async timeout: TCK_STREAMING_TIMEOUT * 1.0, used for asyncio.wait_for operations
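For example, with the default base of 2.0 seconds, the derived values work out as follows (illustrative arithmetic only):
# Derived timeouts for TCK_STREAMING_TIMEOUT=2.0
awk 'BEGIN {
  t = 2.0
  printf "short timeout:  %.1fs (t * 0.5)\n", t * 0.5   # 1.0s
  printf "normal timeout: %.1fs (t * 1.0)\n", t * 1.0   # 2.0s
  printf "async timeout:  %.1fs (t * 1.0)\n", t * 1.0   # 2.0s
}'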
Usage examples:
# Use .env file (recommended)
echo "TCK_STREAMING_TIMEOUT=5.0" > .env
./run_tck.py --sut-url URL --category capabilities
# Set directly for single run
TCK_STREAMING_TIMEOUT=1.0 ./run_tck.py --sut-url URL --category capabilities
# Debug with very slow timeouts
TCK_STREAMING_TIMEOUT=30.0 ./run_tck.py --sut-url URL --category capabilities --verbose
When to adjust timeouts:
- Decrease (1.0): Fast CI/CD pipelines, local development
- Increase (5.0+): Slow networks, debugging, resource-constrained environments
- Debug (10.0+): Detailed troubleshooting, step-through debugging
Purpose: Validate core A2A specification requirements
Impact: Failure = NOT A2A compliant
Location: tests/mandatory/
Includes:
- JSON-RPC 2.0 compliance (tests/mandatory/jsonrpc/)
- A2A protocol core methods (tests/mandatory/protocol/)
- Agent Card required fields
- Core message/send functionality
- Task management (get/cancel)
Example Failures:
- test_task_history_length → SDK doesn't implement the historyLength parameter
- test_mandatory_fields_present → Agent Card missing required fields
Purpose: Validate declared capabilities work correctly
Impact: Failure = False advertising
Logic: Skip if not declared, mandatory if declared
Location: tests/optional/capabilities/
Capability Validation:
{
  "capabilities": {
    "streaming": true,          ← Must pass streaming tests
    "pushNotifications": false  ← Push notification tests will skip
  }
}
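You can preview this skip/mandatory decision yourself by inspecting the Agent Card. This is just a convenience sketch using curl and jq, not part of the TCK:
# Illustrative: list each declared capability and what the TCK will do with it
curl -s "$SUT_URL/.well-known/agent.json" | jq -r '
  .capabilities | to_entries[] |
  if .value then "\(.key): declared -> tests are conditionally mandatory"
  else "\(.key): not declared -> tests will skip" end'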
Includes:
- Streaming support (message/stream, tasks/resubscribe)
- Push notification configuration
- File/data modality support
- Authentication methods
Purpose: Assess implementation robustness
Impact: Never blocks compliance, indicates production issues
Location: tests/optional/quality/
Quality Areas:
- Concurrent request handling
- Edge case robustness
- Unicode/special character support (see the probe sketch after this list)
- Boundary value handling
- Error recovery and resilience
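As an informal illustration of the Unicode and special-character area, you can hand-roll a probe like the one below. The method name message/send comes from this README; the exact params schema (parts, messageId) varies with the A2A spec version, so treat the payload shape as an assumption:
# Hand-rolled probe: send a message containing unicode and edge-case characters.
# NOTE: the params schema below is an assumption; check your A2A spec version.
curl -s -X POST "$SUT_URL" -H 'Content-Type: application/json' -d '{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "message/send",
  "params": {
    "message": {
      "role": "user",
      "messageId": "quality-probe-0001",
      "parts": [{"kind": "text", "text": "héllo wörld 你好 🚀 \"quotes\" \\backslash"}]
    }
  }
}' | jq .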
Purpose: Measure optional feature completeness
Impact: Purely informational
Location: tests/optional/features/
Includes:
- Convenience features
- Enhanced error messages
- SDK-specific capabilities
- Optional protocol extensions
- Criteria: Any mandatory test failure
- Business Impact: Cannot be used for A2A integrations
- Action: Fix mandatory failures immediately
- Criteria: 100% mandatory test pass rate
- Business Impact: Basic A2A integration support
- Suitable For: Development and testing environments
- Next Step: Address capability validation
- Criteria: Mandatory (100%) + Capability (≥85%) + Quality (≥75%)
- Business Impact: Production-ready with confidence
- Suitable For: Staging and careful production deployment
- Next Step: Enhance feature completeness
- Criteria: Capability (≥95%) + Quality (≥90%) + Feature (≥80%)
- Business Impact: Complete A2A implementation
- Suitable For: Full production deployment with confidence
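Putting the four tiers together, the classification logic reads roughly like this sketch (thresholds taken from the criteria above; the score variables are placeholder example values):
# Rough tier classification using the thresholds listed above
MAND=100; CAP=90; QUAL=75; FEAT=60   # example scores

if [ "$MAND" -lt 100 ]; then
  echo "🔴 NON_COMPLIANT"
elif [ "$CAP" -ge 95 ] && [ "$QUAL" -ge 90 ] && [ "$FEAT" -ge 80 ]; then
  echo "🏆 FULL_FEATURED"
elif [ "$CAP" -ge 85 ] && [ "$QUAL" -ge 75 ]; then
  echo "🟢 RECOMMENDED"
else
  echo "🟡 MANDATORY"
fi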
When you run with --compliance-report, you get a JSON report containing:
{
"summary": {
"compliance_level": "RECOMMENDED",
"overall_score": 87.5,
"mandatory_score": 100.0,
"capability_score": 90.0,
"quality_score": 75.0,
"feature_score": 60.0
},
"recommendations": [
"β
Ready for staging deployment",
"β οΈ Address 2 quality issues before production",
"π‘ Consider implementing 3 additional features"
],
"next_steps": [
"Fix Unicode handling in task storage",
"Improve concurrent request performance",
"Consider implementing authentication capability"
]
}
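Assuming the report shape above, the headline values are easy to pull out with jq, for example:
# Summarize a compliance report on the command line
jq -r '.summary | "Level: \(.compliance_level), overall score: \(.overall_score)%"' compliance.json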
#!/bin/bash
# Block deployment if not A2A compliant
./run_tck.py --sut-url $SUT_URL --category mandatory
if [ $? -ne 0 ]; then
echo "β NOT A2A compliant - blocking deployment"
exit 1
fi
echo "β
A2A compliant - deployment approved"
#!/bin/bash
# Generate compliance report and make environment-specific decisions
./run_tck.py --sut-url $SUT_URL --category all --compliance-report compliance.json
COMPLIANCE_LEVEL=$(jq -r '.summary.compliance_level' compliance.json)
case $COMPLIANCE_LEVEL in
"NON_COMPLIANT")
echo "β Not A2A compliant - blocking all deployments"
exit 1
;;
"MANDATORY")
echo "π‘ Basic compliance - dev/test only"
[[ "$ENVIRONMENT" == "production" ]] && exit 1
;;
"RECOMMENDED")
echo "π’ Recommended - staging approved"
;;
"FULL_FEATURED")
echo "π Full compliance - production approved"
;;
esac
Streaming tests skipping:
# Check Agent Card capabilities
curl $SUT_URL/.well-known/agent.json | jq .capabilities
# If streaming: false, tests will skip (this is correct!)
Quality tests failing but compliance achieved:
# This is expected - quality tests don't block compliance
# Address quality issues for production readiness
Tests not discovering:
# Ensure proper installation
pip install -e .
# Check test discovery
pytest --collect-only tests/mandatory/
When debugging specific test failures, you can run individual tests with detailed output:
Run a single test with verbose output and debug information:
# Using run_tck.py with verbose mode (shows print() and logger.info() messages)
python run_tck.py --sut-url http://localhost:9999 --category capabilities --verbose-log
# Run specific test directly with pytest
python -m pytest tests/optional/capabilities/test_streaming_methods.py::test_message_stream_basic \
--sut-url http://localhost:9999 -s -v --log-cli-level=INFO
Run all tests in a specific file:
python -m pytest tests/optional/capabilities/test_streaming_methods.py \
--sut-url http://localhost:9999 -s -v --log-cli-level=INFO
Debug options explained:
- -s: Shows print() statements during test execution
- -v: Verbose test output with detailed test names and outcomes
- --log-cli-level=INFO: Shows logger.info() and other log messages
- --tb=short: Shorter traceback format (the default in run_tck.py)
Run with different log levels:
# Show DEBUG level logs (very detailed)
python -m pytest tests/path/to/test.py --sut-url URL -s -v --log-cli-level=DEBUG
# Show only WARNING and ERROR logs
python -m pytest tests/path/to/test.py --sut-url URL -s -v --log-cli-level=WARNING
- SUT Requirements - Essential requirements for A2A implementations to work with the TCK
- SDK Validation Guide - Detailed usage guide for SDK developers
- Specification Update Workflow - Monitor and manage A2A specification changes
- Test Documentation Standards - Standards for test contributors
- Fork the repository
- Follow Test Documentation Standards
- Add tests with proper categorization and specification references
- Submit pull request with clear specification citations
This project is licensed under the MIT License - see the LICENSE file for details.
Just want A2A compliance?
./run_tck.py --sut-url URL --category mandatory
Planning production deployment?
./run_tck.py --sut-url URL --category all --compliance-report report.json
Debugging capability issues?
./run_tck.py --sut-url URL --category capabilities --verbose
Want comprehensive assessment?
./run_tck.py --sut-url URL --explain # Learn about categories first
./run_tck.py --sut-url URL --category all --compliance-report full_report.json
The A2A TCK transforms specification compliance from confusion into clarity. 🎉