feat(bedrock-agentcore-alpha): add OnlineEvaluationConfig and Evaluator L2 constructs by rezabekf · Pull Request #37615 · aws/aws-cdk

rezabekf · 2026-04-16T10:08:10Z

Issue # (if applicable)

Reason for this change

Amazon Bedrock AgentCore Online Evaluation enables continuous monitoring and assessment of agent performance using live traffic. This PR adds L2 constructs for the evaluation module to the @aws-cdk/aws-bedrock-agentcore-alpha package.

CDK users can now:

Configure continuous evaluation of agent traces using built-in and custom evaluators
Control evaluation execution status (ENABLED/DISABLED) via the executionStatus prop
Sample and filter traces for targeted evaluation
Integrate seamlessly with AgentCore Runtime constructs via DataSourceConfig.fromAgentRuntimeEndpoint()

Description of changes

OnlineEvaluationConfig — L2 construct backed by CfnOnlineEvaluationConfig

Auto-creates IAM execution role with required permissions (CloudWatch Logs read/write, Bedrock model invocation, index policies)
Supports executionStatus prop (ExecutionStatus.ENABLED / ExecutionStatus.DISABLED) to control whether evaluation actively processes traces
Accepts a mix of built-in and custom evaluators via EvaluatorReference
Provides fromOnlineEvaluationConfigId(), fromOnlineEvaluationConfigArn(), and fromOnlineEvaluationConfigAttributes() import methods
Implements IGrantable for IAM permission grants and ITaggableV2 for CDK tag propagation
Input validation for config name, description, evaluators count, sampling percentage, filters count, and session timeout

EvaluatorReference — Unified entry point for referencing evaluators

EvaluatorReference.builtin() — References one of the 13 pre-defined evaluators (e.g., HELPFULNESS, CORRECTNESS)
EvaluatorReference.custom() — References a user-created Evaluator construct

Evaluator — L2 construct backed by CfnEvaluator for custom evaluation logic

EvaluatorConfig.llmAsAJudge() — Foundation model-based evaluation with custom instructions and rating scales (categorical or numerical)
EvaluatorConfig.codeBased() — Lambda function-based evaluation; automatically grants scoped lambda:InvokeFunction permission with aws:SourceAccount and aws:SourceArn conditions (confused deputy prevention)
Provides fromEvaluatorId(), fromEvaluatorArn(), and fromEvaluatorAttributes() import methods
Input validation for evaluator name, description, rating scale options, and instructions

EvaluatorRatingScale — Factory class for custom evaluator rating scales

EvaluatorRatingScale.categorical() — Discrete label-based scoring (e.g., Good/Bad)
EvaluatorRatingScale.numerical() — Labeled numeric scoring (e.g., 1-5)

DataSourceConfig — Configuration for evaluation data sources

fromCloudWatchLogs() — For external agents or custom log groups
fromAgentRuntimeEndpoint() — Seamless integration with AgentCore Runtime (derives log group and service names automatically)

Design decisions:

Factory classes (EvaluatorConfig, EvaluatorRatingScale, DataSourceConfig) used instead of union types for jsii compatibility
modelId accepted as plain string — supports standard model IDs and cross-region inference profile IDs (e.g., us.anthropic.claude-sonnet-4-6)
Instructions placeholder validation delegated to the service — placeholders vary by evaluation level and may change
Follows existing agentcore patterns (Runtime, Memory, Gateway) with interface + base class + concrete class
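
As a rough illustration of the first design decision above, the sketch below shows how a factory class with static methods can stand in for a TypeScript union type. The class name RatingScaleSketch, the option fields (label, definition, value), and the render() output are hypothetical and are not taken from the package; only the categorical()/numerical() split mirrors EvaluatorRatingScale as described in this PR.

```typescript
// Hypothetical option shapes for illustration; the real construct's fields may differ.
interface CategoricalOption {
  readonly label: string;
  readonly definition: string;
}

interface NumericalOption extends CategoricalOption {
  readonly value: number;
}

// A factory class with a private constructor replaces a union such as
// `CategoricalScale | NumericalScale`. Callers pick a variant through a
// named static method instead of constructing a union member directly.
class RatingScaleSketch {
  public static categorical(options: CategoricalOption[]): RatingScaleSketch {
    return new RatingScaleSketch('categorical', options);
  }

  public static numerical(options: NumericalOption[]): RatingScaleSketch {
    return new RatingScaleSketch('numerical', options);
  }

  public readonly kind: 'categorical' | 'numerical';
  private readonly options: ReadonlyArray<CategoricalOption>;

  private constructor(kind: 'categorical' | 'numerical', options: ReadonlyArray<CategoricalOption>) {
    if (options.length === 0) {
      throw new Error('A rating scale needs at least one option');
    }
    this.kind = kind;
    this.options = options;
  }

  // Returns a plain object describing the chosen variant and its options.
  public render(): { kind: string; options: ReadonlyArray<CategoricalOption> } {
    return { kind: this.kind, options: this.options };
  }
}

const goodBadScale = RatingScaleSketch.categorical([
  { label: 'Good', definition: 'The response answers the question' },
  { label: 'Bad', definition: 'The response does not answer the question' },
]);

const oneToFiveScale = RatingScaleSketch.numerical([
  { label: 'Poor', definition: 'Unhelpful', value: 1 },
  { label: 'Excellent', definition: 'Fully helpful', value: 5 },
]);
```

Because every variant is reached through a named method on a single class, the public API exposes one concrete type, which is the property the design decision relies on.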

Describe any new or updated permissions being added

The auto-created OnlineEvaluationConfig execution role includes:

CloudWatch Logs Describe (logs:DescribeLogGroups) — unscoped (*), as this action does not support resource-level restrictions
CloudWatch Logs Query (logs:GetQueryResults, logs:StartQuery) — scoped to user-specified log groups and the aws/spans log group
CloudWatch Logs Write (logs:CreateLogGroup, logs:CreateLogStream, logs:PutLogEvents) — scoped to arn:aws:logs:*:*:log-group:/aws/bedrock-agentcore/evaluations/*
CloudWatch Index Policy (logs:DescribeIndexPolicies, logs:PutIndexPolicy) — scoped to aws/spans log group
Bedrock Model Invocation (bedrock:InvokeModel, bedrock:InvokeModelWithResponseStream) — for LLM-as-a-Judge evaluators
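
The CloudWatch Logs portion of the list above can be sketched as plain policy statements. This is an approximation under stated assumptions, not the construct's actual code: the helper names (logGroupArnSketch, logsStatementsSketch) and the exact ARN suffix handling are hypothetical, and the Bedrock statement is left out because its resource scope is not stated in this description.

```typescript
interface StatementSketch {
  readonly Effect: 'Allow';
  readonly Action: string[];
  readonly Resource: string[];
}

// Hypothetical helper: formats a log group ARN from its parts.
function logGroupArnSketch(partition: string, region: string, account: string, name: string): string {
  return `arn:${partition}:logs:${region}:${account}:log-group:${name}`;
}

// Builds the four CloudWatch Logs statements described in the list above.
function logsStatementsSketch(
  partition: string,
  region: string,
  account: string,
  dataSourceLogGroups: string[],
): StatementSketch[] {
  const spansArn = logGroupArnSketch(partition, region, account, 'aws/spans');
  const queryTargets = [
    ...dataSourceLogGroups.map((name) => logGroupArnSketch(partition, region, account, name)),
    spansArn,
  ];
  return [
    // DescribeLogGroups does not support resource-level restrictions, so it stays on '*'.
    { Effect: 'Allow', Action: ['logs:DescribeLogGroups'], Resource: ['*'] },
    // Query actions are scoped to the user-specified log groups plus aws/spans.
    { Effect: 'Allow', Action: ['logs:StartQuery', 'logs:GetQueryResults'], Resource: queryTargets },
    // Write actions use the resource pattern quoted in the list above.
    {
      Effect: 'Allow',
      Action: ['logs:CreateLogGroup', 'logs:CreateLogStream', 'logs:PutLogEvents'],
      Resource: ['arn:aws:logs:*:*:log-group:/aws/bedrock-agentcore/evaluations/*'],
    },
    // Index policy actions are scoped to aws/spans only.
    { Effect: 'Allow', Action: ['logs:DescribeIndexPolicies', 'logs:PutIndexPolicy'], Resource: [spansArn] },
  ];
}

const exampleStatements = logsStatementsSketch('aws', 'us-east-1', '123456789012', ['/my/agent/logs']);
```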

Code-based Evaluator construct:

Lambda Invoke (lambda:InvokeFunction) — granted to bedrock-agentcore.amazonaws.com service principal, scoped with aws:SourceAccount and aws:SourceArn conditions to the specific evaluator resource
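
For readers unfamiliar with confused deputy prevention, the following sketch shows the general shape of a Lambda resource policy statement that carries both conditions. It is an illustration only: the function names, the evaluator ARN format, and the simplified matching logic are assumptions, and real IAM evaluation of ArnLike also supports wildcards, which this sketch ignores.

```typescript
interface InvokeStatementSketch {
  readonly Effect: 'Allow';
  readonly Principal: { readonly Service: string };
  readonly Action: string;
  readonly Resource: string;
  readonly Condition: {
    readonly StringEquals: { readonly 'aws:SourceAccount': string };
    readonly ArnLike: { readonly 'aws:SourceArn': string };
  };
}

// Builds a statement that lets the service principal invoke one function,
// but only on behalf of one account and one evaluator resource.
function evaluatorInvokeStatementSketch(
  functionArn: string,
  account: string,
  evaluatorArn: string,
): InvokeStatementSketch {
  return {
    Effect: 'Allow',
    Principal: { Service: 'bedrock-agentcore.amazonaws.com' },
    Action: 'lambda:InvokeFunction',
    Resource: functionArn,
    Condition: {
      StringEquals: { 'aws:SourceAccount': account },
      ArnLike: { 'aws:SourceArn': evaluatorArn },
    },
  };
}

// Simplified check (exact string match only) of whether a call made by the
// service on behalf of `sourceAccount` / `sourceArn` satisfies both conditions.
function conditionsMatchSketch(stmt: InvokeStatementSketch, sourceAccount: string, sourceArn: string): boolean {
  return (
    stmt.Condition.StringEquals['aws:SourceAccount'] === sourceAccount &&
    stmt.Condition.ArnLike['aws:SourceArn'] === sourceArn
  );
}

// The evaluator ARN below is a made-up placeholder, not a documented format.
const placeholderEvaluatorArn = 'arn:aws:bedrock-agentcore:us-east-1:123456789012:evaluator/example-id';
const invokeStatement = evaluatorInvokeStatementSketch(
  'arn:aws:lambda:us-east-1:123456789012:function:my-evaluator',
  '123456789012',
  placeholderEvaluatorArn,
);
```

Without the two conditions, any customer of the service could in principle cause it to invoke the function; with them, a request originating from another account or another evaluator does not match the statement.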

Description of how you validated changes

Unit Tests (online-evaluation.test.ts + custom-evaluator.test.ts) — 69 evaluation tests covering:

Creation with minimal and full props for both constructs
Built-in evaluators: all 13 evaluators, custom sampling and filter configurations
Custom evaluators: LLM-as-a-Judge (categorical/numerical scales, inference config), code-based (Lambda with timeout, scoped invoke permission)
EvaluatorReference.builtin() and EvaluatorReference.custom() produce correct evaluator references
Mixed evaluator usage in OnlineEvaluationConfig
executionStatus prop (ENABLED, DISABLED, omitted)
Input validation, token passthrough, grant and import methods for both constructs

Integration Test (integ.online-evaluation.ts)

Deploys OnlineEvaluationConfig with HELPFULNESS and CORRECTNESS built-in evaluators alongside a custom LLM-as-a-Judge evaluator
executionStatus: ENABLED
Deploy verified via integ-runner --update-on-failed

Rosetta — yarn rosetta:extract --strict passes (README and @example docstring snippets compile)

Checklist

My code adheres to the CONTRIBUTING GUIDE and DESIGN GUIDELINES

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

…tinuous agent evaluation - Implements OnlineEvaluationConfig L2 construct using AwsCustomResource - Supports 13 built-in evaluators and custom evaluator references - Supports CloudWatch Logs and Agent Endpoint data sources - Auto-creates IAM execution role with required permissions - Includes sampling, filtering, and session configuration - Provides grant methods and CloudWatch metrics - Comprehensive unit and integration tests with 93% coverage

…README, align base class signature - Remove enableOnCreate from README properties table (prop doesn't exist in code) - Remove unused _getLogGroupNames() method from DataSourceConfig - Remove unused validateLogGroupNames() and its constants from validation-helpers - Accept ResourceProps in OnlineEvaluationBase constructor for consistency with other base classes

…up and service name validation - Remove dead READ_PERMS constant from EvaluationPerms - Add validateLogGroupNames (1-5) and validateServiceNames (>=1) - Wire validation into DataSourceConfig.fromCloudWatchLogs() - Add unit tests for data source validation
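
A minimal sketch of what the two checks in this commit might look like, assuming they return a list of error messages rather than throwing. Only the bounds (1 to 5 log group names, at least 1 service name) come from the commit message; the function names with the Sketch suffix, the signatures, and the message wording are hypothetical.

```typescript
// Returns an error message when the number of log group names is outside 1..5.
function validateLogGroupNamesSketch(names: string[]): string[] {
  const errors: string[] = [];
  if (names.length < 1 || names.length > 5) {
    errors.push(`Expected between 1 and 5 log group names, got ${names.length}`);
  }
  return errors;
}

// Returns an error message when no service name is supplied.
function validateServiceNamesSketch(names: string[]): string[] {
  const errors: string[] = [];
  if (names.length < 1) {
    errors.push('Expected at least 1 service name, got 0');
  }
  return errors;
}

// Collects all errors for a CloudWatch Logs data source and throws once if any exist.
function assertValidDataSourceSketch(logGroupNames: string[], serviceNames: string[]): void {
  const errors = [
    ...validateLogGroupNamesSketch(logGroupNames),
    ...validateServiceNamesSketch(serviceNames),
  ];
  if (errors.length > 0) {
    throw new Error(errors.join('; '));
  }
}
```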

…aky -0 property test - Add tests for token values skipping validation (configName, description, samplingPercentage, sessionTimeout) - Add test for empty config name validation - Fix flaky Property 6 by excluding -0 from number arbitrary (CFN normalizes -0 to 0)
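
The -0 flakiness mentioned in this commit can be reproduced in isolation. The sketch below uses a JSON round trip as a stand-in for the normalization the commit attributes to CFN, and assumes the test compared values with Object.is-style equality; both are assumptions, since the original property test is not shown here.

```typescript
// Returns true when a number is unchanged (under Object.is) after being
// serialized to JSON and parsed back.
function survivesJsonRoundTripSketch(n: number): boolean {
  const roundTripped: number = JSON.parse(JSON.stringify({ v: n })).v;
  return Object.is(roundTripped, n);
}

const negativeZero = -0;

// Strict equality treats the two zeros as equal, Object.is does not.
const strictlyEqual = negativeZero === 0;
const sameValue = Object.is(negativeZero, 0);

// JSON has no negative zero, so serialization writes a plain 0.
const serialized = JSON.stringify({ v: negativeZero });
```

A generator that occasionally produces -0 therefore fails a round-trip property only on the runs where that value is drawn, which matches the flaky behavior described above.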

…I visibility in evaluation constructs Replace internal _render() methods with public bind() on EvaluatorReference and DataSourceConfig so they are accessible from all JSII target languages. Add proper return type interfaces (EvaluatorReferenceBindResult, DataSourceConfigBindResult) since JSII requires named types. Also fix pre-existing ValidationError constructor signatures and attrExecutionStatus L1 API change from main merge. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
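
The change described in this commit can be pictured roughly as follows: a public bind() method returns a named interface instead of an internal _render() returning an anonymous shape. The names EvaluatorReference, bind(), and EvaluatorReferenceBindResult come from the commit message; the Sketch-suffixed types, the evaluatorId field, and the identifier strings are hypothetical.

```typescript
// Named return type, since the commit notes that JSII requires named types.
interface EvaluatorReferenceBindResultSketch {
  readonly evaluatorId: string;
}

// Hypothetical stand-in for a custom Evaluator construct.
interface CustomEvaluatorSketch {
  readonly evaluatorId: string;
}

abstract class EvaluatorReferenceSketch {
  // References a pre-defined evaluator by name.
  public static builtin(name: string): EvaluatorReferenceSketch {
    return new BuiltinReferenceSketch(name);
  }

  // References a user-created evaluator.
  public static custom(evaluator: CustomEvaluatorSketch): EvaluatorReferenceSketch {
    return new CustomReferenceSketch(evaluator);
  }

  // Public method with a named result type, replacing an internal _render().
  public abstract bind(): EvaluatorReferenceBindResultSketch;
}

class BuiltinReferenceSketch extends EvaluatorReferenceSketch {
  private readonly name: string;

  constructor(name: string) {
    super();
    this.name = name;
  }

  public bind(): EvaluatorReferenceBindResultSketch {
    return { evaluatorId: this.name };
  }
}

class CustomReferenceSketch extends EvaluatorReferenceSketch {
  private readonly evaluator: CustomEvaluatorSketch;

  constructor(evaluator: CustomEvaluatorSketch) {
    super();
    this.evaluator = evaluator;
  }

  public bind(): EvaluatorReferenceBindResultSketch {
    return { evaluatorId: this.evaluator.evaluatorId };
  }
}

// A mixed list of built-in and custom references resolves through the same method.
const boundEvaluatorIds = [
  EvaluatorReferenceSketch.builtin('HELPFULNESS'),
  EvaluatorReferenceSketch.custom({ evaluatorId: 'my-custom-evaluator' }),
].map((ref) => ref.bind().evaluatorId);
```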

…log groups in evaluation construct Replace wildcard resource ('*') on CloudWatch Logs read permissions with scoped log group ARNs derived from the data source configuration. Uses Arn.format() with partition/region/account pseudo parameters for proper cross-partition support. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…s from evaluation construct Property-based tests with fast-check are not standard practice in CDK constructs. The validation logic is already covered by unit tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…f CDK tagging aspect Remove the tags property from OnlineEvaluationBaseProps and the manual Record<string,string> to CfnTag[] conversion. The L1 CfnOnlineEvaluationConfig implements ITaggableV2 with cdkTagManager, so Tags.of() works automatically. This follows CDK best practices and avoids potential tag duplication. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…uationConfig The `ExecutionStatus` property on `AWS::BedrockAgentCore::OnlineEvaluationConfig` is a writable input (ENABLED/DISABLED) but was previously only read back as an output without being passed to the L1 constructor. This adds it as an optional input prop with a proper `ExecutionStatus` enum so users can control whether an evaluation is enabled or disabled at creation time. Also fixes CWL read permissions that were too aggressively scoped — splits `DescribeLogGroups` (which doesn't support resource-level restrictions) to `*` while keeping `StartQuery`/`GetQueryResults` scoped to specific log groups. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…uation construct - Replace EvaluationPerms namespace with flat exports (jsii-compatible) - Remove ADMIN_PERMS and grantAdmin() as they are control plane operations - Extend IOnlineEvaluationConfigRef from L1 and add onlineEvaluationConfigRef getter - Remove interface-extends-ref lint exclusion from package.json Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…online evaluation Add the `Evaluator` L2 construct wrapping `AWS::BedrockAgentCore::Evaluator`, supporting LLM-as-a-Judge and code-based (Lambda) evaluation strategies. Custom evaluators integrate with `OnlineEvaluationConfig` via `EvaluatorReference.custom()`. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-04-16T10:10:46Z

⚠️ Experimental Feature: This security report is currently in experimental phase. Results may include false positives and the rules are being actively refined.
This security report is NOT a review blocker. Please try merge from main to avoid findings unrelated to the PR.
To suppress a specific rule, see Suppressing Rules.

	Tests	Passed ✅	Skipped	Failed
Security Guardian Results	48 ran	48 passed

Test	Result
No test annotations available

View Security Guardian Results

github-actions · 2026-04-16T10:10:49Z

	Tests	Passed ✅	Skipped	Failed
Security Guardian Results with resolved templates	48 ran	48 passed

Test	Result
No test annotations available

View Security Guardian Results with resolved templates

rezabekf and others added 29 commits January 15, 2026 01:06

refactor(bedrock-agentcore): remove custom evaluator support from Onl…

feb1300

…ineEvaluationConfig

refactor(bedrock-agentcore): rename OnlineEvaluationConfig to OnlineE…

9140c4f

…valuation

feat(aws-bedrock-agentcore-alpha): simplify DataSourceConfig API for …

e9a6087

…Runtime evaluation

refactor: code and test cleanup

2d37b5d

Merge branch 'main' into rezabekf/agentcore-eval-construct

131162e

fix: issues after rebase

8c03710

Merge branch 'main' into rezabekf/agentcore-eval-construct

ae9b775

# Conflicts: # packages/@aws-cdk/aws-bedrock-agentcore-alpha/README.md

chore(deps): sync with main

eff1e1a

Merge remote-tracking branch 'origin/main' into rezabekf/agentcore-ev…

c80b25c

…al-construct

chore(bedrock-agentcore): fix consistent-type-imports lint errors

d829dda

feat(bedrock-agentcore): migrate OnlineEvaluationConfig from AwsCusto…

7197dca

…mResource to L1 CfnOnlineEvaluationConfig

feat: add additional return outputs

66fe749

chore(bedrock-agentcore-alpha): replace any with L1 types in buildRul…

ec82659

…eConfig and formatFilterValue

chore(bedrock-agentcore-alpha): regenerate integ test snapshot

1d98fc6

refactor(bedrock-agentcore-alpha): rename evaluation attributes to us…

40d1a56

…e type name prefix per awslint:attribute-name

fix(bedrock-agentcore-alpha): merge duplicate aws-cdk-lib imports in …

4a5e1c1

…online-evaluation-base

Merge branch 'main' into rezabekf/agentcore-eval-construct

407d3a4

Merge branch 'main' into rezabekf/agentcore-eval-construct

d46b25e

rezabekf temporarily deployed to automation April 16, 2026 10:08 — with GitHub Actions Inactive

github-actions bot added the feature-request (A feature should be added or improved.), p2, and beginning-contributor ([Pilot] contributed between 0-2 PRs to the CDK) labels Apr 16, 2026

aws-cdk-automation requested a review from a team April 16, 2026 10:08

rezabekf temporarily deployed to automation April 16, 2026 10:08 — with GitHub Actions Inactive

aws-cdk-automation added the pr/needs-further-review label (PR requires additional review from our team specialists due to the scope or complexity of changes.) Apr 16, 2026

aws-cdk-automation temporarily deployed to automation April 16, 2026 10:39 — with GitHub Actions Inactive

aws-cdk-automation added the pr/needs-community-review label (This PR needs a review from a Trusted Community Member or Core Team Member.) Apr 16, 2026

aws-cdk-automation temporarily deployed to automation April 16, 2026 10:55 — with GitHub Actions Inactive

pahud mentioned this pull request Apr 17, 2026

(bedrock-agentcore-alpha): add OnlineEvaluationConfig and Evaluator L2 constructs for online evaluation #37614

Open

2 tasks

rezabekf wants to merge 29 commits into aws:main from rezabekf:rezabekf/agentcore-eval-construct
