feat(bedrock-agentcore-alpha): add OnlineEvaluationConfig and Evaluator L2 constructs#37615
Open
…tinuous agent evaluation
- Implements OnlineEvaluationConfig L2 construct using AwsCustomResource
- Supports 13 built-in evaluators and custom evaluator references
- Supports CloudWatch Logs and Agent Endpoint data sources
- Auto-creates IAM execution role with required permissions
- Includes sampling, filtering, and session configuration
- Provides grant methods and CloudWatch metrics
- Comprehensive unit and integration tests with 93% coverage
…ineEvaluationConfig
…Runtime evaluation
# Conflicts: # packages/@aws-cdk/aws-bedrock-agentcore-alpha/README.md
…mResource to L1 CfnOnlineEvaluationConfig
…README, align base class signature
- Remove enableOnCreate from README properties table (prop doesn't exist in code)
- Remove unused _getLogGroupNames() method from DataSourceConfig
- Remove unused validateLogGroupNames() and its constants from validation-helpers
- Accept ResourceProps in OnlineEvaluationBase constructor for consistency with other base classes
…eConfig and formatFilterValue
…up and service name validation
- Remove dead READ_PERMS constant from EvaluationPerms
- Add validateLogGroupNames (1-5) and validateServiceNames (>=1)
- Wire validation into DataSourceConfig.fromCloudWatchLogs()
- Add unit tests for data source validation
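The bounds named in this commit (1 to 5 log group names, at least one service name) can be sketched in plain TypeScript. The function names and error wording below are hypothetical; the PR's `validateLogGroupNames` and `validateServiceNames` may have different signatures and messages, and the skipping of token values mentioned in a later commit is not modelled here.

```typescript
// Hypothetical helpers: only the bounds (1-5 log groups, >=1 service name)
// come from the commit message; names and messages are illustrative.
const MAX_LOG_GROUPS = 5;

function checkLogGroupNames(logGroupNames: readonly string[]): string[] {
  const errors: string[] = [];
  if (logGroupNames.length < 1 || logGroupNames.length > MAX_LOG_GROUPS) {
    errors.push(
      `Expected between 1 and ${MAX_LOG_GROUPS} log group names, got ${logGroupNames.length}`,
    );
  }
  return errors;
}

function checkServiceNames(serviceNames: readonly string[]): string[] {
  return serviceNames.length >= 1
    ? []
    : ['Expected at least one service name, got 0'];
}

// Collect every violation before failing, so a caller sees all problems at once.
function checkCloudWatchLogsSource(
  logGroupNames: readonly string[],
  serviceNames: readonly string[],
): string[] {
  return [...checkLogGroupNames(logGroupNames), ...checkServiceNames(serviceNames)];
}
```

Returning a list of messages rather than throwing on the first failure is a design choice of this sketch, not something the commit states.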
…aky -0 property test
- Add tests for token values skipping validation (configName, description, samplingPercentage, sessionTimeout)
- Add test for empty config name validation
- Fix flaky Property 6 by excluding -0 from number arbitrary (CFN normalizes -0 to 0)
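Why a generated `-0` can make a round-trip property flaky is easy to reproduce in isolation. The sketch below uses JSON serialization as a stand-in for whatever step normalizes the value in the real pipeline; that the mechanism is JSON is an assumption here, since the commit only says that CFN normalizes `-0` to `0`.

```typescript
// Round-trips a number through JSON text, standing in (by assumption) for
// any serialization step between a test input and the synthesized template.
function roundTripNumber(value: number): number {
  return JSON.parse(JSON.stringify(value)) as number;
}

const negativeZero = -0;
const afterRoundTrip = roundTripNumber(negativeZero);

// Object.is distinguishes -0 from +0, so an Object.is-style equality check
// reports a mismatch after the round trip.
const sameByObjectIs = Object.is(afterRoundTrip, negativeZero);

// Strict equality treats -0 and +0 as equal, so the same check with ===
// would pass.
const sameByStrictEquals = afterRoundTrip === negativeZero;
```

Because the failure only appears when the generator happens to produce `-0`, the test fails intermittently, which matches the "flaky" description in the commit.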
…e type name prefix per awslint:attribute-name
…online-evaluation-base
…I visibility in evaluation constructs
Replace internal _render() methods with public bind() on EvaluatorReference and DataSourceConfig so they are accessible from all JSII target languages. Add proper return type interfaces (EvaluatorReferenceBindResult, DataSourceConfigBindResult) since JSII requires named types. Also fix pre-existing ValidationError constructor signatures and attrExecutionStatus L1 API change from main merge.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…log groups in evaluation construct
Replace wildcard resource ('*') on CloudWatch Logs read permissions with
scoped log group ARNs derived from the data source configuration. Uses
Arn.format() with partition/region/account pseudo parameters for
proper cross-partition support.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
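The scoping described in this commit amounts to turning each configured log group name into an ARN whose partition, region, and account are filled in per deployment. The plain-string sketch below illustrates the shape only: the commit uses `Arn.format()` with pseudo parameters, whereas `ArnContext`, `logGroupArn`, and `scopedLogGroupArns` are hypothetical names, and the trailing `:*` (covering log streams) is an assumption about the exact resource form.

```typescript
// Hypothetical stand-in for the partition/region/account pseudo parameters.
interface ArnContext {
  partition: string;
  region: string;
  account: string;
}

// Builds a log group ARN; the trailing ":*" is assumed, not taken from the PR.
function logGroupArn(ctx: ArnContext, logGroupName: string): string {
  return `arn:${ctx.partition}:logs:${ctx.region}:${ctx.account}:log-group:${logGroupName}:*`;
}

// Maps every configured log group to a scoped resource, replacing a bare '*'.
function scopedLogGroupArns(ctx: ArnContext, logGroupNames: readonly string[]): string[] {
  return logGroupNames.map((groupName) => logGroupArn(ctx, groupName));
}

// A non-default partition shows why the partition must not be hard-coded to "aws".
const chinaContext: ArnContext = {
  partition: 'aws-cn',
  region: 'cn-north-1',
  account: '123456789012',
};
const exampleArns = scopedLogGroupArns(chinaContext, ['/aws/my-agent', 'aws/spans']);
```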
…s from evaluation construct
Property-based tests with fast-check are not standard practice in CDK constructs. The validation logic is already covered by unit tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…f CDK tagging aspect
Remove the tags property from OnlineEvaluationBaseProps and the manual Record<string,string> to CfnTag[] conversion. The L1 CfnOnlineEvaluationConfig implements ITaggableV2 with cdkTagManager, so Tags.of() works automatically. This follows CDK best practices and avoids potential tag duplication.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…uationConfig
The `ExecutionStatus` property on `AWS::BedrockAgentCore::OnlineEvaluationConfig` is a writable input (ENABLED/DISABLED) but was previously only read back as an output without being passed to the L1 constructor. This adds it as an optional input prop with a proper `ExecutionStatus` enum so users can control whether an evaluation is enabled or disabled at creation time.
Also fixes CWL read permissions that were too aggressively scoped: `DescribeLogGroups` (which doesn't support resource-level restrictions) is split out to `*`, while `StartQuery`/`GetQueryResults` stay scoped to specific log groups.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
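The permission split in this commit can be pictured as two IAM statements, written here as plain objects in IAM JSON shape rather than with the CDK `iam` module. The interface and function names are hypothetical, and the example ARN is a placeholder.

```typescript
// Minimal IAM-statement shape for illustration only.
interface PolicyStatementSketch {
  Effect: 'Allow';
  Action: string[];
  Resource: string[];
}

// Splits CloudWatch Logs read access into an unscoped and a scoped statement.
function cloudWatchReadStatements(logGroupArns: readonly string[]): PolicyStatementSketch[] {
  return [
    {
      // Per the commit, DescribeLogGroups has no resource-level restrictions.
      Effect: 'Allow',
      Action: ['logs:DescribeLogGroups'],
      Resource: ['*'],
    },
    {
      // Query actions stay limited to the configured log groups.
      Effect: 'Allow',
      Action: ['logs:StartQuery', 'logs:GetQueryResults'],
      Resource: [...logGroupArns],
    },
  ];
}

// Placeholder ARN, not taken from the PR.
const readStatements = cloudWatchReadStatements([
  'arn:aws:logs:us-east-1:123456789012:log-group:/aws/my-agent:*',
]);
```

Keeping the unscoped action in its own statement keeps the wildcard from leaking onto the query actions.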
…uation construct
- Replace EvaluationPerms namespace with flat exports (jsii-compatible)
- Remove ADMIN_PERMS and grantAdmin() as they are control plane operations
- Extend IOnlineEvaluationConfigRef from L1 and add onlineEvaluationConfigRef getter
- Remove interface-extends-ref lint exclusion from package.json
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…online evaluation
Add the `Evaluator` L2 construct wrapping `AWS::BedrockAgentCore::Evaluator`, supporting LLM-as-a-Judge and code-based (Lambda) evaluation strategies. Custom evaluators integrate with `OnlineEvaluationConfig` via `EvaluatorReference.custom()`.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Issue # (if applicable)
Closes #37614.
Reason for this change
Amazon Bedrock AgentCore Online Evaluation enables continuous monitoring and assessment of agent performance using live traffic. This PR adds L2 constructs for the evaluation module to the `@aws-cdk/aws-bedrock-agentcore-alpha` package.

CDK users can now:
- Control whether evaluation actively processes traces with the `executionStatus` prop
- Integrate with AgentCore Runtime through `DataSourceConfig.fromAgentRuntimeEndpoint()`

Description of changes
- `OnlineEvaluationConfig`: L2 construct backed by `CfnOnlineEvaluationConfig`
  - `executionStatus` prop (`ExecutionStatus.ENABLED` / `ExecutionStatus.DISABLED`) to control whether evaluation actively processes traces
  - Built-in and custom evaluators referenced through `EvaluatorReference`
  - `fromOnlineEvaluationConfigId()`, `fromOnlineEvaluationConfigArn()`, and `fromOnlineEvaluationConfigAttributes()` import methods
  - `IGrantable` for IAM permission grants and `ITaggableV2` for CDK tag propagation
- `EvaluatorReference`: Unified entry point for referencing evaluators
  - `EvaluatorReference.builtin()`: References one of the 13 pre-defined evaluators (e.g., HELPFULNESS, CORRECTNESS)
  - `EvaluatorReference.custom()`: References a user-created `Evaluator` construct
- `Evaluator`: L2 construct backed by `CfnEvaluator` for custom evaluation logic
  - `EvaluatorConfig.llmAsAJudge()`: Foundation model-based evaluation with custom instructions and rating scales (categorical or numerical)
  - `EvaluatorConfig.codeBased()`: Lambda function-based evaluation; automatically grants scoped `lambda:InvokeFunction` permission with `aws:SourceAccount` and `aws:SourceArn` conditions (confused deputy prevention)
  - `fromEvaluatorId()`, `fromEvaluatorArn()`, and `fromEvaluatorAttributes()` import methods
- `EvaluatorRatingScale`: Factory class for custom evaluator rating scales
  - `EvaluatorRatingScale.categorical()`: Discrete label-based scoring (e.g., Good/Bad)
  - `EvaluatorRatingScale.numerical()`: Labeled numeric scoring (e.g., 1-5)
- `DataSourceConfig`: Configuration for evaluation data sources
  - `fromCloudWatchLogs()`: For external agents or custom log groups
  - `fromAgentRuntimeEndpoint()`: Seamless integration with AgentCore Runtime (derives log group and service names automatically)

Design decisions:
- Factory classes (`EvaluatorConfig`, `EvaluatorRatingScale`, `DataSourceConfig`) used instead of union types for jsii compatibility
- `modelId` accepted as plain string: supports standard model IDs and cross-region inference profile IDs (e.g., `us.anthropic.claude-sonnet-4-6`)

Describe any new or updated permissions being added
The auto-created `OnlineEvaluationConfig` execution role includes:
- `logs:DescribeLogGroups`: unscoped (`*`), as this action does not support resource-level restrictions
- `logs:GetQueryResults`, `logs:StartQuery`: scoped to user-specified log groups and the `aws/spans` log group
- `logs:CreateLogGroup`, `logs:CreateLogStream`, `logs:PutLogEvents`: scoped to `arn:aws:logs:*:*:log-group:/aws/bedrock-agentcore/evaluations/*`
- `logs:DescribeIndexPolicies`, `logs:PutIndexPolicy`: scoped to `aws/spans` log group
- `bedrock:InvokeModel`, `bedrock:InvokeModelWithResponseStream`: for LLM-as-a-Judge evaluators

Code-based `Evaluator` construct:
- `lambda:InvokeFunction`: granted to `bedrock-agentcore.amazonaws.com` service principal, scoped with `aws:SourceAccount` and `aws:SourceArn` conditions to the specific evaluator resource

Description of how you validated changes
Unit Tests (`online-evaluation.test.ts` + `custom-evaluator.test.ts`), 69 evaluation tests covering:
- `EvaluatorReference.builtin()` and `EvaluatorReference.custom()` produce correct evaluator references
- `OnlineEvaluationConfig` `executionStatus` prop (ENABLED, DISABLED, omitted)

Integration Test (`integ.online-evaluation.ts`)
- `OnlineEvaluationConfig` with HELPFULNESS and CORRECTNESS built-in evaluators alongside a custom LLM-as-a-Judge evaluator
- `executionStatus: ENABLED`
- `integ-runner --update-on-failed`

Rosetta: `yarn rosetta:extract --strict` passes (README and `@example` docstring snippets compile)

Checklist
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license
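
For reviewers of the code-based evaluator permission described above, the confused-deputy guard can be pictured as a Lambda resource-policy statement. This is a plain-object sketch in IAM JSON shape, not the construct's actual output: the function and interface names are hypothetical, both ARNs are made-up placeholders, and the choice of `StringEquals`/`ArnLike` operators is an assumption.

```typescript
// Shape of a resource-policy statement that lets a service principal invoke
// a function only on behalf of one account and one source resource.
interface InvokePermissionSketch {
  Effect: 'Allow';
  Principal: { Service: string };
  Action: 'lambda:InvokeFunction';
  Resource: string;
  Condition: {
    StringEquals: { 'aws:SourceAccount': string };
    ArnLike: { 'aws:SourceArn': string };
  };
}

function evaluatorInvokePermission(
  functionArn: string,
  accountId: string,
  evaluatorArn: string,
): InvokePermissionSketch {
  return {
    Effect: 'Allow',
    Principal: { Service: 'bedrock-agentcore.amazonaws.com' },
    Action: 'lambda:InvokeFunction',
    Resource: functionArn,
    Condition: {
      // Without these two conditions, any caller able to drive the service
      // could cause it to invoke this function.
      StringEquals: { 'aws:SourceAccount': accountId },
      ArnLike: { 'aws:SourceArn': evaluatorArn },
    },
  };
}

// Placeholder ARNs for illustration; the real evaluator ARN format may differ.
const invokePermission = evaluatorInvokePermission(
  'arn:aws:lambda:us-east-1:123456789012:function:my-evaluator-fn',
  '123456789012',
  'arn:aws:bedrock-agentcore:us-east-1:123456789012:evaluator/example-id',
);
```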