@sparrc sparrc commented Jun 19, 2025

Summary

This fixes the agent's handling of the ClusterNotFound error, which is returned as a ClientException when the agent calls RegisterContainerInstance (RCI) and the cluster does not exist.

Testing

A new unit test file was added for testing the entire errors package, including this function.

A manual test was run to confirm that an account/region with no default cluster now has one auto-created, and that the agent is able to register with the default cluster:

level=error time=2025-06-19T23:46:15Z msg="Unable to register as a container instance with ECS" error="operation error ECS: RegisterContainerInstance, https response error StatusCode: 400, RequestID: NNN, ClientException: Cluster not found."
level=error time=2025-06-19T23:46:15Z msg="Received terminal error from RegisterContainerInstance call, exiting" error="operation error ECS: RegisterContainerInstance, https response error StatusCode: 400, RequestID: NNN, ClientException: Cluster not found."
level=info time=2025-06-19T23:46:15Z msg="Successfully created a cluster" cluster="default"

New tests cover the changes: yes

Description for the changelog

bugfix: Fix "default" ecs cluster auto-create logic

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@sparrc sparrc requested a review from a team as a code owner June 19, 2025 23:32
return false
}

func IsClusterNotFoundError(err error) bool {
This was previously part of this function in AWS SDK v1; see the previous commit.

func TestRegisterContainerInstanceWithNegativeResource(t *testing.T) {
ctrl := gomock.NewController(t)
defer ctrl.Finish()

@sparrc sparrc Jun 19, 2025

Unrelated to this PR, but this unit test fails on instances with more than 64 GB of memory, because the mem value returned from getHostMemoryInMiB overflows the uint16 used on line 789 below.
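The overflow can be reproduced in isolation. The sketch below is only an assumption about the shape of the problem, not the test's actual code: `toUint16` and `fitsUint16` are hypothetical helpers showing that a plain Go conversion to uint16 wraps silently once the MiB value exceeds 65535, which is the case from 64 GiB (65536 MiB) upward.

```go
package main

import "math"

// toUint16 converts a MiB count with a plain Go conversion. Values above
// 65535 wrap around modulo 65536 without any error or panic.
func toUint16(memMiB int64) uint16 {
	return uint16(memMiB)
}

// fitsUint16 reports whether memMiB can be stored in a uint16 unchanged.
func fitsUint16(memMiB int64) bool {
	return memMiB >= 0 && memMiB <= math.MaxUint16
}
```

For example, `toUint16(64 * 1024)` yields 0 while `toUint16(32 * 1024)` yields 32768, so a guard such as `fitsUint16` (or a wider integer type) would be needed for a value derived from host memory.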

@sparrc sparrc changed the title Fix agent ClusterNotFound error handling bugfix: Fix "default" ecs cluster auto-create logic Jun 19, 2025
@sparrc sparrc merged commit 83a96ab into aws:dev Jun 24, 2025
40 checks passed
@prateekchaudhry prateekchaudhry mentioned this pull request Jul 3, 2025
timj-hh pushed a commit to timj-hh/amazon-ecs-agent that referenced this pull request Jul 19, 2025