
feat: expose AMI cache TTL as runtime flag#9052

Closed
chrisdoherty4 wants to merge 1 commit into aws:main from chrisdoherty4:cpd-ami-cache-requeue-01

Conversation

@chrisdoherty4

@chrisdoherty4 chrisdoherty4 commented Apr 3, 2026

Fixes #N/A

Description

Operators running large fleets (15,000 nodes across 50+ clusters) with tens of node classes can generate significant DescribeImages API call volume, because the reconciler requeues periodically (on the order of 30s-1m) and uses a hardcoded 1-minute cache TTL. This change makes the cache TTL independently configurable so users can choose an appropriate AMI cache time for their use case:

| Flag | Env var | Default |
| --- | --- | --- |
| `--ami-cache-ttl` | `AMI_CACHE_TTL` | `1m` |

Defaults preserve existing behavior.

How was this change tested?

  • Unit tests added to pkg/operator/options/suite_test.go covering CLI
    flag override and env var fallback, and validation rejection of non-positive values.
  • All existing unit tests pass.

Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

@chrisdoherty4 chrisdoherty4 requested a review from a team as a code owner April 3, 2026 02:52
@chrisdoherty4 chrisdoherty4 requested a review from ryan-mist April 3, 2026 02:52
@chrisdoherty4 chrisdoherty4 marked this pull request as draft April 3, 2026 14:17
@chrisdoherty4
Author

chrisdoherty4 commented Apr 3, 2026

Looking deeper, it seems a handful of reconcilers set a shorter TTL than the minimum requeue time for the AMI reconciler, making the --ami-requeue-interval flag rather useless.

The cache TTL configurability does help reduce the API calls, so that still feels like a worthwhile configuration option; longer cache windows are acceptable in our case.

@chrisdoherty4 chrisdoherty4 marked this pull request as ready for review April 3, 2026 16:43
@chrisdoherty4 chrisdoherty4 force-pushed the cpd-ami-cache-requeue-01 branch from a25243a to d3b7986 Compare April 3, 2026 19:51
@chrisdoherty4
Author

chrisdoherty4 commented Apr 3, 2026

Modified the PR to only expose the AMI cache TTL. Being able to tweak this for our use case greatly reduces API call volume and avoids hitting rate limits.

@chrisdoherty4 chrisdoherty4 changed the title feat: expose AMI cache TTL and requeue interval as runtime flags feat: expose AMI cache TTL as runtime flags Apr 7, 2026
@chrisdoherty4 chrisdoherty4 changed the title feat: expose AMI cache TTL as runtime flags feat: expose AMI cache TTL as runtime flag Apr 7, 2026
Operators running large fleets can generate significant DescribeImages
API call volume due to frequent AMI reconciles. This change makes the
AMI cache TTL configurable so operators can tune it for their workload
without rebuilding.

  --ami-cache-ttl        (env: AMI_CACHE_TTL,        default: 1m)

Defaults preserve existing behaviour.
@chrisdoherty4 chrisdoherty4 force-pushed the cpd-ami-cache-requeue-01 branch from d3b7986 to a1f37c7 Compare April 7, 2026 16:27
@DerekFrank
Copy link
Copy Markdown
Contributor

DerekFrank commented Apr 7, 2026

We generally avoid surfacing too much config if we can avoid it. What were you going to set this to? We might just raise the default; 1m seems a bit low.

@chrisdoherty4
Author

We generally avoid surfacing too much config if we can avoid it. What were you going to set this to? We might just raise the default; 1m seems a bit low.

Either 15m or 1h. We haven't decided, and the flexibility is what would let us tweak things. I'm curious what problems there are with surfacing the configuration, assuming it's a sane default and well documented?

@chrisdoherty4
Author

chrisdoherty4 commented Apr 8, 2026

When I ran this patch, I found that DescribeSubnets and DescribeSecurityGroups calls went up by 7x and 3.5x respectively. That likely isn't acceptable to us either. I'm still trying to determine why.

Turns out this seems to be a regression somewhere between 1.8.1 and 1.10. The jumps here are when I deployed v1.10.

[image: API call volume before and after deploying v1.10]

Opened #9063

@jmdeal
Contributor

jmdeal commented Apr 9, 2026

I wanted to float an alternative approach to solving this issue that we've discussed internally. We're hesitant to expose cache TTL configurations directly for a couple of reasons:

  • Karpenter's caching logic is an internal implementation detail, and the exact way it works is subject to change from version to version. Knowing what value to tweak requires an understanding of Karpenter's internal caching logic.
  • Some cache TTLs are dependent on one another; tweaking one without understanding its relation to others could cause subtle issues.

An alternative we could consider is surfacing per-API client side rate-limit buckets as a configuration. I believe this more directly addresses the core issue - limiting the impact of individual Karpenter controllers - while also not exposing internal implementation details. All internal reconcilers need to be tolerant to rate limiting whether it's from the client or from the server.
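To make the per-API bucket idea concrete, here is a hedged sketch of client-side rate limiting with a hand-rolled token bucket keyed by operation name. The `bucket` type and the `DescribeImages` keying are illustrative assumptions; a real implementation would more likely wrap the AWS SDK with something like golang.org/x/time/rate:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// bucket is a minimal token bucket: it holds up to cap tokens and
// refills at rate tokens per second. (Illustrative only.)
type bucket struct {
	mu     sync.Mutex
	tokens float64
	cap    float64
	rate   float64
	last   time.Time
}

func newBucket(capacity, perSecond float64) *bucket {
	return &bucket{tokens: capacity, cap: capacity, rate: perSecond, last: time.Now()}
}

// Allow reports whether a call may proceed now, consuming one token if so.
// A reconciler that is denied would simply requeue and retry later.
func (b *bucket) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	now := time.Now()
	b.tokens += now.Sub(b.last).Seconds() * b.rate
	if b.tokens > b.cap {
		b.tokens = b.cap
	}
	b.last = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	// One bucket per AWS API operation, surfaced as operator config.
	limits := map[string]*bucket{
		"DescribeImages": newBucket(5, 1), // burst of 5, then 1 call/s
	}
	allowed := 0
	for i := 0; i < 10; i++ {
		if limits["DescribeImages"].Allow() {
			allowed++
		}
	}
	fmt.Println("allowed:", allowed) // the initial burst passes; the rest are throttled
}
```

This keeps caching an internal detail: operators bound each API's call rate directly, and every reconciler must already tolerate being throttled, whether by the client or the server.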

@chrisdoherty4
Author

I wanted to float an alternative approach to solving this issue that we've discussed internally. We're hesitant to expose cache TTL configurations directly for a couple of reasons:

  • Karpenter's caching logic is an internal implementation detail, and the exact way it works is subject to change from version to version. Knowing what value to tweak requires an understanding of Karpenter's internal caching logic.
  • Some cache TTLs are dependent on one another; tweaking one without understanding its relation to others could cause subtle issues.

An alternative we could consider is surfacing per-API client side rate-limit buckets as a configuration. I believe this more directly addresses the core issue - limiting the impact of individual Karpenter controllers - while also not exposing internal implementation details. All internal reconcilers need to be tolerant to rate limiting whether it's from the client or from the server.

Hi @jmdeal. Expressing this as client side rate limiting would work for us, thanks.

@ryan-mist
Contributor

Hi @chrisdoherty4,

Just checking back in on this - is this something you'd be interested in working on? If not then we can also work on this on our side. Thanks!

@chrisdoherty4
Author

@ryan-mist I'm not planning to implement anything.

