Commit ea5e9d6

docs: add zonal shift RFC
1 parent cd5bb29 commit ea5e9d6

1 file changed

+98
-0
lines changed

designs/zonal-shift.md

Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
1+
# Zonal Shift RFC
2+
3+
## Background
4+
5+
Occasionally, zones in cloud providers can experience temporary outages. These outages can be partial failures or complete outages of any number of dependencies that clusters require including networking, compute, authentication, and more. During these events, Karpenter's actions do not improve its cluster's availability posture and can sometimes exacerbate the scenario.
6+
7+
While detecting these outages is outside the scope of this RFC, Karpenter should provide the ability to integrate with solutions that do and modify its behavior to ensure that it does not exacerbate any zonal outages.
8+
9+
## Technical Requirements
10+
11+
1. Stop provisioning capacity in the **impaired** AZ.
12+
2. Stop performing voluntary disruption in the **impaired** AZ.
13+
3. Stop performing voluntary disruption in the **unimpaired** AZs if the disruption relies on scheduling pods to the **impaired** AZ.
14+
4. Pods with strict scheduling requirements that require capacity in the impaired AZ, such as volume requirements or node affinities, **should not** result in launch attempts.
15+
5. If an option is set, pods with topology spread constraints (TSCs) that require capacity in the impaired AZ should instead have capacity launched into unimpaired AZs while still maintaining skew between the remaining unimpaired AZs.
16+
17+
# Recommended Option: Provider-only Implementation
18+
19+
Because starting an EKS Zonal Shift already taints nodes in the impaired AZ, a Karpenter or Auto managed cluster that has been zonally shifted already meets requirement `3`: the taint prevents pods from being scheduled to those nodes.
20+
21+
This option does not meet requirement `5`. kube-scheduler changes are necessary to meet requirement 5. See https://docs.google.com/document/d/1elP211dNvUXCtAn5alW4qGzGnY0s4K8_e4X6p640-5E/edit?tab=t.0#heading=h.8l5g85o4cda3
22+
23+
## Mechanism
24+
25+
To meet requirements `1`, `2`, and `4` during a zonal shift, the AWS and Auto providers will mark all of the offerings in the impaired zone as unavailable.
26+
27+
```
type Offering struct {
	Requirements        scheduling.Requirements
	Price               float64
	Available           bool // set to false during a zonal event
	ReservationCapacity int
	priceOverlayApplied bool
}
```
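As a rough illustration of the mechanism, the sketch below marks every offering in an impaired zone as unavailable. It is not the provider's actual code: the `Offering` type here is a simplified stand-in with a hypothetical `Zone` field in place of `Requirements`, and `applyZonalShift` is an assumed helper name.

```go
package main

import "fmt"

// Offering is a simplified stand-in for the struct above. Zone is a
// hypothetical field used here in place of the zone requirement.
type Offering struct {
	Zone      string
	Price     float64
	Available bool
}

// applyZonalShift returns a copy of offerings in which every offering
// located in an impaired zone is marked unavailable. Offerings in other
// zones keep their existing availability.
func applyZonalShift(offerings []Offering, impaired map[string]bool) []Offering {
	out := make([]Offering, len(offerings))
	for i, o := range offerings {
		if impaired[o.Zone] {
			o.Available = false
		}
		out[i] = o
	}
	return out
}

func main() {
	offerings := []Offering{
		{Zone: "us-west-2a", Price: 0.10, Available: true},
		{Zone: "us-west-2b", Price: 0.11, Available: true},
	}
	shifted := applyZonalShift(offerings, map[string]bool{"us-west-2a": true})
	for _, o := range shifted {
		fmt.Printf("%s available=%t\n", o.Zone, o.Available)
	}
}
```

Returning a copy rather than mutating in place keeps the cached offerings intact, so availability can be restored simply by recomputing once the shift ends.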
36+
37+
## Observability
38+
39+
### Metrics
40+
41+
Karpenter will emit metrics that indicate which zones have been marked as impaired, and will log when the state of zonal behavior changes. To avoid flooding the log during an event, it will not log each individual action that it declines to take.
42+
43+
Karpenter will emit a new metric, `karpenter_cloudprovider_zonal_shift_duration`, that indicates how long a zonal shift has been in progress. This metric will be dimensioned by the zone in question and by whether the shift is manual or automatic, so users can understand overlapping zonal shifts in multiple zones.
44+
45+
### Events
46+
47+
Karpenter could emit events against NodePools that allow instances in the impaired AZ to indicate that new nodes cannot be provisioned in a given AZ. This is not required for an initial release, but could be a nice follow-up.
48+
49+
## Enablement
50+
51+
Zonal Shift support will be disabled by default, with an opt-in flag for the alpha release. Users who choose to configure this behavior will pass an [environment variable or CLI flag](https://karpenter.sh/docs/reference/settings/) to the Karpenter binary to indicate that they wish to enable Zonal Shift.
52+
53+
The flag will be called `ENABLE_ZONAL_SHIFT` or `--enable-zonal-shift`, and will accept a boolean value.
54+
55+
The downside to this approach is that customers who wish to quickly disable this behavior during an event will need to restart their Karpenter process to do so.
56+
57+
The decision to not enable this feature at the NodePool or NodeClass level is purposeful. It simplifies Karpenter’s behavior considerably to have the modifications be uniform across the cluster. Unless there is a strong use case for failing away some nodepools but not others, this should be kept as a cluster level setting.
58+
59+
## Source of Truth for Zonal Shifts
60+
61+
Karpenter will need to detect when a ZonalShift is activated.
62+
63+
### Option 1: GetManagedResource Now, EventBridge Later (recommended)
64+
65+
Karpenter relies on GetManagedResource now to build a simple and operationally sound interface. Support for EventBridge events can be added later if users run into TPS (API request rate) issues.
66+
67+
### Option 2: GetManagedResource only
68+
69+
A Zonal Shift Provider will be created. The provider will be responsible for tracking zonal shifts in ARC, and will be used by the Offerings Cache to determine offering availability.
70+
71+
The Zonal Shift Provider will regularly call the ARC [GetManagedResource API](https://docs.aws.amazon.com/arc-zonal-shift/latest/api/API_ListZonalShifts.html) with the resource ARN of the EKS cluster and maintain an in-memory store of the state of zonal shifts, as well as an aggregated list of impaired zones.
72+
73+
When a new Zonal Shift is returned from the API, the provider will verify that the ShiftType is correct and that the shift applies to a configured resource identifier. If the Zonal Shift passes validation, it will be added to the in-memory store of zonal shifts, and the aggregated state will be recomputed.
74+
75+
When a Zonal Shift expires as per its ExpiryTime, it will be evicted from the in-memory store, and the aggregated state will be recomputed from the remaining entries.
76+
77+
When the provider’s GetInstanceTypes() function is exercised, the availability of offerings will be updated with zonal shift information.
78+
79+
#### Modifications to Permissions
80+
81+
Karpenter will need permission to call GetManagedResource. Users will need to update their [ControllerRole](https://karpenter.sh/docs/reference/cloudformation/) accordingly.
82+
83+
### Option 3: EventBridge/SQS only
84+
85+
Karpenter is already made aware of some EventBridge events via an SQS queue, notably spot interruption events. This queue could be supplemented to also consume Zonal Shift events. The SQS provider can be supplemented to update the Offering Cache.
86+
87+
https://docs.aws.amazon.com/eventbridge/latest/ref/events-ref-arc-zonal-shift.html
88+
https://docs.aws.amazon.com/r53recovery/latest/dg/eventbridge-zonal-autoshift.html
89+
90+
These events return the zone ID, which we can translate to a zone name using the subnet data, the same way we do for offerings.
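The translation could look roughly like the sketch below. The `Subnet` type and `zoneIDToZone` helper are hypothetical, and the zone ID values are illustrative only, since the mapping between zone IDs and zone names varies by AWS account.

```go
package main

import "fmt"

// Subnet is a hypothetical subset of the subnet data the provider
// already discovers.
type Subnet struct {
	Zone   string // zone name, e.g. "us-west-2a"
	ZoneID string // zone ID, e.g. "usw2-az1" (illustrative value)
}

// zoneIDToZone builds a lookup table from zone ID to zone name.
func zoneIDToZone(subnets []Subnet) map[string]string {
	lookup := make(map[string]string, len(subnets))
	for _, s := range subnets {
		lookup[s.ZoneID] = s.Zone
	}
	return lookup
}

func main() {
	lookup := zoneIDToZone([]Subnet{
		{Zone: "us-west-2a", ZoneID: "usw2-az1"},
		{Zone: "us-west-2b", ZoneID: "usw2-az2"},
	})
	// An event for a zone ID with no known subnet is ignored here.
	if zone, ok := lookup["usw2-az1"]; ok {
		fmt.Println("impaired zone:", zone)
	}
}
```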
91+
92+
#### Benefits:
93+
94+
This allows Karpenter to call GetManagedResource less frequently.
95+
96+
#### Drawbacks:
97+
98+
EventBridge events are best effort, which means that Karpenter may miss some events, or get some events late. EventBridge does not have any SLAs on event delivery time.
