Commit 515c142

Add beta requirements
1 parent 475b18a commit 515c142

File tree

3 files changed

+63
-7
lines changed

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,5 @@
11
kep-number: 3352
22
alpha:
33
approver: "@deads2k"
4+
beta:
5+
approver: "@deads2k"

keps/sig-api-machinery/3352-aggregated-discovery/README.md

Lines changed: 55 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -616,6 +616,7 @@ main focus will be on kubectl and golang clients.
616616
#### Beta
617617

618618
- kubectl uses the aggregated discovery feature by default
619+
- Metrics are added
619620

620621
#### GA
621622

@@ -678,7 +679,7 @@ channel if you need any help or guidance. -->
678679

679680
###### Does enabling the feature change any default behavior?
680681

681-
No
682+
Clients using client-go will use the aggregated discovery endpoint rather than the unaggregated discovery endpoint. This is handled automatically in client-go, and clients should see fewer requests to the API server when fetching discovery information.
682683

683684
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
684685

@@ -731,6 +732,12 @@ feature flags will be enabled on some API servers and not others
731732
during the rollout. Similarly, consider large clusters and how
732733
enablement/disablement will rollout across nodes. -->
733734

735+
During a rollout, some apiservers may support aggregated discovery and
736+
some may not. Clients should request the aggregated discovery
737+
document with a fallback to the unaggregated discovery format. This
738+
can be achieved by adding a fallback in the Accept header to the
739+
default GVK of the `/apis` and `/api` endpoints.
740+
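The fallback described above can be sketched as follows. This is a minimal illustration, not client-go's implementation: the helper names `discovery_accept_header` and `is_aggregated` are hypothetical, the aggregated media type is copied from this KEP, and the sketch assumes the response body carries a top-level `kind` field that distinguishes the aggregated document (`APIGroupDiscoveryList`) from the unaggregated one.

```python
import json

# Media type for the aggregated document, as given in the KEP text.
AGGREGATED = "application/json;as=APIGroupDiscoveryList;v=v2beta1;g=apidiscovery.k8s.io"
# Plain JSON is the fallback an apiserver without the feature can still serve.
FALLBACK = "application/json"


def discovery_accept_header() -> str:
    """Prefer the aggregated document, then fall back to the unaggregated one."""
    return f"{AGGREGATED},{FALLBACK}"


def is_aggregated(body: str) -> bool:
    """Return True if a discovery response body is in the aggregated format.

    Assumes the body is a JSON object with a top-level "kind" field.
    """
    return json.loads(body).get("kind") == "APIGroupDiscoveryList"
```

A client would send `discovery_accept_header()` as the Accept header on `/apis` and `/api`, then branch on `is_aggregated` to decide whether per-group-version requests are still needed.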
734741
###### What specific metrics should inform a rollback?
735742

736743
<!-- What signals should users be paying attention to when the feature
@@ -748,11 +755,20 @@ term, we may want to require automated upgrade/rollback tests, but we
748755
are missing a bunch of machinery and tooling and can't do that now.
749756
-->
750757

758+
n/a.
759+
751760
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
752761

753762
<!-- Even if applying deprecation policies, they may still surprise
754763
some users. -->
755764

765+
With aggregated discovery enabled as the default, the new API is
766+
slightly different from the unaggregated version. The
767+
StorageVersionHash field is removed from resources in the aggregated
768+
discovery API. The storage version migrator will have an additional
769+
flag when initializing the discovery client to continue using the
770+
unaggregated API.
771+
756772
### Monitoring Requirements
757773

758774
<!-- This section must be completed when targeting beta to a release.
@@ -766,6 +782,12 @@ the previous answers based on experience in the field. -->
766782
Kubernetes API (e.g., checking if there are objects with field X set)
767783
may be a last resort. Avoid logs or events for this purpose. -->
768784

785+
Operators can check whether an aggregated discovery request can be
786+
made by sending a request to `/apis` with
787+
`application/json;as=APIGroupDiscoveryList;v=v2beta1;g=apidiscovery.k8s.io,application/json`
788+
as the Accept header and looking at the return type. They can also
789+
check for the presence of aggregated discovery related metrics such as `aggregator_discovery_aggregation_count`.
790+
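The metrics check above could be automated along these lines. This is a sketch under stated assumptions: the metrics are taken to be available in the Prometheus text exposition format, the helper name `has_metric` is hypothetical, and exposed sample names are allowed to carry a suffix such as `_total` or `_bucket` after the metric name used in this KEP.

```python
def has_metric(exposition: str, name: str) -> bool:
    """Report whether any sample in a Prometheus text dump belongs to `name`.

    Comment lines (# HELP / # TYPE) are ignored, so only real samples count.
    """
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The sample name ends at the first "{" (labels) or space (value).
        sample = line.split("{", 1)[0].split(" ", 1)[0]
        # Accept the exact name or a suffixed form such as name_total.
        if sample == name or sample.startswith(name + "_"):
            return True
    return False
```

Matching on samples rather than on `# HELP` lines avoids reporting a metric that is registered but has never been emitted.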
769791
###### How can someone using this feature know that it is working for their instance?
770792

771793
<!-- For instance, if this is a pod-related feature, it should be
@@ -802,14 +824,18 @@ the next question. -->
802824
- [x] Metrics
803825
- Metric name: `aggregator_discovery_aggregation_duration`
804826
- Components exposing the metric: `kube-server`
805-
- This is a metric for exposing the time it took to aggregate all the
827+
- This is a metric for exposing the time it took to aggregate all the API resources.
828+
829+
- Metric name: `aggregator_discovery_aggregation_count`
830+
- Components exposing the metric: `kube-server`
831+
- This is a metric for the number of times that the discovery document has been aggregated.
806832

807833
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
808834

809835
<!-- Describe the metrics themselves and the reasons why they weren't
810836
added (e.g., cost, implementation difficulties, etc.). -->
811837

812-
Yes. A metric for the regeneration count of the discovery document. `aggregator_discovery_aggregation_count`
838+
No.
813839

814840
### Dependencies
815841

@@ -833,6 +859,12 @@ cluster-level services (e.g. DNS):
833859
- Impact of its degraded performance or high-error rates on the
834860
feature: -->
835861

862+
No, but if aggregated apiservers are present, the feature will attempt
863+
to contact and aggregate the data published from the aggregated
864+
apiserver on a set interval. If there is a high error rate, stale data
865+
may be returned because the latest data could not be fetched
866+
from the aggregated apiserver.
867+
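The stale-data behavior described above amounts to keeping the last successfully fetched document when a refresh fails. The sketch below illustrates that pattern only; the class name `StaleOnErrorCache` and its methods are hypothetical and do not reflect how the apiserver actually implements it.

```python
from typing import Callable, Optional


class StaleOnErrorCache:
    """Keep serving the last good document when a periodic refresh fails."""

    def __init__(self, fetch: Callable[[], dict]):
        self._fetch = fetch  # stands in for a call to an aggregated apiserver
        self._last_good: Optional[dict] = None

    def refresh(self) -> bool:
        """Try to fetch fresh data; on failure keep the previous value."""
        try:
            self._last_good = self._fetch()
            return True
        except Exception:
            return False

    def get(self) -> Optional[dict]:
        """Return the most recent successfully fetched document, if any."""
        return self._last_good
```

With a high error rate, `refresh` returns False repeatedly and `get` keeps returning the older document, which is the staleness the answer above warns about.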
836868
### Scalability
837869

838870
<!-- For alpha, this section is encouraged: reviewers should consider
@@ -858,6 +890,12 @@ Focusing mostly on:
858890
- periodic API calls to reconcile state (e.g. periodic fetching
859891
state, heartbeats, leader election, etc.) -->
860892

893+
No. Enabling this feature should reduce the total number of API calls
894+
for client discovery. Instead of clients sending a discovery request
895+
to all group versions (`/apis/<group>/<version>`), they will only need
896+
to send a request to the aggregated endpoint to obtain all resources
897+
that the cluster supports.
898+
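The reduction in calls can be illustrated by listing the discovery paths a client would request in each mode. This is a rough sketch: the function names are hypothetical, and it assumes one request per group version in the unaggregated case (with the legacy core group served under `/api/<version>`) and one request each to `/api` and `/apis` in the aggregated case.

```python
from typing import List, Tuple


def unaggregated_paths(group_versions: List[Tuple[str, str]]) -> List[str]:
    """Paths fetched without aggregation: the two roots plus one per group version."""
    paths = ["/api", "/apis"]
    for group, version in group_versions:
        if group == "":
            # The legacy core group has an empty name and lives under /api.
            paths.append(f"/api/{version}")
        else:
            paths.append(f"/apis/{group}/{version}")
    return paths


def aggregated_paths() -> List[str]:
    """Paths fetched with aggregation: only the two root endpoints."""
    return ["/api", "/apis"]
```

For a cluster serving N group versions, the unaggregated count grows as N + 2 while the aggregated count stays at 2.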
861899
###### Will enabling / using this feature result in introducing new API types?
862900

863901
<!-- Describe them, providing:
@@ -866,12 +904,16 @@ Focusing mostly on:
866904
- Supported number of objects per namespace (for namespace-scoped
867905
objects) -->
868906

907+
Yes, but these API types are not persisted.
908+
869909
###### Will enabling / using this feature result in any new calls to the cloud provider?
870910

871911
<!-- Describe them, providing:
872912
- Which API(s):
873913
- Estimated increase: -->
874914

915+
No.
916+
875917
###### Will enabling / using this feature result in increasing size or count of the existing API objects?
876918

877919
<!-- Describe them, providing:
@@ -880,6 +922,8 @@ objects) -->
880922
- Estimated amount of new objects: (e.g., new Object X for every
881923
existing Pod) -->
882924

925+
No.
926+
883927
###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
884928

885929
<!-- Look at the [existing SLIs/SLOs].
@@ -890,6 +934,8 @@ details.
890934
891935
[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos -->
892936

937+
No.
938+
893939
###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
894940

895941
<!-- Things to keep in mind include: additional in-memory state,
@@ -900,6 +946,8 @@ large cases, again with respect to the [supported limits].
900946
901947
[supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md -->
902948

949+
No.
950+
903951
### Troubleshooting
904952

905953
<!-- This section must be completed when targeting beta to a release.
@@ -914,6 +962,8 @@ may consider splitting it into a dedicated `Playbook` document
914962

915963
###### How does this feature react if the API server and/or etcd is unavailable?
916964

965+
The feature is built into the API server, and will not work if the API server is unavailable.
966+
917967
###### What are other known failure modes?
918968

919969
<!-- For each of them, fill in the following information by copying
@@ -932,6 +982,8 @@ why. -->
932982

933983
###### What steps should be taken if SLOs are not being met to determine the problem?
934984

985+
The feature can be rolled back by setting the `AggregatedDiscoveryEndpoint` feature gate to false.
986+
935987
## Implementation History
936988

937989
<!-- Major milestones in the lifecycle of a KEP should be tracked in

keps/sig-api-machinery/3352-aggregated-discovery/kep.yaml

Lines changed: 6 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -18,16 +18,17 @@ approvers:
1818
- "@lavalamp"
1919

2020
# The target maturity stage in the current dev cycle for this KEP.
21-
stage: alpha
21+
stage: beta
2222

2323
# The most recent milestone for which work toward delivery of this KEP has been
2424
# done. This can be the current (upcoming) milestone, if it is being actively
2525
# worked on.
26-
latest-milestone: "v1.26"
26+
latest-milestone: "v1.27"
2727

2828
# The milestone at which this feature was, or is targeted to be, at each stage.
2929
milestone:
3030
alpha: "v1.26"
31+
beta: "v1.27"
3132

3233
# The following PRR answers are required at alpha release
3334
# List the feature gate name and the components for which it must be enabled
@@ -38,5 +39,6 @@ feature-gates:
3839
disable-supported: true
3940

4041
# The following PRR answers are required at beta release
41-
# metrics:
42-
# - my_feature_metric
42+
metrics:
43+
- aggregator_discovery_aggregation_duration
44+
- aggregator_discovery_aggregation_count
