@@ -616,6 +616,7 @@ main focus will be on kubectl and golang clients.
#### Beta

- kubectl uses the aggregated discovery feature by default
+ - Metrics are added

#### GA

@@ -678,7 +679,12 @@ channel if you need any help or guidance. -->

###### Does enabling the feature change any default behavior?

- No
+ Clients using client-go version 1.25 and up will use the aggregated
+ discovery endpoint rather than the unaggregated discovery endpoint.
+ This is handled automatically in client-go, and clients should see fewer
+ requests to the API server when fetching discovery information. Client
+ versions older than 1.25 will continue to use the old unaggregated
+ discovery endpoint without any changes.

###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

@@ -690,7 +696,13 @@ and restarting the component. No other changes should be necessary to
disable the feature.

NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`.
- --> Yes, the feature may be disabled by reverting the feature flag.
+ -->
+
+ Yes, the feature may be disabled on the apiserver by reverting the
+ feature flag. The feature may also be turned off client-side by users
+ of client-go via the
+ [`WithLegacy()`](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/discovery/discovery_client.go#L80)
+ toggle.

###### What happens if we reenable the feature if it was previously rolled back?

@@ -731,6 +743,14 @@ feature flags will be enabled on some API servers and not others
during the rollout. Similarly, consider large clusters and how
enablement/disablement will rollout across nodes. -->

+ During a rollout, some apiservers may support aggregated discovery and
+ some may not. It is recommended that clients request the aggregated
+ discovery document with a fallback to the unaggregated discovery
+ format. This can be achieved by setting the Accept header to include a
+ fallback to the default GVK of the `/apis` and `/api` endpoints
+ (`APIGroupList` and `APIVersions`, respectively). This kind of fallback
+ is already implemented in client-go, and this note is intended for
+ non-golang clients.
+
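The fallback above can be sketched with the Go standard library alone, which is roughly what a client that does not use client-go would have to do by hand. The Accept header is the one given in this KEP; the `isAggregated` check, the request to `/apis` only, and the two `httptest` servers standing in for an upgraded and a not-yet-upgraded apiserver are assumptions made for illustration, not client-go's or the apiserver's actual code. The sketch also assumes that an apiserver serving the aggregated document repeats the `as`, `v` and `g` parameters in its response Content-Type.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
	"net/http/httptest"
	"strings"
)

// discoveryAccept asks for the aggregated document first and falls back to
// plain JSON (the unaggregated APIGroupList format for /apis).
const discoveryAccept = "application/json;as=APIGroupDiscoveryList;v=v2beta1;g=apidiscovery.k8s.io,application/json"

// isAggregated reports whether a response Content-Type names the aggregated
// discovery format. Assumed check, written for this sketch.
func isAggregated(contentType string) bool {
	base, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false
	}
	return base == "application/json" &&
		params["as"] == "APIGroupDiscoveryList" &&
		params["v"] == "v2beta1" &&
		params["g"] == "apidiscovery.k8s.io"
}

// fetchDiscovery requests /apis with the fallback Accept header and reports
// whether the server answered with the aggregated format.
func fetchDiscovery(baseURL string) (bool, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+"/apis", nil)
	if err != nil {
		return false, err
	}
	req.Header.Set("Accept", discoveryAccept)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return isAggregated(resp.Header.Get("Content-Type")), nil
}

// newFakeServer is a hypothetical stand-in for an apiserver; supportsAggregated
// toggles whether it honors the aggregated media type in the Accept header.
func newFakeServer(supportsAggregated bool) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if supportsAggregated && strings.Contains(r.Header.Get("Accept"), "as=APIGroupDiscoveryList") {
			w.Header().Set("Content-Type", "application/json;g=apidiscovery.k8s.io;v=v2beta1;as=APIGroupDiscoveryList")
			fmt.Fprint(w, `{"kind":"APIGroupDiscoveryList"}`)
			return
		}
		// Older servers ignore the parameters and return the unaggregated list.
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"kind":"APIGroupList"}`)
	}))
}

func main() {
	for _, supports := range []bool{true, false} {
		srv := newFakeServer(supports)
		aggregated, err := fetchDiscovery(srv.URL)
		srv.Close()
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("server supports aggregated=%v -> got aggregated=%v\n", supports, aggregated)
	}
}
```

Running it should print one line per simulated server, with the aggregated format returned only by the server that supports it; the same request works unchanged against both, which is the point of the fallback during a mixed rollout.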
###### What specific metrics should inform a rollback?

<!-- What signals should users be paying attention to when the feature
@@ -748,11 +768,20 @@ term, we may want to require automated upgrade/rollback tests, but we
are missing a bunch of machinery and tooling and can't do that now.
-->

+ N/A.
+
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

<!-- Even if applying deprecation policies, they may still surprise
some users. -->

+ Aggregated discovery becomes the default, and its API differs slightly
+ from the unaggregated version: the `StorageVersionHash` field is removed
+ from resources in the aggregated discovery API. The storage version
+ migrator will have an additional flag when initializing the discovery
+ client so that it continues to use the unaggregated API.
+
### Monitoring Requirements

<!-- This section must be completed when targeting beta to a release.
@@ -766,6 +795,12 @@ the previous answers based on experience in the field. -->
Kubernetes API (e.g., checking if there are objects with field X set)
may be a last resort. Avoid logs or events for this purpose. -->

+ Operators can check whether an aggregated discovery request can be
+ made by sending a request to `/apis` with
+ `application/json;as=APIGroupDiscoveryList;v=v2beta1;g=apidiscovery.k8s.io,application/json`
+ as the Accept header and checking the type of the returned document
+ (for example, its Content-Type or `kind`). They can also check for the
+ presence of aggregated discovery related metrics, such as
+ `aggregator_discovery_aggregation_count`.
+
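A minimal sketch of the metrics check, assuming the operator has already saved the apiserver's Prometheus text output (for example with `kubectl get --raw /metrics`). `hasMetric` and the hand-written `sample` below are hypothetical; because exposed series names may carry suffixes, the function also accepts a name followed by `_`.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// hasMetric reports whether a Prometheus text-format exposition contains at
// least one sample for the given metric name. A name followed by '_' also
// matches, so suffixed series (for example "_total") are accepted.
func hasMetric(exposition, name string) bool {
	scanner := bufio.NewScanner(strings.NewReader(exposition))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") || !strings.HasPrefix(line, name) {
			continue // skip blank lines, HELP/TYPE comments, and other metrics
		}
		rest := line[len(name):]
		if rest == "" {
			return true
		}
		switch rest[0] {
		case '{', ' ', '\t', '_':
			return true // labels, a value, or a suffixed series
		}
	}
	return false
}

// sample is a hypothetical, hand-written exposition; real /metrics output is
// much longer and its exact series names may differ.
const sample = `# TYPE aggregator_discovery_aggregation_count counter
aggregator_discovery_aggregation_count 5
apiserver_request_total{verb="GET"} 42
`

func main() {
	fmt.Println(hasMetric(sample, "aggregator_discovery_aggregation_count"))
	fmt.Println(hasMetric(sample, "aggregator_discovery_aggregation_duration"))
}
```

For the sample above this prints `true` and then `false`. The prefix-plus-separator rule avoids false positives from unrelated metrics that merely share a leading substring.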
###### How can someone using this feature know that it is working for their instance?

<!-- For instance, if this is a pod-related feature, it should be
@@ -802,14 +837,18 @@ the next question. -->
- [x] Metrics
- Metric name: `aggregator_discovery_aggregation_duration`
- Components exposing the metric: `kube-apiserver`
- - This is a metric for exposing the time it took to aggregate all the
+ - This is a metric for exposing the time it took to aggregate all the API resources.
+
+ - Metric name: `aggregator_discovery_aggregation_count`
+ - Components exposing the metric: `kube-apiserver`
+ - This is a metric for the number of times that the discovery document has been aggregated.

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

<!-- Describe the metrics themselves and the reasons why they weren't
added (e.g., cost, implementation difficulties, etc.). -->

- Yes. A metric for the regeneration count of the discovery document. `aggregator_discovery_aggregation_count`
+ No.

### Dependencies

@@ -833,6 +872,12 @@ cluster-level services (e.g. DNS):
- Impact of its degraded performance or high-error rates on the
feature: -->

+ No, but if aggregated apiservers are present, the feature will attempt
+ to contact them on a set interval and aggregate the data they publish.
+ If there is a high error rate, stale data may be returned because the
+ latest data could not be fetched from the aggregated apiserver.
+
### Scalability

<!-- For alpha, this section is encouraged: reviewers should consider
@@ -858,6 +903,12 @@ Focusing mostly on:
- periodic API calls to reconcile state (e.g. periodic fetching
state, heartbeats, leader election, etc.) -->

+ No. Enabling this feature should reduce the total number of API calls
+ for client discovery. Instead of sending a discovery request to every
+ group version (`/apis/<group>/<version>`), clients only need to send a
+ request to the aggregated endpoint to obtain all resources that the
+ cluster supports.
+
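To make the reduction concrete, the sketch below counts discovery requests for a hypothetical list of group versions. It assumes an unaggregated client fetches `/api` and `/apis` and then one document per group version, and that an aggregated client fetches only the two root documents `/api` and `/apis`; the function names are illustrative.

```go
package main

import "fmt"

// legacyDiscoveryPaths lists the requests an unaggregated client makes: the
// two root documents, then one request per group version. Assumes the core
// group is served at /api/v1 and every other group version at
// /apis/<group>/<version>.
func legacyDiscoveryPaths(groupVersions []string) []string {
	paths := []string{"/api", "/apis"}
	for _, gv := range groupVersions {
		if gv == "v1" {
			paths = append(paths, "/api/v1") // core (legacy) group
			continue
		}
		paths = append(paths, "/apis/"+gv)
	}
	return paths
}

// aggregatedDiscoveryPaths lists the requests an aggregated client makes:
// only the root documents, which already describe every resource.
func aggregatedDiscoveryPaths() []string {
	return []string{"/api", "/apis"}
}

func main() {
	// Hypothetical cluster with a handful of group versions.
	gvs := []string{"v1", "apps/v1", "batch/v1", "networking.k8s.io/v1", "apiextensions.k8s.io/v1"}
	fmt.Println("unaggregated requests:", len(legacyDiscoveryPaths(gvs)))
	fmt.Println("aggregated requests:", len(aggregatedDiscoveryPaths()))
}
```

For the five group versions in `main` this prints 7 unaggregated requests against 2 aggregated ones; the unaggregated count grows by one for every group version the cluster serves, while the aggregated count stays fixed.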
###### Will enabling / using this feature result in introducing new API types?

<!-- Describe them, providing:
@@ -866,12 +917,16 @@ Focusing mostly on:
- Supported number of objects per namespace (for namespace-scoped
objects) -->

+ Yes, the `apidiscovery.k8s.io` types (such as `APIGroupDiscoveryList`),
+ but these API types are not persisted.
+
###### Will enabling / using this feature result in any new calls to the cloud provider?

<!-- Describe them, providing:
- Which API(s):
- Estimated increase: -->

+ No.
+
###### Will enabling / using this feature result in increasing size or count of the existing API objects?

<!-- Describe them, providing:
@@ -880,6 +935,8 @@ objects) -->
- Estimated amount of new objects: (e.g., new Object X for every
existing Pod) -->

+ No.
+
###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

<!-- Look at the [existing SLIs/SLOs].
@@ -890,6 +947,8 @@ details.

[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos -->

+ No.
+
###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

<!-- Things to keep in mind include: additional in-memory state,
@@ -900,6 +959,8 @@ large cases, again with respect to the [supported limits].

[supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md -->

+ No.
+
### Troubleshooting

<!-- This section must be completed when targeting beta to a release.
@@ -914,6 +975,8 @@ may consider splitting it into a dedicated `Playbook` document

###### How does this feature react if the API server and/or etcd is unavailable?

+ The feature is built into the API server and will not work if the API server is unavailable.
+
###### What are other known failure modes?

<!-- For each of them, fill in the following information by copying
@@ -932,6 +995,8 @@ why. -->

###### What steps should be taken if SLOs are not being met to determine the problem?

+ The feature can be rolled back by setting the `AggregatedDiscoveryEndpoint` feature gate to `false` on the kube-apiserver (for example, `--feature-gates=AggregatedDiscoveryEndpoint=false`).
+
## Implementation History

<!-- Major milestones in the lifecycle of a KEP should be tracked in