@@ -616,6 +616,7 @@ main focus will be on kubectl and golang clients.
#### Beta

- kubectl uses the aggregated discovery feature by default
+ - Metrics are added

#### GA
@@ -678,7 +679,12 @@ channel if you need any help or guidance. -->
###### Does enabling the feature change any default behavior?

- No
+ Clients using client-go version 1.26 and later use the aggregated
+ discovery endpoint rather than the unaggregated discovery endpoint.
+ This is handled automatically in client-go, and clients should send
+ fewer requests to the API server when fetching discovery information.
+ Clients built with client-go versions older than 1.26 continue to use
+ the unaggregated discovery endpoint without any changes.

###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
@@ -690,7 +696,13 @@ and restarting the component. No other changes should be necessary to
disable the feature.

NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`.
- --> Yes, the feature may be disabled by reverting the feature flag.
+ -->
+
+ Yes, the feature may be disabled on the API server by reverting the
+ `AggregatedDiscoveryEndpoint` feature flag. This disables aggregated
+ discovery for all clients. If there is a client-side bug specific to
+ Go clients, the feature may also be turned off in client-go via the
+ [`WithLegacy()`](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/discovery/discovery_client.go#L80)
+ toggle; this requires recompiling the application.

###### What happens if we reenable the feature if it was previously rolled back?
@@ -731,6 +743,18 @@ feature flags will be enabled on some API servers and not others
during the rollout. Similarly, consider large clusters and how
enablement/disablement will rollout across nodes. -->

+ During a rollout, some API servers may support aggregated discovery
+ while others do not. It is recommended that clients request the
+ aggregated discovery document with a fallback to the unaggregated
+ discovery format. This is done by listing, after the aggregated type,
+ the default GVK of the `/api` and `/apis` endpoints in the Accept
+ header. For example, to request the aggregated discovery type and fall
+ back to the unaggregated discovery type, send the following header:
+ `Accept: application/json;as=APIGroupDiscoveryList;v=v2beta1;g=apidiscovery.k8s.io,application/json`
+
+ This fallback is already implemented in client-go; this note is
+ intended for non-Go clients.
+
###### What specific metrics should inform a rollback?
<!-- What signals should users be paying attention to when the feature
@@ -748,11 +772,20 @@ term, we may want to require automated upgrade/rollback tests, but we
are missing a bunch of machinery and tooling and can't do that now.
-->

+ N/A. The API introduced does not store data; its state is recalculated
+ at each step of the upgrade, downgrade, upgrade cycle, so no state is
+ preserved between versions.
+
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
<!-- Even if applying deprecation policies, they may still surprise
some users. -->

+ Enabling aggregated discovery by default exposes a new API that is
+ slightly different from the unaggregated version: the
+ `StorageVersionHash` field is omitted from resources in the aggregated
+ discovery API. The storage version migrator will therefore set an
+ additional flag when initializing its discovery client so that it
+ continues to use the unaggregated API.
+
### Monitoring Requirements
<!-- This section must be completed when targeting beta to a release.
@@ -766,6 +799,17 @@ the previous answers based on experience in the field. -->
Kubernetes API (e.g., checking if there are objects with field X set)
may be a last resort. Avoid logs or events for this purpose. -->

+ Operators can check whether aggregated discovery is served by sending
+ a request to `/apis` with
+ `Accept: application/json;as=APIGroupDiscoveryList;v=v2beta1;g=apidiscovery.k8s.io,application/json`
+ and inspecting the `Content-Type` response header. A response header of
+ `Content-Type: application/json;g=apidiscovery.k8s.io;v=v2beta1;as=APIGroupDiscoveryList`
+ indicates that aggregated discovery is supported, while
+ `Content-Type: application/json` indicates that it is not. Operators
+ can also check for the presence of the aggregated discovery metrics,
+ such as `aggregator_discovery_aggregation_count`.
+
###### How can someone using this feature know that it is working for their instance?

<!-- For instance, if this is a pod-related feature, it should be
@@ -802,14 +846,18 @@ the next question. -->
- [x] Metrics
- Metric name: ` aggregator_discovery_aggregation_duration `
- Components exposing the metric: ` kube-server `
- - This is a metric for exposing the time it took to aggregate all the
+ - This is a metric for exposing the time it took to aggregate all the API resources.
+
+ - Metric name: `aggregator_discovery_aggregation_count`
+ - Components exposing the metric: `kube-server`
+ - This is a metric for the number of times that the discovery document has been aggregated.
###### Are there any missing metrics that would be useful to have to improve observability of this feature?

<!-- Describe the metrics themselves and the reasons why they weren't
added (e.g., cost, implementation difficulties, etc.). -->

- Yes. A metric for the regeneration count of the discovery document. ` aggregator_discovery_aggregation_count `
+ No.

### Dependencies
@@ -833,6 +881,12 @@ cluster-level services (e.g. DNS):
- Impact of its degraded performance or high-error rates on the
feature: -->

+ No. However, if aggregated apiservers are present, the feature
+ contacts them on a set interval and aggregates the discovery data they
+ publish. If the error rate is high, stale data may be returned because
+ the latest data could not be fetched from the aggregated apiserver.
+
### Scalability

<!-- For alpha, this section is encouraged: reviewers should consider
@@ -858,6 +912,12 @@ Focusing mostly on:
- periodic API calls to reconcile state (e.g. periodic fetching
state, heartbeats, leader election, etc.) -->

+ No. Enabling this feature should reduce the total number of API calls
+ for client discovery. Instead of sending a discovery request to every
+ group version (`/apis/<group>/<version>`), clients only need to send a
+ request to the aggregated endpoint to obtain all resources that the
+ cluster supports.
+
###### Will enabling / using this feature result in introducing new API types?

<!-- Describe them, providing:
@@ -866,12 +926,16 @@ Focusing mostly on:
- Supported number of objects per namespace (for namespace-scoped
objects) -->

+ Yes, the aggregated discovery types in `apidiscovery.k8s.io/v2beta1`
+ (such as `APIGroupDiscoveryList`), but these API types are not persisted.
+
###### Will enabling / using this feature result in any new calls to the cloud provider?

<!-- Describe them, providing:
- Which API(s):
- Estimated increase: -->

+ No.
+
###### Will enabling / using this feature result in increasing size or count of the existing API objects?

<!-- Describe them, providing:
@@ -880,6 +944,8 @@ objects) -->
- Estimated amount of new objects: (e.g., new Object X for every
existing Pod) -->

+ No.
+
###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

<!-- Look at the [existing SLIs/SLOs].
@@ -890,6 +956,8 @@ details.
[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos -->

+ No.
+
###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

<!-- Things to keep in mind include: additional in-memory state,
@@ -900,6 +968,8 @@ large cases, again with respect to the [supported limits].
[supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md -->

+ No.
+
### Troubleshooting

<!-- This section must be completed when targeting beta to a release.
@@ -914,6 +984,8 @@ may consider splitting it into a dedicated `Playbook` document
###### How does this feature react if the API server and/or etcd is unavailable?

+ The feature is built into the API server, and will not work if the API server is unavailable.
+
###### What are other known failure modes?

<!-- For each of them, fill in the following information by copying
@@ -932,6 +1004,8 @@ why. -->
###### What steps should be taken if SLOs are not being met to determine the problem?

+ The feature can be rolled back by setting the `AggregatedDiscoveryEndpoint` feature flag to false
+ (for example, `--feature-gates=AggregatedDiscoveryEndpoint=false` on kube-apiserver).
+
## Implementation History

<!-- Major milestones in the lifecycle of a KEP should be tracked in