Commit c884044

Merge pull request #3722 from Jefftree/agg-discovery-beta
KEP-3352 Aggregated Discovery to Beta
2 parents f633d91 + 524a133 commit c884044

3 files changed, +86 -8 lines changed

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
 kep-number: 3352
 alpha:
   approver: "@deads2k"
+beta:
+  approver: "@deads2k"

keps/sig-api-machinery/3352-aggregated-discovery/README.md

Lines changed: 78 additions & 4 deletions
@@ -616,6 +616,7 @@ main focus will be on kubectl and golang clients.
 #### Beta
 
 - kubectl uses the aggregated discovery feature by default
+- Metrics are added
 
 #### GA
 
@@ -678,7 +679,12 @@ channel if you need any help or guidance. -->
 
 ###### Does enabling the feature change any default behavior?
 
-No
+Clients using client-go version 1.26 and up will use the aggregated
+discovery endpoint rather than the unaggregated discovery endpoint.
+This is handled automatically in client-go and clients should see fewer
+requests to the api server when fetching discovery information. Client
+versions older than 1.26 will continue to use the old unaggregated
+discovery endpoint without any changes.
 
 ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
 
@@ -690,7 +696,13 @@ and restarting the component. No other changes should be necessary to
 disable the feature.
 
 NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`.
---> Yes, the feature may be disabled by reverting the feature flag.
+-->
+
+Yes, the feature may be disabled on the apiserver by reverting the
+feature flag. This will disable aggregated discovery for all clients. If there is a Go-specific client-side bug, the feature may also be
+turned off in client-go via the
+[WithLegacy()](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/discovery/discovery_client.go#L80)
+toggle; this requires recompiling the application.
 
 ###### What happens if we reenable the feature if it was previously rolled back?
 
@@ -731,6 +743,18 @@ feature flags will be enabled on some API servers and not others
 during the rollout. Similarly, consider large clusters and how
 enablement/disablement will rollout across nodes. -->
 
+During a rollout, some apiservers may support aggregated discovery and
+some may not. It is recommended that clients request the
+aggregated discovery document with a fallback to the unaggregated
+discovery format. This can be achieved by setting the Accept header to
+have a fallback to the default GVK of the `/apis` and `/api` endpoint.
+For example, to request the aggregated discovery type and fall back to
+the unaggregated discovery, the following header can be sent: `Accept:
+application/json;as=APIGroupDiscoveryList;v=v2beta1;g=apidiscovery.k8s.io,application/json`
+
+This kind of fallback is already implemented in client-go and this
+note is intended for non-golang clients.
+
 ###### What specific metrics should inform a rollback?
 
 <!-- What signals should users be paying attention to when the feature
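For non-Go clients, the fallback described in the hunk above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library; the Accept value is taken from the KEP text, while `build_discovery_request`, `discovery_accept_header`, and the example server URL are hypothetical names introduced here. Authentication and TLS configuration are deliberately left out.

```python
import urllib.request

# Media types as quoted in the KEP text above.
AGGREGATED = "application/json;as=APIGroupDiscoveryList;v=v2beta1;g=apidiscovery.k8s.io"
UNAGGREGATED = "application/json"


def discovery_accept_header() -> str:
    """Prefer the aggregated document and fall back to the unaggregated one."""
    return ",".join([AGGREGATED, UNAGGREGATED])


def build_discovery_request(server: str, path: str = "/apis") -> urllib.request.Request:
    """Build (but do not send) a discovery request.

    `server` is a hypothetical base URL such as "https://127.0.0.1:6443".
    """
    url = server.rstrip("/") + path
    return urllib.request.Request(url, headers={"Accept": discovery_accept_header()})
```

An apiserver without aggregated discovery would be expected to ignore the first media type and answer with plain `application/json`, so the same request works on both sides of a rollout.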
@@ -748,11 +772,20 @@ term, we may want to require automated upgrade/rollback tests, but we
 are missing a bunch of machinery and tooling and can't do that now.
 -->
 
+N/A. The API introduced does not store data, and state is recalculated at each step of the upgrade, downgrade, upgrade cycle. No state is preserved between versions.
+
 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
 
 <!-- Even if applying deprecation policies, they may still surprise
 some users. -->
 
+With aggregated discovery enabled as the default, the new API is
+slightly different from the unaggregated version. The
+StorageVersionHash field is removed from resources in the aggregated
+discovery API. The storage version migrator will have an additional
+flag when initializing the discovery client to continue using the
+unaggregated API.
+
 ### Monitoring Requirements
 
 <!-- This section must be completed when targeting beta to a release.
@@ -766,6 +799,17 @@ the previous answers based on experience in the field. -->
 Kubernetes API (e.g., checking if there are objects with field X set)
 may be a last resort. Avoid logs or events for this purpose. -->
 
+Operators can check whether an aggregated discovery request can be
+made by sending a request to `/apis` with
+`application/json;as=APIGroupDiscoveryList;v=v2beta1;g=apidiscovery.k8s.io,application/json`
+as the Accept header and looking at the `Content-Type` response
+header. A response header of `Content-Type:
+application/json;g=apidiscovery.k8s.io;v=v2beta1;as=APIGroupDiscoveryList`
+indicates that aggregated discovery is supported and a `Content-Type:
+application/json` header indicates that aggregated discovery is not
+supported. They can also check for the presence of aggregated
+discovery related metrics: `aggregator_discovery_aggregation_count`
+
 ###### How can someone using this feature know that it is working for their instance?
 
 <!-- For instance, if this is a pod-related feature, it should be
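The `Content-Type` check from the hunk above could be automated along these lines. This is a sketch only: `parse_media_type` and `is_aggregated_discovery` are hypothetical helpers, and the parameter values they compare against are the ones quoted in the KEP text. Parameters are compared by name, so their order in the header does not matter.

```python
def parse_media_type(value: str) -> tuple[str, dict[str, str]]:
    """Split a Content-Type value into its media type and its parameters."""
    main, *params = [part.strip() for part in value.split(";")]
    parsed = {}
    for param in params:
        key, sep, val = param.partition("=")
        if sep:  # skip malformed parameters without "="
            parsed[key.strip().lower()] = val.strip()
    return main.lower(), parsed


def is_aggregated_discovery(content_type: str, version: str = "v2beta1") -> bool:
    """Return True if the response header names the aggregated discovery type."""
    main, params = parse_media_type(content_type)
    return (
        main == "application/json"
        and params.get("as") == "APIGroupDiscoveryList"
        and params.get("g") == "apidiscovery.k8s.io"
        and params.get("v") == version
    )
```

A plain `application/json` response carries none of the three parameters, so the helper reports it as unaggregated.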
@@ -802,14 +846,18 @@ the next question. -->
 - [x] Metrics
   - Metric name: `aggregator_discovery_aggregation_duration`
   - Components exposing the metric: `kube-server`
-  - This is a metric for exposing the time it took to aggregate all the
+  - This is a metric for exposing the time it took to aggregate all the api resources.
+
+  - Metric name: `aggregator_discovery_aggregation_count`
+  - Components exposing the metric: `kube-server`
+  - This is a metric for the number of times that the discovery document has been aggregated.
 
 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?
 
 <!-- Describe the metrics themselves and the reasons why they weren't
 added (e.g., cost, implementation difficulties, etc.). -->
 
-Yes. A metric for the regeneration count of the discovery document. `aggregator_discovery_aggregation_count`
+No.
 
 ### Dependencies
 
@@ -833,6 +881,12 @@ cluster-level services (e.g. DNS):
 - Impact of its degraded performance or high-error rates on the
 feature: -->
 
+No, but if aggregated apiservers are present, the feature will attempt
+to contact and aggregate the data published from the aggregated
+apiserver on a set interval. If there is a high error rate, stale data
+may be returned because the latest data could not be fetched
+from the aggregated apiserver.
+
 ### Scalability
 
 <!-- For alpha, this section is encouraged: reviewers should consider
@@ -858,6 +912,12 @@ Focusing mostly on:
 - periodic API calls to reconcile state (e.g. periodic fetching
 state, heartbeats, leader election, etc.) -->
 
+No. Enabling this feature should reduce the total number of API calls
+for client discovery. Instead of clients sending a discovery request
+to all group versions (`/apis/<group>/<version>`), they will only need
+to send a request to the aggregated endpoint to obtain all resources
+that the cluster supports.
+
 ###### Will enabling / using this feature result in introducing new API types?
 
 <!-- Describe them, providing:
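The reduction in discovery calls claimed in the hunk above can be illustrated with a small model. This sketch covers only the fan-out over group versions under `/apis`; the core `/api` endpoint and any index request a client makes to learn the group versions are left out. `discovery_paths` is a hypothetical helper, not part of any client library.

```python
def discovery_paths(group_versions: list[tuple[str, str]], aggregated: bool) -> list[str]:
    """List the request paths needed to discover resources under /apis."""
    if aggregated:
        # A single document describes every group version.
        return ["/apis"]
    # Unaggregated: one request per group version.
    return [f"/apis/{group}/{version}" for group, version in group_versions]


group_versions = [("apps", "v1"), ("batch", "v1"), ("networking.k8s.io", "v1")]
print(len(discovery_paths(group_versions, aggregated=False)))  # one per group version
print(len(discovery_paths(group_versions, aggregated=True)))   # always one
```

Under this model the unaggregated request count grows linearly with the number of group versions, while the aggregated count stays constant.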
@@ -866,12 +926,16 @@ Focusing mostly on:
 - Supported number of objects per namespace (for namespace-scoped
 objects) -->
 
+Yes, but these API types are not persisted.
+
 ###### Will enabling / using this feature result in any new calls to the cloud provider?
 
 <!-- Describe them, providing:
 - Which API(s):
 - Estimated increase: -->
 
+No.
+
 ###### Will enabling / using this feature result in increasing size or count of the existing API objects?
 
 <!-- Describe them, providing:
@@ -880,6 +944,8 @@ objects) -->
 - Estimated amount of new objects: (e.g., new Object X for every
 existing Pod) -->
 
+No.
+
 ###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
 
 <!-- Look at the [existing SLIs/SLOs].
@@ -890,6 +956,8 @@ details.
 
 [existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos -->
 
+No.
+
 ###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
 
 <!-- Things to keep in mind include: additional in-memory state,
@@ -900,6 +968,8 @@ large cases, again with respect to the [supported limits].
 
 [supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md -->
 
+No.
+
 ### Troubleshooting
 
 <!-- This section must be completed when targeting beta to a release.
@@ -914,6 +984,8 @@ may consider splitting it into a dedicated `Playbook` document
 
 ###### How does this feature react if the API server and/or etcd is unavailable?
 
+The feature is built into the API server, and will not work if the API server is unavailable.
+
 ###### What are other known failure modes?
 
 <!-- For each of them, fill in the following information by copying
@@ -932,6 +1004,8 @@ why. -->
 
 ###### What steps should be taken if SLOs are not being met to determine the problem?
 
+The feature can be rolled back by setting the AggregatedDiscoveryEndpoint feature flag to false.
+
 ## Implementation History
 
 <!-- Major milestones in the lifecycle of a KEP should be tracked in

keps/sig-api-machinery/3352-aggregated-discovery/kep.yaml

Lines changed: 6 additions & 4 deletions
@@ -18,16 +18,17 @@ approvers:
   - "@lavalamp"
 
 # The target maturity stage in the current dev cycle for this KEP.
-stage: alpha
+stage: beta
 
 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.26"
+latest-milestone: "v1.27"
 
 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
   alpha: "v1.26"
+  beta: "v1.27"
 
 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
@@ -38,5 +39,6 @@ feature-gates:
 disable-supported: true
 
 # The following PRR answers are required at beta release
-# metrics:
-# - my_feature_metric
+metrics:
+  - aggregator_discovery_aggregation_duration
+  - aggregator_discovery_aggregation_count
