
Commit 8568805

Move KEP-2845 to implementable (#2912)
* Move KEP-2845 to implementable
* Add PRR for #2845
* Remove mention of --logtostdout flag in #2845
* KEP #2485: Require splitting stdout/stderr to Json format
* Add pohly as reviewer for KEP 2845
* #2845 Update PRR
* #2845 Add dims as SIG arch approver
1 parent 851990a commit 8568805

File tree (3 files changed, +204 -58 lines):
  • keps
    • prod-readiness/sig-instrumentation
    • sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+kep-number: 2845
+alpha:
+  approver: "@ehashman"

keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components/README.md

Lines changed: 198 additions & 56 deletions
@@ -9,7 +9,6 @@
 - [Proposal](#proposal)
 - [Removed klog flags](#removed-klog-flags)
 - [Logging defaults](#logging-defaults)
-- [Split stdout and stderr](#split-stdout-and-stderr)
 - [Logging headers](#logging-headers)
 - [User Stories](#user-stories)
 - [Writing logs to files](#writing-logs-to-files)
@@ -23,27 +22,34 @@
 - [Alpha](#alpha)
 - [Beta](#beta)
 - [GA](#ga)
-- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
-- [Version Skew Strategy](#version-skew-strategy)
 - [Implementation History](#implementation-history)
 - [Drawbacks](#drawbacks)
 - [Alternatives](#alternatives)
 - [Continue supporting all klog features](#continue-supporting-all-klog-features)
 - [Release klog 3.0 with removed features](#release-klog-30-with-removed-features)
+- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
+- [Version Skew Strategy](#version-skew-strategy)
+- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
+- [Feature Enablement and Rollback](#feature-enablement-and-rollback)
+- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
+- [Monitoring Requirements](#monitoring-requirements)
+- [Dependencies](#dependencies)
+- [Scalability](#scalability)
+- [Troubleshooting](#troubleshooting)
 <!-- /toc -->
 
 ## Release Signoff Checklist
 
 Items marked with (R) are required *prior to targeting to a milestone / release*.
 
-- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
-- [ ] (R) KEP approvers have approved the KEP status as `implementable`
-- [ ] (R) Design details are appropriately documented
+- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+- [x] (R) KEP approvers have approved the KEP status as `implementable`
+- [x] (R) Design details are appropriately documented
 - [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
 - [ ] e2e Tests for all Beta API Operations (endpoints)
 - [ ] (R) Ensure GA e2e tests for meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
 - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
-- [ ] (R) Graduation criteria is in place
+- [x] (R) Graduation criteria is in place
 - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
 - [ ] (R) Production readiness review completed
 - [ ] (R) Production readiness review approved
@@ -114,16 +120,14 @@ best practices.
 ### Non-Goals
 
 * Change klog output format
+* Remove flags from klog
 
 ## Proposal
 
 I propose to remove klog specific feature flags in Kubernetes core components
-(kube-apiserver, kube-scheduler, kube-controller-manager, kubelet) and set them
-to agreed good defaults. From klog flags we would remove all flags besides "-v"
-and "-vmodule". With removal of flags to route logs based on type we want to
-change the default routing that will work as better default. Changing the
-defaults will be done in via multi release process, that will introduce some
-temporary flags that will be removed at the same time as other klog flags.
+(kube-apiserver, kube-scheduler, kube-controller-manager, kubelet) and leave
+them at their default values. Of the klog flags, we would remove all except "-v"
+and "-vmodule".
 
 ### Removed klog flags
 
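The proposal keeps only `-v` and `-vmodule`. The Go sketch below shows what that leaves a component with. It assumes Go's standard `flag` package stands in for klog's own flag registration, and the names `loggingFlags`, `newFlagSet` and `enabled` are hypothetical. A `-vmodule` pattern that matches the calling file overrides the global `-v` threshold. A removed klog flag such as `-logtostderr` becomes an unknown flag, so parsing fails. The matching is simplified to globs over the base file name without `.go`, and it is not klog's exact matching logic.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"path"
	"strconv"
	"strings"
)

// loggingFlags holds the only two klog-style flags kept by this sketch.
type loggingFlags struct {
	verbosity int    // -v: global threshold for V(level) Info logs
	vmodule   string // -vmodule: comma-separated pattern=N overrides
}

// newFlagSet registers only -v and -vmodule; any other klog flag,
// e.g. -logtostderr, is unknown and makes Parse return an error.
func newFlagSet(lf *loggingFlags) *flag.FlagSet {
	fs := flag.NewFlagSet("component", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the sketch's output
	fs.IntVar(&lf.verbosity, "v", 0, "log level verbosity")
	fs.StringVar(&lf.vmodule, "vmodule", "", "comma-separated list of pattern=N settings")
	return fs
}

// enabled reports whether a V(level) Info log written from file should be
// emitted. The first matching -vmodule pattern overrides -v (simplified:
// glob against the base file name without its ".go" suffix).
func (lf *loggingFlags) enabled(level int, file string) bool {
	name := strings.TrimSuffix(path.Base(file), ".go")
	for _, entry := range strings.Split(lf.vmodule, ",") {
		pattern, value, ok := strings.Cut(entry, "=")
		if !ok {
			continue // skip empty or malformed entries
		}
		n, err := strconv.Atoi(value)
		if err != nil {
			continue
		}
		if matched, _ := path.Match(pattern, name); matched {
			return level <= n
		}
	}
	return level <= lf.verbosity
}

func main() {
	var lf loggingFlags
	if err := newFlagSet(&lf).Parse([]string{"-v=2", "-vmodule=scheduler*=4"}); err != nil {
		panic(err)
	}
	fmt.Println(lf.enabled(3, "pkg/scheduler/scheduler_one.go")) // true: vmodule raises the limit to 4
	fmt.Println(lf.enabled(3, "pkg/kubelet/kubelet.go"))         // false: falls back to -v=2

	err := newFlagSet(&loggingFlags{}).Parse([]string{"-logtostderr=false"})
	fmt.Println(err) // flag provided but not defined: -logtostderr
}
```

Using `flag.ContinueOnError` returns the unknown-flag error to the caller instead of exiting. This makes the effect of a removed flag easy to observe.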
@@ -159,46 +163,27 @@ This leaves that two flags that should be implemented by all log formats
 * --vmodule - control log verbosity of Info logs on per file level
 
 Those flags were chosen as they have effect of which logs are written,
-directly impacting log volume and component performance.
+directly impacting log volume and component performance. The `-v` flag will be
+supported by all logging formats, whereas `-vmodule` will be optional for formats
+other than the default "text" format.
 
 ### Logging defaults
 
 With removal of configuration alternatives we need to make sure that defaults
 make sense. List of logging features implemented by klog and proposed actions:
-* Routing logs based on type/verbosity - Should be reconsidered.
+* Routing logs based on type/verbosity - Supported by alternative logging formats.
 * Writing logs to file - Feature removed.
 * Log file rotation based on file size - Feature removed.
 * Configuration of log headers - Use the current defaults.
 * Adding stacktrace - Feature removed.
 
-For log routing I propose to adopt UNIX convention of writing info logs to
-stdout and errors to stderr. For log headers I propose to use the current
-default.
-
-#### Split stdout and stderr
-
-As logs should be treated as event streams I would propose that we separate two
-main streams "info" and "error" based on log method called. As error logs should
-usually be treated with higher priority, having two streams prevents single
-pipeline from being clogged down (for example
-[kubernetes/klog#209](https://github.com/kubernetes/klog/issues/209)).
-For logging formats writing to standard streams, we should follow UNIX standard
-of mapping "info" logs to stdout and "error" logs to stderr.
-
-Splitting stdout from stderr would be a breaking change in both klog and
-kubernetes components. However, we expect only minimal impact on users, as
-redirecting both streams is a common practice. In rare cases that will be
-impacted, adapting to this change should be a 1 line change. Still we will want
-to give users a proper heads up before making this change, so we will hide the
-change behind a new logging flag `--logtostdout`. This flag will be used avoid
-introducing breaking change in klog.
-
-With this flag we can follow multi release plan to minimize user impact (each
-point should be done in a separate Kubernetes release):
-1. Introduce the flag in disabled state and start using it in tests.
-1. Announce flag availability and encourage users to adopt it.
-1. Enable the flag by default and deprecate it (allows users to flip back to previous behavior)
-1. Remove the flag following the deprecation policy.
+The ability to route logs by type/verbosity will be replaced by splitting info
+and error logs into stdout and stderr by default. We will make this change
+only in alternative logging formats (like JSON), as we don't want to introduce
+a breaking change in the default configuration. Splitting the streams will allow
+info and error logs to be treated with different priorities. It will also unblock
+efforts like [kubernetes/klog#209](https://github.com/kubernetes/klog/issues/209) to make
+info logs non-blocking.
 
 #### Logging headers
 
@@ -288,32 +273,23 @@ all existing klog features.
 - Kubernetes logging configuration drops global state
 - Go-runner is feature complementary to klog flags planned for deprecation
 - Projects in Kubernetes Org are migrated to go-runner
-- Add --logtostdout flag to klog disabled by default
-- Use --logtostdout in kubernetes tests
+- JSON log format splits logs into stdout and stderr
 
 #### Beta
 
 - Go-runner project is well maintained and documented
 - Documentation on migrating off klog flags is publicly available
 - Kubernetes klog flags are marked as deprecated
-- Enable --logtostdout in Kubernetes components by default
 
 #### GA
 
-- Kubernetes klog specific flags are removed (including --logtostdout)
-
-### Upgrade / Downgrade Strategy
-
-N/A
-
-### Version Skew Strategy
-
-N/A
+- Kubernetes klog-specific flags are removed
 
 ## Implementation History
 
 - 20/06/2021 - Original proposal created in https://github.com/kubernetes/kubernetes/issues/99270
-- 30/07/2021 - First KEP draft was created
+- 30/07/2021 - KEP draft was created
+- 26/08/2021 - Merged in provisional state
 
 ## Drawbacks
 
@@ -333,3 +309,169 @@ features makes their future removal much harder.
 ### Release klog 3.0 with removed features
 Removal of those features cannot be done without whole k8s community instead of
 just k8s core components
+
+### Upgrade / Downgrade Strategy
+
+For the removal of klog-specific flags we will follow the K8s deprecation policy.
+There will be 3 releases between informing users about deprecation and full removal.
+During the deprecation period there will be no change in behavior for clusters
+using deprecated features; however, after removal there will be no way to
+restore the previous behavior. Three releases should give users enough notice to
+make the necessary changes and avoid breakage.
+
+### Version Skew Strategy
+
+The proposed changes have no impact on the cluster that would require coordination.
+They only affect binary configuration and how logs are written, which doesn't impact
+other components in the cluster. Users might need to change the flags passed to
+k8s binaries, but this can be done one binary at a time, independently of other components.
+
+## Production Readiness Review Questionnaire
+
+### Feature Enablement and Rollback
+
+###### How can this feature be enabled / disabled in a live cluster?
+
+- [ ] Feature gate (also fill in values in `kep.yaml`)
+  - Feature gate name:
+  - Components depending on the feature gate:
+- [x] Other
+  - Describe the mechanism: Passing command-line flags to the K8s component binaries.
+  - Will enabling / disabling the feature require downtime of the control
+    plane?
+    **Yes, for kube-apiserver it will require a restart, which can be considered
+    control plane downtime in non-highly-available clusters.**
+  - Will enabling / disabling the feature require downtime or reprovisioning
+    of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled).
+    **Yes, it will require a restart of the kubelet.**
+
+###### Does enabling the feature change any default behavior?
+
+No, we are not changing the default behavior.
+
+###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
+
+After the deprecation period, the flags will be removed and users will not be able to re-enable them.
+The only way to re-enable them would be to downgrade the cluster.
+
+###### What happens if we reenable the feature if it was previously rolled back?
+
+Flags cannot be re-enabled without downgrading.
+
+###### Are there any tests for feature enablement/disablement?
+
+N/A, we are not introducing any new behavior.
+
+### Rollout, Upgrade and Rollback Planning
+
+<!--
+This section must be completed when targeting beta to a release.
+-->
+
+###### How can a rollout or rollback fail? Can it impact already running workloads?
+
+For the removal of klog flags there is no escape hatch. Such breaking changes
+will be properly announced, but users will need to make adjustments before
+the deprecation period ends.
+
+###### What specific metrics should inform a rollback?
+
+Users can observe the number of log entries from K8s components that they ingest. If
+there is a large drop in ingested logs, they should consider a rollback and
+validate whether their logging setup supports consuming the binaries' stdout.
+
+###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
+
+N/A, logging is stateless.
+
+###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
+
+Yes, as discussed above, we will be removing klog flags.
+
+### Monitoring Requirements
+
+<!--
+This section must be completed when targeting beta to a release.
+-->
+
+###### How can an operator determine if the feature is in use by workloads?
+
+
+###### How can someone using this feature know that it is working for their instance?
+
+N/A
+
+###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
+
+N/A
+
+###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
+
+N/A
+
+###### Are there any missing metrics that would be useful to have to improve observability of this feature?
+
+To detect whether a user's logging system is consuming all logs generated by K8s
+components, it would be useful to have a metric measuring the number of logs
+generated. However, this is out of scope for this proposal, as measuring
+logging pipeline reliability depends heavily on third-party logging systems that
+are outside the K8s scope.
+
+### Dependencies
+
+N/A
+
+###### Does this feature depend on any specific services running in the cluster?
+
+No
+
+### Scalability
+
+Scalability of the logging pipeline is verified by existing scalability tests. We
+don't plan to make any changes to existing tests.
+
+###### Will enabling / using this feature result in any new API calls?
+
+No
+
+###### Will enabling / using this feature result in introducing new API types?
+
+No
+
+###### Will enabling / using this feature result in any new calls to the cloud provider?
+
+No
+
+###### Will enabling / using this feature result in increasing size or count of the existing API objects?
+
+No
+
+###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
+
+No
+
+###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
+
+No
+
+### Troubleshooting
+
+<!--
+This section must be completed when targeting beta to a release.
+
+The Troubleshooting section currently serves the `Playbook` role. We may consider
+splitting it into a dedicated `Playbook` document (potentially with some monitoring
+details). For now, we leave it here.
+-->
+
+###### How does this feature react if the API server and/or etcd is unavailable?
+
+Logging does not depend on the API server or etcd.
+
+###### What are other known failure modes?
+
+None.
+
+###### What steps should be taken if SLOs are not being met to determine the problem?
+
+N/A

keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components/kep.yaml

Lines changed: 3 additions & 2 deletions
@@ -5,12 +5,13 @@ authors:
 owning-sig: sig-instrumentation
 participating-sigs:
 - sig-arch
-status: provisional
+status: implementable
 creation-date: 2021-07-30
 reviewers:
-- TBD
+- pohly
 approvers:
 - ehashman
+- dims
 
 see-also:
 - "/keps/sig-instrumentation/1602-structured-logging"
