-
Notifications
You must be signed in to change notification settings - Fork 128
OCPBUGS-6661: Manage mTLS CRLs #447
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
706a67b
to
e431715
Compare
pkg/router/crl/crl.go
Outdated
if nextUpdate.IsZero() { | ||
// TODO: Make this a longer interval (for example, hourly) or implement backoff logic? | ||
log.V(1).Info("Couldn't determine next CRL update time. Falling back to updating every 5 minutes.") | ||
nextUpdate = time.Now().Add(5 * time.Minute) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Every five minutes seems rather frequent, and CRLs can be quite large. I think daily is more typical for CRLs, right?
In any case, can you add a named constant for the default update interval?
const (
// defaultUpdateInterval is the interval to wait before updating CRLs
// when the CRLs do not specify any next update time.
defaultUpdateInterval = 1 * time.Day
// ...
)
nextUpdate = time.Now().Add(5 * time.Minute) | |
nextUpdate = time.Now().Add(defaultUpdateInterval) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I currently have it set to 5 minutes to aid in testing, since I manually update the CRL files, and sometimes I've missed the actual expiration time. I will add a constant with the appropriate fallback interval before this is ready to merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the failure was transient should we wait for another day before downloading again? I could see a case where we do a number of retries before falling back to downloading once a day. Otherwise (and assuming a transient download failure) the only way to get a new list (in a hurry) is to delete the pod.
@rfredette: This pull request references Jira Issue OCPBUGS-6661, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
1673c1c
to
d309252
Compare
pkg/router/crl/crl.go
Outdated
if nextUpdate.IsZero() { | ||
// TODO: Make this a longer interval (for example, hourly) or implement backoff logic? | ||
log.V(1).Info("Couldn't determine next CRL update time. Falling back to updating every 5 minutes.") | ||
nextUpdate = time.Now().Add(5 * time.Minute) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the failure was transient should we wait for another day before downloading again? I could see a case where we do a number of retries before falling back to downloading once a day. Otherwise (and assuming a transient download failure) the only way to get a new list (in a hurry) is to delete the pod.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the latest changes. The cleanup of stale mTLS config looks much more straightforward now. Nice improvement!
However, several comments from my previous review don't appear to have been addressed. Can you make sure that past review comments haven't been overlooked? It's fine if you decide to reject some of the suggestions, but please reply in those cases so that I know a comment wasn't inadvertently missed.
7a406f0
to
f4898ec
Compare
I plan to remove the build changes I made (in Makefile and the haproxy Dockerfile.rhel8), set the retry timeout to a final value, probably larger than 5 minutes, and squash the commits before this merges. It still needs some cleanup in the tags, but just in case: /hold |
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall look pretty good. A bit hard to understand what's happening, but I think I got the essence of it.
pkg/router/crl/crl.go
Outdated
for { | ||
updated := false | ||
if nextUpdate.IsZero() { | ||
log.V(4).Info("no nextUpdate. only watching for CA updates") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit Should we have these log messages describe where these are coming from? As they stand on their own, it might be confusing to a dev in the future when they turn on verbose logging. e.g.
log.V(4).Info("no nextUpdate. only watching for CA updates") | |
log.V(4).Info("ManageCRLs: no nextUpdate. only watching for CA updates") |
crls := make(map[string]*x509.RevocationList) | ||
updated := false | ||
now := time.Now() | ||
for len(clientCAData) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is duplicated with ShouldHaveCRLs
. Is there a way to clean this up? Seems like a maintenance burden.
Seems like you could just wait to read the clientCAData
(it's done early in writeCRLFile
, but not necessary), and then pass caBundleFilename
to ShouldHaveCRLs
.
pkg/router/crl/crl.go
Outdated
return nil | ||
} | ||
|
||
// ShouldHaveCRLs returns true if any of the certificates in caBundleFilename specify a CRL distribution point |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit and it returns an error
@@ -290,8 +292,8 @@ frontend fe_sni | |||
{{- "" }} crt-list /var/lib/haproxy/conf/cert_config.map accept-proxy | |||
{{- with (env "ROUTER_MUTUAL_TLS_AUTH") }} | |||
{{- "" }} verify {{. }} | |||
{{- with (env "ROUTER_MUTUAL_TLS_AUTH_CA") }} ca-file {{. }} {{ else }} ca-file /etc/ssl/certs/ca-bundle.trust.crt {{ end }} | |||
{{- with (env "ROUTER_MUTUAL_TLS_AUTH_CRL") }} crl-file {{. }} {{ end }} | |||
{{- if $haveClientCA }} ca-file /var/lib/haproxy/mtls/latest/ca-bundle.pem {{ else }} ca-file /etc/ssl/certs/ca-bundle.trust.crt {{ end }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a thought, /var/lib/haproxy/mtls/latest/ca-bundle.pem
could also be GO template variables, since they are already defined in crl.go
// caUpdateChannel (indicating the CA bundle has been updated), or when any existing CRL expires. Whenever either the CA | ||
// bundle or the CRL file has changed, updateCallback is called, with a boolean indicating whether crl-file needs to be | ||
// specified in the HAProxy config. | ||
func ManageCRLs(caBundleFilename string, caUpdateChannel <-chan struct{}, updateCallback func(bool)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have some unit testing for at least the functions that aren't a part of templateRouter
struct. Might have to be a little creative with the tests, but I think it's do-able.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most (if not all) of the functions I've added either deal with downloading data from another source, or read and write files in fixed locations on the filesystem, so isolating them to run in a unit test scenario would take some work. If I'm able to get good enough coverage with e2e tests, what do you think about me circling back to figure out unit tests in a later PR?
990dd53
to
79f25b4
Compare
79f25b4
to
b2e8375
Compare
Commandeering this CI to test if haproxy22 RPM is still available |
/jira refresh |
@rfredette: This pull request references Jira Issue OCPBUGS-6661, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/retest-required |
b2e8375
to
fd46043
Compare
fd46043
to
8c320e2
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@rfredette: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
{{- if $haveClientCA }} ca-file /var/lib/haproxy/mtls/latest/ca-bundle.pem {{ else }} ca-file /etc/ssl/certs/ca-bundle.trust.crt {{ end }} | ||
{{- if $haveCRLs }} crl-file /var/lib/haproxy/mtls/latest/crls.pem {{ end }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You replaced spaces with tabs. Please use spaces for consistency with the rest of the template.
I'm wary of breaking OKD users, or people using custom router deployments (unsupported, but better not to break them if we don't need to). If ROUTER_MUTUAL_TLS_AUTH_CA
or ROUTER_MUTUAL_TLS_AUTH_CRL
is set, we should still respect it. Likewise for ROUTER_MUTUAL_TLS_AUTH_CRL
below.
{{- if $haveClientCA }} ca-file /var/lib/haproxy/mtls/latest/ca-bundle.pem {{ else }} ca-file /etc/ssl/certs/ca-bundle.trust.crt {{ end }} | |
{{- if $haveCRLs }} crl-file /var/lib/haproxy/mtls/latest/crls.pem {{ end }} | |
{{- with (env "ROUTER_MUTUAL_TLS_AUTH_CA") }} ca-file {{ . }} {{- else }}{{ if $haveClientCA }} ca-file /var/lib/haproxy/mtls/latest/ca-bundle.pem {{- else }} ca-file /etc/ssl/certs/ca-bundle.trust.crt {{- end }}{{ end }} | |
{{- with (env "ROUTER_MUTUAL_TLS_AUTH_CRL") }} crl-file {{ . }} {{- else }}{{ if $haveCRLs }} crl-file /var/lib/haproxy/mtls/latest/crls.pem {{- end }}{{ end }} |
Likewise for ROUTER_MUTUAL_TLS_AUTH_CRL
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My router changes still use ROUTER_MUTUAL_TLS_AUTH_CA
since the operator still needs to supply the CA bundle, so using with (env "ROUTER_MUTUAL_TRLS_AUTH_CA")
by itself to select the old behavior isn't specific enough.
What do you think of something like this?
{{- if $haveClientCA }} ca-file /var/lib/haproxy/mtls/latest/ca-bundle.pem {{ else }} ca-file /etc/ssl/certs/ca-bundle.trust.crt {{ end }} | |
{{- if $haveCRLs }} crl-file /var/lib/haproxy/mtls/latest/crls.pem {{ end }} | |
{{- with (env "ROUTER_MUTUAL_TLS_AUTH_CRL") }} crl-file {{ . }} | |
{{- with (env "ROUTER_MUTUAL_TLS_AUTH_CA") }} ca-file {{ . }} {{- else }} ca-file /etc/ssl/certs/ca-bundle.trust.crt {{ end }} | |
{{- else }} | |
{{- if $haveClientCA }} ca-file /var/lib/haproxy/mtls/latest/ca-bundle.pem {{- else }} ca-file /etc/ssl/certs/ca-bundle.trust.crt {{ end }} | |
{{- if $haveCRLs }} crl-file /var/lib/haproxy/mtls/latest/crls.pem {{ end }} | |
{{- end}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That could work (though I think you also need to re-add {{- with (env "ROUTER_MUTUAL_TLS_AUTH_CRL") }} crl-file {{. }} {{ end }}
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix I proposed does include crl-file ...
in that case, but maybe it's a bit confusing. I'll see if I can make it clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've included a clearer version of this fix in #472.
Closing in favor of #472, which includes this fix as well as a fix for OCPBUGS-9464. |
@rfredette: This pull request references Jira Issue OCPBUGS-6661. The bug has been updated to no longer refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
This PR makes the router parse the certificates supplied in
ROUTER_MUTUAL_TLS_AUTH_CA
, download CRLs from any CRL distribution points that are found, and write them to/crls/crls.pem
. That file is supplied to haproxy to use for mTLS client auth.This PR requires that the router deployment include an
EmptyDir
mount for the/crls
directory