Reconcile VDS instances on lifetimeWatcher done events #665
Conversation
Force-pushed from 3fc9613 to 38e20fd
In the case where a lifetimeWatcher fails to renew the Vault client lease, we want all related VDS instances to be synced. This helps to mitigate the issue where an external revocation of the client token causes the issued secret leases to be revoked. In that case VSO would have no idea that the token has been revoked. The ideal TTL for the client token should be relatively short, e.g. 1m, so as to trigger the lifetimeWatcher earlier. In the future, VSO will be able to subscribe to lease revocation events from Vault. In that case, VSO will be able to perform the sync immediately.
Force-pushed from 38e20fd to e1b3b26
- VDS: trigger sync on Vault Client ID changes. The Client ID is the hash of the Vault secret's accessor.
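The commit above derives the Client ID from the Vault secret's accessor. A minimal stdlib sketch of that idea, assuming a SHA-256 hex digest as the hash function (the operator's actual choice of hash may differ):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// clientID derives a stable identifier from a Vault token accessor.
// SHA-256 is assumed here for illustration; any stable hash of the
// accessor gives the same property: the ID changes iff the accessor
// (and therefore the underlying token) changes, which is what lets the
// VDS reconciler trigger a sync on Client ID changes.
func clientID(accessor string) string {
	sum := sha256.Sum256([]byte(accessor))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := clientID("hmac-accessor-1")
	fmt.Println(a == clientID("hmac-accessor-1")) // deterministic
	fmt.Println(a == clientID("hmac-accessor-2")) // distinct accessor, distinct ID
}
```

A reconciler can then compare the stored Client ID against the current one and enqueue a sync when they differ.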
Force-pushed from 044b340 to 1cdae92
- Remove the called-back client from the factory's cache to ensure that any of its clones are purged; the next call to Sync() will get a new client back.
- Fix bogus lock handling in the factory's Get().
Have you considered using a GenericEvent source channel in the controller's watches to enqueue reconciles? It looks like it could simplify the sync-controller portion of this. There is a slightly outdated example here.
I had considered its use, but it did not offer an easy way to specify a sync delay. We may be able to add a random sleep or similar before enqueuing the object. I can put up a spike PR to see whether it fits well here.
Switched to use GenericEvents in #704
Update the VaultDynamicSecretReconciler to rely on controller-runtime's source.Channel, watched as a raw source. This ensures that only one reconciler work queue is required to handle external reconciliation requests.
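The single-work-queue idea above can be sketched with stdlib Go alone: one shared channel that any external producer can push events into, drained by one consumer. The `request` type below is a stand-in for controller-runtime's `reconcile.Request`, not the operator's actual wiring:

```go
package main

import "fmt"

// request models a reconcile request keyed by object name, standing in
// for controller-runtime's reconcile.Request.
type request struct{ Name string }

// drain plays the role of the single reconciler work queue: it consumes
// every external reconciliation request pushed onto the shared channel.
func drain(eventCh chan request) []string {
	var names []string
	for req := range eventCh {
		names = append(names, req.Name)
	}
	return names
}

func main() {
	// One buffered channel stands in for source.Channel: token expiry
	// and renewal-failure events both funnel into it.
	eventCh := make(chan request, 8)
	eventCh <- request{Name: "vds-instance-1"}
	eventCh <- request{Name: "vds-instance-2"}
	close(eventCh)

	for _, name := range drain(eventCh) {
		fmt.Println("reconcile", name)
	}
}
```

In the real controller, the channel is registered as a raw source on the controller, so external events and regular watch events share the same queue and rate limiting.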
Previously, the client factory would restore all stored clients to warm its client cache. Some issues with that:
- restore-all is blocking, and can prevent the VSO pod from becoming ready when there is a large number of clients to restore
- restore-all is unnecessary, since the client factory will always attempt a restoration upon resource reconciliation, and all resources are reconciled when VSO starts up
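The lazy-restoration behavior described above can be illustrated with a small stdlib sketch, where `storage` stands in for the persisted client state and restoration happens on a cache miss during Get rather than in a blocking pass at startup (names and shapes here are assumptions, not the operator's real types):

```go
package main

import "fmt"

// clientCache sketches lazy restoration: instead of a blocking
// restore-all at startup, Get restores a client from storage only on a
// cache miss, i.e. during resource reconciliation.
type clientCache struct {
	cache   map[string]string
	storage map[string]string // stands in for persisted client state
}

func (c *clientCache) Get(id string) (string, bool) {
	if v, ok := c.cache[id]; ok {
		return v, true
	}
	// Cache miss: attempt restoration from storage on demand instead of
	// having pre-warmed the whole cache before the pod became ready.
	if v, ok := c.storage[id]; ok {
		c.cache[id] = v
		return v, true
	}
	return "", false
}

func main() {
	c := &clientCache{
		cache:   map[string]string{},
		storage: map[string]string{"client-a": "restored"},
	}
	v, ok := c.Get("client-a")
	fmt.Println(v, ok)
}
```

Because every resource is reconciled at startup, each needed client is restored on its first Get, so the warm-up work is spread across reconciles instead of blocking readiness.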
It is nice to have k8s cluster metrics during integration testing.
In the case where a lifetimeWatcher fails to renew the Vault client lease, we want all related VDS instances to be synced. This helps to mitigate the issue where an external revocation of the client token causes the issued secret leases to be revoked. In that case, VSO would have no idea that the token had been revoked. This fix should also work in the case where the token has a max TTL set.
The ideal TTL for the client token should be relatively short, e.g. 1m, so as to trigger the lifetimeWatcher earlier.
In the future, VSO will be able to subscribe to lease revocation events from Vault. In that case, VSO will be able to perform the sync immediately.
This PR adds a new source.Channel watcher for handling external reconciliation requests. These are triggered when the Vault client's token has expired or when a token lease renewal has failed.
The CachingClientFactory has been updated to handle the case where a Client's LifetimeWatcher is done. In that case, the factory removes the Vault Client from its cache and calls any of the registered ClientCallback functions on behalf of the VDS reconciler. This fix should also allow for using non-periodic Vault client tokens, although that is not a recommended practice, since not all of the other Vault secret reconcilers have this new callback feature.
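The eviction-plus-callback flow described above can be sketched in stdlib Go. The `ClientCallback` signature and the factory shape below are illustrative assumptions, not the operator's actual API:

```go
package main

import (
	"fmt"
	"sync"
)

// ClientCallback stands in for the callback a reconciler registers with
// the factory (name and signature assumed for illustration).
type ClientCallback func(clientID string)

// cachingClientFactory is a minimal stand-in for the operator's
// CachingClientFactory: a client cache plus registered callbacks.
type cachingClientFactory struct {
	mu        sync.Mutex
	cache     map[string]struct{}
	callbacks []ClientCallback
}

func (f *cachingClientFactory) Register(cb ClientCallback) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.callbacks = append(f.callbacks, cb)
}

// onLifetimeWatcherDone models what happens when a client's
// lifetimeWatcher stops (lease renewal failed or the token expired):
// the client is evicted from the cache so any clones are purged, then
// the registered callbacks fire so the VDS reconciler can sync.
func (f *cachingClientFactory) onLifetimeWatcherDone(clientID string) {
	f.mu.Lock()
	delete(f.cache, clientID)
	cbs := append([]ClientCallback(nil), f.callbacks...)
	f.mu.Unlock()
	for _, cb := range cbs {
		cb(clientID)
	}
}

func main() {
	f := &cachingClientFactory{cache: map[string]struct{}{"abc123": {}}}
	f.Register(func(id string) { fmt.Println("sync requested for client", id) })
	f.onLifetimeWatcherDone("abc123")
	fmt.Println("cached clients:", len(f.cache))
}
```

Copying the callback slice before releasing the lock keeps the callbacks themselves free to call back into the factory without deadlocking; the next Sync() then obtains a fresh client, since the stale one is gone from the cache.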