Open
Labels
bug: Something isn't working
Description
Describe the bug
VSO sometimes loses track of vault-pki-secret certs. It logs that the cert is in the renewal window, but won't actually rotate it unless the operator is restarted. Once restarted, it rotates the cert successfully.
To Reproduce
Steps to reproduce the behavior:
- Deploy VSO and a vault-pki-secret resource
- Wait for cert to expire
- Notice cert was not rotated
- Restart VSO
- Cert will be rotated
Application deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: '4'
  creationTimestamp: '2024-10-16T19:19:42Z'
  generation: 5
  labels:
    app.kubernetes.io/component: controller-manager
    app.kubernetes.io/instance: sf-vault-secrets-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: vault-secrets-operator
    app.kubernetes.io/version: 0.8.1
    argocd.argoproj.io/instance: vault-prod_sf-vault-secrets-operator
    control-plane: controller-manager
    helm.sh/chart: vault-secrets-operator-0.8.6
  name: sf-vault-secrets-operator-controller-manager
  namespace: vault-prod
  resourceVersion: '3696531985'
  uid: 30e8be8f-99e7-46e1-b927-f97772239224
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: sf-vault-secrets-operator
      app.kubernetes.io/name: vault-secrets-operator
      control-plane: controller-manager
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/default-container: manager
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: sf-vault-secrets-operator
        app.kubernetes.io/name: vault-secrets-operator
        control-plane: controller-manager
    spec:
      containers:
        - args:
            - '--secure-listen-address=0.0.0.0:8443'
            - '--upstream=http://127.0.0.1:8080/'
            - '--logtostderr=true'
            - '--v=0'
          env:
            - name: KUBERNETES_CLUSTER_DOMAIN
              value: cluster.local
          image: my-registry/vault/kube-rbac-proxy:v0.18.1
          imagePullPolicy: IfNotPresent
          name: kube-rbac-proxy
          ports:
            - containerPort: 8443
              name: https
              protocol: TCP
          resources:
            limits:
              memory: 1Gi
            requests:
              cpu: 5m
              memory: 500Mi
          securityContext:
            allowPrivilegeEscalation: false
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
        - args:
            - '--health-probe-bind-address=:8081'
            - '--metrics-bind-address=127.0.0.1:8080'
            - '--leader-elect'
            - '--global-vault-auth-options=allow-default-globals'
            - '--backoff-initial-interval=5s'
            - '--backoff-max-interval=60s'
            - '--backoff-max-elapsed-time=0s'
            - '--backoff-multiplier=1.50'
            - '--backoff-randomization-factor=0.50'
            - '--zap-log-level=info'
            - '--zap-time-encoding=rfc3339'
            - '--zap-stacktrace-level=panic'
          command:
            - /vault-secrets-operator
          env:
            - name: OPERATOR_POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: OPERATOR_POD_UID
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.uid
            - name: KUBERNETES_CLUSTER_DOMAIN
              value: cluster.local
          image: my-registry/vault/vault-secrets-operator:0.10.0
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /healthz
              port: 8081
              scheme: HTTP
            initialDelaySeconds: 15
            periodSeconds: 20
            successThreshold: 1
            timeoutSeconds: 1
          name: manager
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /readyz
              port: 8081
              scheme: HTTP
            initialDelaySeconds: 5
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          resources:
            limits:
              memory: 1Gi
            requests:
              cpu: 10m
              memory: 500Mi
          securityContext:
            allowPrivilegeEscalation: false
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /var/run/podinfo
              name: podinfo
      dnsPolicy: ClusterFirst
      nodeSelector:
        node-role.kubernetes.io/infrastructure-new: ''
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        runAsNonRoot: true
      serviceAccount: sf-vault-secrets-operator-controller-manager
      serviceAccountName: sf-vault-secrets-operator-controller-manager
      terminationGracePeriodSeconds: 120
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/infrastructure-new
          operator: Exists
      volumes:
        - downwardAPI:
            defaultMode: 420
            items:
              - fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
                path: name
              - fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.uid
                path: uid
          name: podinfo
status:
  availableReplicas: 1
  conditions:
    - lastTransitionTime: '2024-10-16T19:19:42Z'
      lastUpdateTime: '2025-03-17T17:01:14Z'
      message: >-
        ReplicaSet "sf-vault-secrets-operator-controller-manager-5599cb9f75" has
        successfully progressed.
      reason: NewReplicaSetAvailable
      status: 'True'
      type: Progressing
    - lastTransitionTime: '2025-05-08T14:55:39Z'
      lastUpdateTime: '2025-05-08T14:55:39Z'
      message: Deployment has minimum availability.
      reason: MinimumReplicasAvailable
      status: 'True'
      type: Available
  observedGeneration: 5
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
The following log line is from today, just before VSO was restarted. VSO knows the grafana cert is in the renewal window, but it does not rotate it. There are no error logs after this; the cert just doesn't get rotated.
{"level":"info","ts":"2025-05-08T14:55:48Z","msg":"Must sync","controller":"vaultpkisecret","controllerGroup":"secrets.hashicorp.com","controllerKind":"VaultPKISecret","VaultPKISecret":{"name":"grafana","namespace":"grafana"},"namespace":"grafana","name":"grafana","reconcileID":"22c5590e-c366-4f8f-a4a1-83a7d6138e5d","reason":"InRenewalWindow"}
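To put a number on how late this log line is, here is a minimal Python sketch that parses an abridged copy of the line and compares its `ts` with the `status.expiration` value from the resource shown in this report. The helper name `seconds_past_expiry` is made up for illustration and is not part of VSO.

```python
import json
from datetime import datetime

# Abridged copy of the log line from this report (only the fields used here).
LOG_LINE = (
    '{"level":"info","ts":"2025-05-08T14:55:48Z","msg":"Must sync",'
    '"VaultPKISecret":{"name":"grafana","namespace":"grafana"},'
    '"reason":"InRenewalWindow"}'
)

# status.expiration of the grafana VaultPKISecret, as reported in this issue.
EXPIRATION_EPOCH = 1746677241


def seconds_past_expiry(log_line: str, expiration_epoch: int) -> int:
    """Return how many seconds after the cert expiry the log line was written."""
    entry = json.loads(log_line)
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    ts = datetime.fromisoformat(entry["ts"].replace("Z", "+00:00"))
    return int(ts.timestamp()) - expiration_epoch


if __name__ == "__main__":
    entry = json.loads(LOG_LINE)
    late_by = seconds_past_expiry(LOG_LINE, EXPIRATION_EPOCH)
    print(entry["reason"], "logged", late_by, "seconds after expiry")
```

With the values from this report, the "Must sync" line was written 38907 seconds (about 10 hours 48 minutes) after the cert had already expired.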
vault-pki-secret resource
kc get vaultpkisecret --context sf-prod -n grafana -o yaml
apiVersion: v1
items:
  - apiVersion: secrets.hashicorp.com/v1beta1
    kind: VaultPKISecret
    metadata:
      annotations:
      creationTimestamp: "2025-02-06T22:41:25Z"
      finalizers:
        - vaultpkisecrets.secrets.hashicorp.com/finalizer
      generation: 1
      labels:
        argocd.argoproj.io/instance: grafana_grafana
      name: grafana
      namespace: grafana
      resourceVersion: "3638517376"
      uid: f5a3bbbb-72ad-46df-bf14-72044d418cd7
    spec:
      commonName: <common_name>
      destination:
        create: true
        name: grafana-pki-secret
        overwrite: false
        transformation: {}
      expiryOffset: 3600s
      format: pem
      mount: pki
      role: postgres
      rolloutRestartTargets:
        - kind: Deployment
          name: grafana
      ttl: 2592000s
      vaultAuthRef: grafana
    status:
      error: ""
      expiration: 1746677241 (Thu May 08 2025 04:07:21 GMT+0000)
      lastGeneration: 1
      lastRotation: 1744085242 (Tue Apr 08 2025 04:07:22 GMT+0000)
      secretMAC: K36WwBka5fODwB41pxiRulmrmxyacg4mSfFwl9zn2tY=
      serialNumber: 35:17:fb:18:8a:68:11:c1:4c:f3:9f:71:d4:94:86:f3:d9:60:fe:84
      valid: true
kind: List
metadata:
  resourceVersion: ""
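For reference, the renewal window implied by this resource can be worked out from `status.expiration` and `spec.expiryOffset`. The sketch below assumes the window opens at expiration minus the offset, which is how I read the `expiryOffset` field; it is not taken from the VSO source, and the function names are made up for illustration.

```python
from datetime import datetime, timezone


def parse_seconds(value: str) -> int:
    """Convert a plain seconds string such as '3600s' to an int (no other units handled)."""
    return int(value.rstrip("s"))


def renewal_window_start(expiration_epoch: int, expiry_offset: str) -> int:
    """Assumed rule: the renewal window opens at expiration minus expiryOffset."""
    return expiration_epoch - parse_seconds(expiry_offset)


def in_renewal_window(now_epoch: int, expiration_epoch: int, expiry_offset: str) -> bool:
    """True once 'now' has reached the start of the assumed renewal window."""
    return now_epoch >= renewal_window_start(expiration_epoch, expiry_offset)


if __name__ == "__main__":
    expiration = 1746677241   # status.expiration from the resource above
    offset = "3600s"          # spec.expiryOffset from the resource above
    start = renewal_window_start(expiration, offset)
    print(datetime.fromtimestamp(start, tz=timezone.utc).isoformat())
```

Under that assumption the window opened at 2025-05-08T03:07:21 UTC, one hour before the cert expired, so a rotation should have happened well before the log line above was written.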
Expected behavior
When a cert is in the renewal window, VSO rotates it successfully without needing a restart.
Environment
- Kubernetes version: v1.30.9
- Distribution or cloud vendor (OpenShift, EKS, GKE, AKS, etc.): Vanilla on-prem
- vault-secrets-operator version: 0.10.0
bhvishal9