We had a couple of incidents where, after the etcd leader node was restarted, the Kubernetes garbage collector deleted all Faros-managed resources.
After analysis, the root cause is probably the following:
https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/#owners-and-dependents
Note: Cross-namespace owner references are disallowed by design. This means:
1) Namespace-scoped dependents can only specify owners in the same namespace, and owners that are cluster-scoped.
2) Cluster-scoped dependents can only specify cluster-scoped owners, but not namespace-scoped owners.
https://github.com/kubernetes/apimachinery/blob/master/pkg/apis/meta/v1/types.go#L311
GitTrack is currently namespace-scoped. This means that every ClusterGitTrackObject, and every GitTrackObject in a namespace other than the GitTrack's, has an illegal owner reference.
To solve this, GitTrack should become cluster-scoped.
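The two rules quoted above can be sketched as a small check. This is only an illustration of the rule, not code from Faros or Kubernetes: the `Resource` type, the `owner_reference_allowed` helper, and the object names `example`, `a`, and `b` are hypothetical, and a namespace of `None` is assumed to stand for a cluster-scoped object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resource:
    """Hypothetical stand-in for a Kubernetes object (not a real API type)."""
    kind: str
    name: str
    namespace: Optional[str] = None  # None is assumed to mean cluster-scoped


def owner_reference_allowed(dependent: Resource, owner: Resource) -> bool:
    """Apply the two owner/dependent scoping rules quoted from the docs."""
    if owner.namespace is None:
        # Cluster-scoped owners are valid for any dependent.
        return True
    # A namespaced owner is only valid for a dependent in the same namespace.
    # A cluster-scoped dependent (namespace None) never matches.
    return dependent.namespace == owner.namespace


# Current situation: GitTrack lives in the faros-system namespace.
gittrack = Resource("GitTrack", "example", namespace="faros-system")
same_ns = Resource("GitTrackObject", "a", namespace="faros-system")
other_ns = Resource("GitTrackObject", "serviceaccount-kube-state-metrics",
                    namespace="platform-system")
cluster = Resource("ClusterGitTrackObject", "b")

for dep in (same_ns, other_ns, cluster):
    print(dep.kind, dep.namespace, owner_reference_allowed(dep, gittrack))
```

With a cluster-scoped GitTrack (namespace `None` in this sketch), the same check passes for all three dependents, which is the motivation for the proposed change.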
Details:
- Kubernetes version: 1.11.9
- kops version: 1.11.1
- etcd version: 3.3.10
- HA cluster with 3 masters
- single GitTrack in faros-system namespace
- lots of resources at cluster scope and in different namespaces
Trigger:
- terminate leader etcd VM in AWS console
After some time, the following logs appear in the kube-controller-manager:
I0607 09:17:18.175766 1 controller_utils.go:1032] Caches are synced for garbage collector controller
I0607 09:17:18.175785 1 garbagecollector.go:142] Garbage collector: all resource monitors have synced. Proceeding to collect garbage
I0607 09:17:18.188106 1 controller_utils.go:1032] Caches are synced for garbage collector controller
I0607 09:17:18.188124 1 garbagecollector.go:245] synced garbage collector
I0607 09:17:18.188147 1 garbagecollector.go:408] processing item [faros.pusher.com/v1alpha1/GitTrackObject, namespace: platform-system, name: serviceaccount-kube-state-metrics, uid: 68d5610b-8240-11e9-bd80-12537198d31e]
I0607 09:17:19.192572 1 garbagecollector.go:521] delete object [faros.pusher.com/v1alpha1/GitTrackObject, namespace: platform-system, name: serviceaccount-kube-state-metrics, uid: 68d5610b-8240-11e9-bd80-12537198d31e] with propagation policy Background
...
Only GitTrackObjects in the same namespace as the GitTrack are not deleted.
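To list which objects the garbage collector removed, the `delete object` lines in the kube-controller-manager log can be extracted with a short script. This is a rough sketch that assumes the log line format shown above; the `deleted_objects` helper and the regular expression are made up for illustration and have not been checked against other Kubernetes versions.

```python
import re

# Matches the "delete object [...]" lines in the format shown in the log above.
DELETE_RE = re.compile(
    r"delete object \[(?P<kind>[^,]+), namespace: (?P<namespace>[^,]*), "
    r"name: (?P<name>[^,]+), uid: (?P<uid>[^\]]+)\]"
)


def deleted_objects(log_text):
    """Return one dict per object the garbage collector logged as deleted."""
    return [m.groupdict() for m in DELETE_RE.finditer(log_text)]


# Two lines copied from the log above: one "processing item", one "delete object".
sample = (
    "I0607 09:17:18.188147 1 garbagecollector.go:408] processing item "
    "[faros.pusher.com/v1alpha1/GitTrackObject, namespace: platform-system, "
    "name: serviceaccount-kube-state-metrics, "
    "uid: 68d5610b-8240-11e9-bd80-12537198d31e]\n"
    "I0607 09:17:19.192572 1 garbagecollector.go:521] delete object "
    "[faros.pusher.com/v1alpha1/GitTrackObject, namespace: platform-system, "
    "name: serviceaccount-kube-state-metrics, "
    "uid: 68d5610b-8240-11e9-bd80-12537198d31e] "
    "with propagation policy Background\n"
)

for obj in deleted_objects(sample):
    print(obj["namespace"], obj["name"])
```

Grouping the result by namespace makes it easy to confirm that nothing in the GitTrack's own namespace shows up in the list.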