This repository was archived by the owner on Oct 13, 2021. It is now read-only.

All resources are deleted on etcd leader loss #143

@wimdec

We had a couple of incidents in which, after the etcd leader node was restarted, the Kubernetes garbage collector deleted all Faros-managed resources.

After analysis, the root cause is probably the following:
https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/#owners-and-dependents

Note: Cross-namespace owner references are disallowed by design. This means:
1) Namespace-scoped dependents can only specify owners in the same namespace, and owners that are cluster-scoped.
2) Cluster-scoped dependents can only specify cluster-scoped owners, but not namespace-scoped owners.

https://github.com/kubernetes/apimachinery/blob/master/pkg/apis/meta/v1/types.go#L311

Currently, GitTrack is namespace-scoped. This means that every ClusterGitTrackObject, and every GitTrackObject in a namespace other than the GitTrack's, has an illegal owner reference.
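
To make the two scoping rules concrete, here is a minimal sketch in Python that applies them to the setup described in this issue. It does not use any Kubernetes client; the `Obj` class, the `owner_reference_allowed` function, and the object names `example`, `in-same-namespace`, and `some-cluster-resource` are hypothetical and exist only for illustration. A `namespace` of `None` is assumed to mean "cluster-scoped".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Obj:
    """Hypothetical stand-in for a Kubernetes object.

    namespace=None is used here to mean the object is cluster-scoped.
    """
    kind: str
    name: str
    namespace: Optional[str] = None


def owner_reference_allowed(dependent: Obj, owner: Obj) -> bool:
    """Apply the two scoping rules quoted from the Kubernetes docs."""
    if owner.namespace is None:
        # A cluster-scoped owner is allowed for any dependent.
        return True
    # A namespace-scoped owner is only allowed for dependents in the
    # same namespace; a cluster-scoped dependent (None) never matches.
    return dependent.namespace == owner.namespace


# Single namespace-scoped GitTrack in faros-system (name is made up).
git_track = Obj("GitTrack", "example", "faros-system")

dependents = [
    Obj("GitTrackObject", "in-same-namespace", "faros-system"),
    Obj("GitTrackObject", "serviceaccount-kube-state-metrics", "platform-system"),
    Obj("ClusterGitTrackObject", "some-cluster-resource"),
]

for d in dependents:
    print(d.kind, d.namespace, owner_reference_allowed(d, git_track))
```

Under these assumptions, only the GitTrackObject in faros-system passes the check. If the owner is instead modelled as cluster-scoped (`Obj("GitTrack", "example")`), all three dependents pass, which is the change proposed in this issue.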

To solve this, GitTrack should become cluster-scoped.

Details:

  • Kubernetes version: 1.11.9
  • kops version: 1.11.1
  • etcd version: 3.3.10
  • HA cluster with 3 masters
  • single GitTrack in faros-system namespace
  • lots of resources in cluster scope and in different namespaces

Trigger:

  • terminate leader etcd VM in AWS console

After some time, the following logs appear in the kube-controller-manager:

I0607 09:17:18.175766       1 controller_utils.go:1032] Caches are synced for garbage collector controller
I0607 09:17:18.175785       1 garbagecollector.go:142] Garbage collector: all resource monitors have synced. Proceeding to collect garbage
I0607 09:17:18.188106       1 controller_utils.go:1032] Caches are synced for garbage collector controller
I0607 09:17:18.188124       1 garbagecollector.go:245] synced garbage collector
I0607 09:17:18.188147       1 garbagecollector.go:408] processing item [faros.pusher.com/v1alpha1/GitTrackObject, namespace: platform-system, name: serviceaccount-kube-state-metrics, uid: 68d5610b-8240-11e9-bd80-12537198d31e]
I0607 09:17:19.192572       1 garbagecollector.go:521] delete object [faros.pusher.com/v1alpha1/GitTrackObject, namespace: platform-system, name: serviceaccount-kube-state-metrics, uid: 68d5610b-8240-11e9-bd80-12537198d31e] with propagation policy Background
...

Only the GitTrackObjects in the same namespace as the GitTrack are not deleted.
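
To see which objects were affected, the "delete object" lines can be extracted from the kube-controller-manager log. The following is a rough sketch, assuming the line format is exactly as in the excerpt above; the regular expression and the `deleted_objects` helper are my own and are not part of Kubernetes or Faros. Other log formats, including how cluster-scoped objects are printed, have not been checked.

```python
import re

# Pattern derived only from the "delete object [...]" line in the excerpt
# above; this is an assumption about the format, not an official spec.
DELETE_RE = re.compile(
    r"delete object \[(?P<api>[^,\]]+), namespace: (?P<ns>[^,]*), "
    r"name: (?P<name>[^,]+), uid: (?P<uid>[^\]]+)\]"
)


def deleted_objects(log_text):
    """Return one dict per 'delete object' line found in log_text."""
    found = []
    for line in log_text.splitlines():
        match = DELETE_RE.search(line)
        if match is None:
            continue
        # "faros.pusher.com/v1alpha1/GitTrackObject" splits into
        # apiVersion "faros.pusher.com/v1alpha1" and kind "GitTrackObject".
        api_version, _, kind = match.group("api").rpartition("/")
        found.append({
            "apiVersion": api_version,
            "kind": kind,
            "namespace": match.group("ns"),
            "name": match.group("name"),
            "uid": match.group("uid"),
        })
    return found


# The two garbage collector lines copied from the log excerpt above.
SAMPLE = """\
I0607 09:17:18.188147       1 garbagecollector.go:408] processing item [faros.pusher.com/v1alpha1/GitTrackObject, namespace: platform-system, name: serviceaccount-kube-state-metrics, uid: 68d5610b-8240-11e9-bd80-12537198d31e]
I0607 09:17:19.192572       1 garbagecollector.go:521] delete object [faros.pusher.com/v1alpha1/GitTrackObject, namespace: platform-system, name: serviceaccount-kube-state-metrics, uid: 68d5610b-8240-11e9-bd80-12537198d31e] with propagation policy Background
"""

for obj in deleted_objects(SAMPLE):
    print(obj["kind"], obj["namespace"], obj["name"])
```

Grouping the result by namespace should show whether any deleted object lives in the GitTrack's own namespace.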
