Module: `garden`

Purpose

What?

This module provides chaostoolkit actions to simulate zone outages and disrupt pods in Gardener-managed clusters. It supports:

Compute: Termination or hard restart/reboot of nodes in one zone with a min/max lifetime (e.g. 0s-0s to terminate any machine as soon as it tries to come up, or 10s-60s to let machines run for at least 10s but terminate them after 60s at the latest).
Network: Blocking only ingress or only egress or all network traffic for nodes in one zone.
Pods: Termination of control plane pods (depends on your access permissions - end users have no access), system component pods (Gardener-managed addons in your kube-system namespace), or pods in general in one zone, with or without a grace period. A min/max lifetime applies here as well (e.g. 0s-0s to terminate any pod as soon as it tries to come up, or 10s-60s to let pods run for at least 10s but terminate them after 60s at the latest).

⚠️ If you block network traffic in one direction only, e.g. ingress (resp. egress), the other direction, i.e. egress (resp. ingress), is fully opened, so use with care.

You can run the above in parallel, even of the same type, as long as the targeted zones differ. This way you can also test whether you recover after a multi-zonal outage.

This module also provides chaostoolkit probes:

Health Probe: Probes various Gardener-managed cluster functions in parallel. See k8s for details.

How?

Compute and Network: See cloud provider specific docs.
Pods: Based on the given zone and filters, pods are continuously identified and terminated with or without a grace period. You may provide a min/max lifetime to make the process more random and unpredictable, which may help you unearth further issues.
Health Probe: Deploys probes into the cluster that continuously probe various Gardener-managed cluster functions in parallel. This operation must be rolled back when completed.
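
The min/max lifetime described for pods (and machines) can be pictured with a small sketch. This is only an illustration of the idea, not chaosgarden's implementation: the helper names `pick_lifetime` and `is_due` are hypothetical, and drawing the lifetime uniformly between the two bounds is an assumption.

```python
import random


def pick_lifetime(min_lifetime: float, max_lifetime: float, rng: random.Random) -> float:
    """Draw how long (in seconds) a pod may live before it is terminated."""
    if min_lifetime < 0 or max_lifetime < min_lifetime:
        raise ValueError("expected 0 <= min_lifetime <= max_lifetime")
    # Assumption: a uniform draw between the bounds; the real module may differ.
    return rng.uniform(min_lifetime, max_lifetime)


def is_due(age: float, lifetime: float) -> bool:
    """A pod is due for termination once its age reaches its drawn lifetime."""
    return age >= lifetime


rng = random.Random(42)  # seeded only to make the sketch reproducible
lifetime = pick_lifetime(10, 60, rng)
print(is_due(5, lifetime))   # False: the pod has not yet lived its minimum of 10s
print(is_due(60, lifetime))  # True: no pod outlives the maximum of 60s
print(is_due(0, pick_lifetime(0, 0, rng)))  # True: 0s-0s terminates immediately
```

With 0s-0s every pod is due as soon as it appears, whereas 10s-60s guarantees a pod at least 10s of runtime and terminates it somewhere within the following 50s.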

Why?

Developing highly available workloads that can tolerate a zone outage is no trivial task. You can find more information on how to achieve this goal here. This module helps you put your solution to the test.

The probe, on the other hand, targets Gardener developers and output qualification. It puts Gardener HA itself to the test, which requires automation because Gardener-managed clusters perform many functions in parallel.

Usage

Actions and Rollbacks

chaostoolkit introduces so-called actions that can be composed into experiments that perform operations against a system (here a Gardener-managed Kubernetes cluster). The following actions (and explicit rollbacks) are supported:

Module: chaosgarden.garden.actions

assess_cloud_provider_filters_impact: Show which machines/networks would be affected by the given zone and filters. Useful in combination with wait-for before launching the actual action.
run_cloud_provider_compute_failure_simulation: Run compute failure simulation.
run_cloud_provider_compute_failure_simulation_in_background: Same as above, but running in background as a thread. Normally not used with experiments, but directly in Python (scripts).
run_cloud_provider_network_failure_simulation: Run network failure simulation.
rollback_cloud_provider_network_failure_simulation: Rollback network failure simulation explicitly (usually performed automatically above, but can also be invoked explicitly as rollback step in an experiment to deal with interruptions).
run_cloud_provider_network_failure_simulation_in_background: Same as run_cloud_provider_network_failure_simulation, but running in background as a thread. Normally not used with experiments, but directly in Python (scripts).
run_control_plane_pod_failure_simulation: Run control plane pod failure simulation (depends on your access permissions - end users have no access).
run_control_plane_pod_failure_simulation_in_background: Same as above, but running in background as a thread. Normally not used with experiments, but directly in Python (scripts).
run_system_components_pod_failure_simulation: Run system component pod failure simulation (Gardener-managed addons in your kube-system namespace).
run_system_components_pod_failure_simulation_in_background: Same as above, but running in background as a thread. Normally not used with experiments, but directly in Python (scripts).
run_general_pod_failure_simulation: Run general pod failure simulation.
run_general_pod_failure_simulation_in_background: Same as above, but running in background as a thread. Normally not used with experiments, but directly in Python (scripts).
run_shoot_cluster_health_probe: Run shoot cluster health probe (usually only interesting to Gardener developers).
rollback_shoot_cluster_health_probe: Rollback shoot cluster health probe explicitly (usually performed automatically above, but can also be invoked explicitly as rollback step in an experiment to deal with interruptions).
run_shoot_cluster_health_probe_in_background: Same as run_shoot_cluster_health_probe, but running in background as a thread. Normally not used with experiments, but directly in Python (scripts).
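
As a rough sketch of how one of the actions above and its explicit rollback could be wired into a chaostoolkit experiment, the following builds the experiment as a Python dictionary and prints it as JSON. The module and function names come from the list above; the argument names (`zone`, `duration`) and their values are assumptions for illustration only, so check the function signatures before use.

```python
import json

# Provider fields shared by both activities; the module name is taken from this readme.
MODULE = "chaosgarden.garden.actions"

experiment = {
    "title": "Simulate a network outage in one zone",
    "description": "Sketch only: argument names and values are hypothetical.",
    "method": [
        {
            "type": "action",
            "name": "run-network-failure-simulation",
            "provider": {
                "type": "python",
                "module": MODULE,
                "func": "run_cloud_provider_network_failure_simulation",
                # Hypothetical arguments, not confirmed by this readme.
                "arguments": {"zone": "world-1a", "duration": 60},
            },
        }
    ],
    "rollbacks": [
        {
            "type": "action",
            "name": "rollback-network-failure-simulation",
            "provider": {
                "type": "python",
                "module": MODULE,
                "func": "rollback_cloud_provider_network_failure_simulation",
                # Hypothetical argument, not confirmed by this readme.
                "arguments": {"zone": "world-1a"},
            },
        }
    ],
}

print(json.dumps(experiment, indent=2))
```

Listing the rollback explicitly mirrors the note above: the simulation usually cleans up after itself, but an explicit rollback step covers interrupted runs.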

Pod Selectors

The following pod selectors are supported:

pod_node_label_selector, e.g. topology.kubernetes.io/zone=world-1a,worker.gardener.cloud/pool=cpu-worker,..., right-hand side may be a regex, operators are =|==|!=|=~|!~
pod_label_selector, e.g. gardener.cloud/role=controlplane,gardener.cloud/role=vpa,..., regular pod label selector (not interpreted by chaosgarden)
pod_metadata_selector, e.g. namespace=kube-system,name=kube-apiserver.*,..., right-hand side may be a regex, operators are =|==|!=|=~|!~
pod_owner_selector, e.g. kind!=DaemonSet,name=kube-apiserver.*,..., right-hand side may be a regex, operators are =|==|!=|=~|!~
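
To make the selector syntax more concrete, here is a minimal sketch of how such a comma-separated selector could be parsed and evaluated. It is not chaosgarden's parser. It assumes that every right-hand side is a regex that must match the whole value, that `=`, `==`, and `=~` all mean "matches", and that `!=` and `!~` both mean "does not match"; the real module may distinguish these operators. The names `parse_selector` and `matches` are hypothetical.

```python
import re

# key, operator (longest alternatives first), and right-hand side
TERM = re.compile(r"^\s*([^=!~\s]+)\s*(==|=~|!=|!~|=)\s*(.*?)\s*$")
NEGATING = {"!=", "!~"}


def parse_selector(selector: str) -> list:
    """Split a selector into (key, negate, compiled regex) terms."""
    terms = []
    for raw in selector.split(","):
        if not raw.strip():
            continue  # tolerate empty selectors and trailing commas
        match = TERM.match(raw)
        if match is None:
            raise ValueError(f"cannot parse selector term: {raw!r}")
        key, operator, pattern = match.groups()
        terms.append((key, operator in NEGATING, re.compile(pattern)))
    return terms


def matches(selector: str, attributes: dict) -> bool:
    """True if every term of the selector holds for the given attributes."""
    for key, negate, pattern in parse_selector(selector):
        hit = pattern.fullmatch(attributes.get(key, "")) is not None
        if hit == negate:
            return False
    return True


selector = "kind!=DaemonSet,name=kube-apiserver.*"
print(matches(selector, {"kind": "ReplicaSet", "name": "kube-apiserver-7d9f"}))  # True
print(matches(selector, {"kind": "DaemonSet", "name": "kube-apiserver-7d9f"}))   # False
```

Note that this sketch splits on every comma, so a regex containing a comma would not survive; it only covers the simple forms shown in the examples above.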

Configuration

The following configuration fields are mandatory:

project: Gardener project name
shoot: Shoot cluster name

Secrets

The following secret field is optional:

kubeconfig_path: Path to kubeconfig file with Garden cluster configuration and credentials

You can omit this field if $KUBECONFIG points to your kubeconfig file (default).
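
The fallback described above can be sketched as follows. This only illustrates the precedence (explicit `kubeconfig_path` first, then `$KUBECONFIG`) and is not chaosgarden's actual lookup code; the function name `resolve_kubeconfig` is hypothetical.

```python
import os


def resolve_kubeconfig(secrets: dict, environ=os.environ) -> str:
    """Prefer the explicit secret field, then fall back to $KUBECONFIG."""
    path = secrets.get("kubeconfig_path") or environ.get("KUBECONFIG")
    if not path:
        raise ValueError("set the kubeconfig_path secret or $KUBECONFIG")
    return path


# An explicit secret wins over the environment variable.
print(resolve_kubeconfig({"kubeconfig_path": "/tmp/garden.yaml"},
                         {"KUBECONFIG": "/tmp/default.yaml"}))  # /tmp/garden.yaml
# Without the secret field, $KUBECONFIG is used.
print(resolve_kubeconfig({}, {"KUBECONFIG": "/tmp/default.yaml"}))  # /tmp/default.yaml
```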

Examples

Assess Filters Impact
Run Compute Failure Simulation
Run Network Failure Simulation
Run Control Plane Pod Failure Simulation
Run System Components Pod Failure Simulation
Run General Pod Failure Simulation
Run Shoot Cluster Health Probe as Hypothesis (doesn't really fit as it must run in background, which is not supported by chaostoolkit)
Run Shoot Cluster Health Probe as Method (the better alternative and almost identical in chaostoolkit behavior)
Explicit Garden Secrets (if you do not want to use $KUBECONFIG)
