Commit d9e3eb0

Merge pull request #1104 from amazeeio/relaxed-scheduling
Introduce relaxed backup job scheduling feature
2 parents d01fae8 + db345f8 commit d9e3eb0

13 files changed, +355 -39 lines changed

cmd/operator/main.go

Lines changed: 1 addition & 0 deletions
@@ -83,6 +83,7 @@ var (
 &cli.StringFlag{Destination: &cfg.Config.PodExecRoleName, Name: "podexecrolename", EnvVars: []string{"BACKUP_PODEXECROLENAME"}, Value: "pod-executor", Usage: "set the role name that should be used for pod command execution"},

 &cli.BoolFlag{Destination: &cfg.Config.EnableLeaderElection, Name: "enable-leader-election", EnvVars: []string{"BACKUP_ENABLE_LEADER_ELECTION"}, Value: true, DefaultText: "enabled", Usage: "enable leader election within the operator Pod"},
+&cli.BoolFlag{Destination: &cfg.Config.EnableRelaxedScheduling, Name: "enable-relaxed-scheduling", EnvVars: []string{"BACKUP_ENABLE_RELAXED_SCHEDULING"}, Value: false, DefaultText: "disabled", Usage: "enable relaxed scheduling of backup jobs relying on the Kubernetes scheduler"},
 &cli.BoolFlag{Destination: &cfg.Config.SkipWithoutAnnotation, Name: "skip-pvcs-without-annotation", EnvVars: []string{"BACKUP_SKIP_WITHOUT_ANNOTATION"}, Value: false, DefaultText: "disabled", Usage: "skip selecting PVCs that don't have the BACKUP_ANNOTATION"},
 &cli.StringFlag{Destination: &cfg.Config.BackupCheckSchedule, Name: "checkschedule", EnvVars: []string{"BACKUP_CHECKSCHEDULE"}, Value: "0 0 * * 0", Usage: "the default check schedule"},
 &cli.StringFlag{Destination: &cfg.Config.OperatorNamespace, Name: "operator-namespace", EnvVars: []string{"BACKUP_OPERATOR_NAMESPACE"}, Required: true, Usage: "set the namespace in which the K8up operator itself runs"},

docs/modules/ROOT/examples/usage/k8up.txt

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
 NAME:
-k8up - A new cli application
+k8up - Kubernetes and OpenShift Backup Operator

 USAGE:
 k8up [global options] command [command options] [arguments...]
@@ -19,4 +19,4 @@ GLOBAL OPTIONS:
 --version, -v print the version (default: false)

 COPYRIGHT:
-(c) 2021 VSHN AG
+(c) K8up Authors

docs/modules/ROOT/examples/usage/operator.txt

Lines changed: 2 additions & 0 deletions
@@ -47,8 +47,10 @@ OPTIONS:
 --podexecaccountname value, --serviceaccount value set the service account name that should be used for the pod command execution (default: "pod-executor") [$BACKUP_PODEXECACCOUNTNAME]
 --podexecrolename value set the role name that should be used for pod command execution (default: "pod-executor") [$BACKUP_PODEXECROLENAME]
 --enable-leader-election enable leader election within the operator Pod (default: enabled) [$BACKUP_ENABLE_LEADER_ELECTION]
+--enable-relaxed-scheduling enable relaxed scheduling of backup jobs relying on the Kubernetes scheduler (default: disabled) [$BACKUP_ENABLE_RELAXED_SCHEDULING]
 --skip-pvcs-without-annotation skip selecting PVCs that don't have the BACKUP_ANNOTATION (default: disabled) [$BACKUP_SKIP_WITHOUT_ANNOTATION]
 --checkschedule value the default check schedule (default: "0 0 * * 0") [$BACKUP_CHECKSCHEDULE]
 --operator-namespace value set the namespace in which the K8up operator itself runs [$BACKUP_OPERATOR_NAMESPACE]
+--insecure-allow-podexec-spdy-fallback enable fallback to SPDY connections for data streaming used by application aware backups. Might need to be enabled if the cluster has Kubernetes version 1.30 or lower. K8up uses WebSockets by default. CAUTION: Has been observed to cause silent data corruption in some network setups, use at own risk! (default: false) [$INSECURE_ALLOW_PODEXEC_SPDY_FALLBACK]
 --vardir value the var data dir for read/write k8up data or temp file in the backup pod (default: "/k8up") [$VAR_DIR]
 --help, -h show help (default: false)

docs/modules/ROOT/examples/usage/restic.txt

Lines changed: 2 additions & 0 deletions
@@ -18,6 +18,7 @@ OPTIONS:
 --fileExtensionAnnotation value Defines the file extension to use for STDOUT backups. [$FILEEXTENSION_ANNOTATION]
 --backucontainerannotation value set the annotation name that specify the backup container inside the Pod (default: "k8up.io/backupcommand-container") [$BACKUP_CONTAINERANNOTATION]
 --skipPreBackup If the job should skip the backup command and only backup volumes. (default: false) [$SKIP_PREBACKUP]
+--skipSnapshotSync If set, skip synchronizing Snapshot custom resources to the cluster after backup or prune operations. Webhook notifications are still sent. (default: false) [$BACKUP_SKIP_SNAPSHOT_SYNC]
 --promURL value Sets the URL of a prometheus push gateway to report metrics. [$PROM_URL]
 --clusterName value Sets the Kubernetes cluster name for grouping metrics in push gateway [$CLUSTER_NAME]
 --webhookURL value, --statsURL value Sets the URL of a server which will retrieve a webhook after the action completes. [$STATS_URL]
@@ -69,4 +70,5 @@ OPTIONS:
 --caCert value The certificate authority file path [$CA_CERT_FILE]
 --clientCert value The client certificate file path [$CLIENT_CERT_FILE]
 --clientKey value The client private key file path [$CLIENT_KEY_FILE]
+--insecure-allow-podexec-spdy-fallback enable fallback to SPDY connections for data streaming used by application aware backups. Might need to be enabled if the cluster has Kubernetes version 1.30 or lower. K8up uses WebSockets by default. CAUTION: Has been observed to cause silent data corruption in some network setups, use at own risk! (default: false) [$INSECURE_ALLOW_PODEXEC_SPDY_FALLBACK]
 --help, -h show help (default: false)

docs/modules/ROOT/nav.adoc

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@
 * xref:explanations/system-requirements.adoc[System Requirements]
 * xref:explanations/what-has-changed-in-v1.adoc[Changes in K8up v1.0]
 * xref:explanations/what-has-changed-in-v2.adoc[Changes in K8up v2.0]
-* xref:explanations/rwo.adoc[]
+* xref:explanations/backup-pod-scheduling.adoc[Backup Pod Scheduling]

 .About
 * xref:about/roadmap.adoc[Roadmap]
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+= Backup Pod Scheduling
+
+K8up has two scheduling modes for backup pods. Both scheduling modes respect the `k8up.io/hostname` annotation on a PVC (see: xref:references/annotations.adoc[Annotations]).
+
+== Classic Scheduling (default)
+
+K8s does not prevent mounting an RWO PVC to multiple pods, if they are scheduled on the same host.
+K8up uses this fact to provide the ability to back up RWO PVCs.
+
+For a given backup in a namespace K8up will list all the PVCs.
+The PVCs are then grouped depending on their type:
+
+* all RWX PVCs are grouped together
+* RWO PVCs are grouped by k8s node where they are currently mounted
+
+K8up will then deploy backup jobs according to the grouping, a single job for all RWX PVCs and a job for each K8s node.
+The jobs themselves loop over the mounted PVCs and do a file backup via restic.
+
+== Relaxed Scheduling
+
+In the relaxed scheduling mode, the backup pod scheduling is handled by the Kubernetes scheduler. This mode is intended for the use with storage provisioners that handle node affinity and topology well for their provisioned volumes.
+
+For a given backup in a namespace K8up will list all the PVCs and schedule one backup job for each PVC. K8up will not set a node selector for the backup pods unless explicitly requested through the `k8up.io/hostname` annotation on a PVC. This allows the Kubernetes scheduler to more freely select an appropriate node to run the backup on.
+
+Relaxed scheduling can be enabled via the operator flag `--enable-relaxed-scheduling` or environment variable `BACKUP_ENABLE_RELAXED_SCHEDULING` (see: xref:references/operator-config-reference.adoc[Operator Configuration]).
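
Note: the two modes described in the new page above can be illustrated with a minimal Go sketch. This is not the operator's code. The `PVC` and `Job` types and the `planJobs` function are hypothetical names invented for this illustration, and it assumes (the page does not spell this out) that a `k8up.io/hostname` annotation takes precedence over the node where an RWO PVC is currently mounted.

```go
package main

import (
	"fmt"
	"sort"
)

// PVC is a simplified, hypothetical stand-in for a PersistentVolumeClaim.
type PVC struct {
	Name        string
	AccessMode  string // "RWO" or "RWX"
	MountedNode string // node where the PVC is currently mounted (relevant for RWO)
	Hostname    string // value of the k8up.io/hostname annotation, empty if unset
}

// Job is a hypothetical description of a single backup job.
type Job struct {
	PVCs     []string
	NodeName string // empty: leave node selection to the Kubernetes scheduler
}

// planJobs sketches both modes under the assumptions stated above.
func planJobs(pvcs []PVC, relaxed bool) []Job {
	if relaxed {
		// Relaxed: one job per PVC, pinned only when the annotation asks for it.
		jobs := make([]Job, 0, len(pvcs))
		for _, p := range pvcs {
			jobs = append(jobs, Job{PVCs: []string{p.Name}, NodeName: p.Hostname})
		}
		return jobs
	}

	// Classic: group RWO PVCs by node; RWX PVCs without annotation share one group.
	groups := map[string][]string{}
	for _, p := range pvcs {
		node := p.Hostname // assumption: the annotation wins over the mounted node
		if node == "" && p.AccessMode == "RWO" {
			node = p.MountedNode
		}
		groups[node] = append(groups[node], p.Name)
	}

	// Sort node names so the resulting job order is deterministic.
	nodes := make([]string, 0, len(groups))
	for n := range groups {
		nodes = append(nodes, n)
	}
	sort.Strings(nodes)

	jobs := make([]Job, 0, len(nodes))
	for _, n := range nodes {
		jobs = append(jobs, Job{PVCs: groups[n], NodeName: n})
	}
	return jobs
}

func main() {
	pvcs := []PVC{
		{Name: "shared", AccessMode: "RWX"},
		{Name: "db", AccessMode: "RWO", MountedNode: "node-a"},
		{Name: "cache", AccessMode: "RWO", MountedNode: "node-a"},
		{Name: "logs", AccessMode: "RWO", MountedNode: "node-b", Hostname: "node-c"},
	}
	fmt.Println("classic jobs:", len(planJobs(pvcs, false)))
	fmt.Println("relaxed jobs:", len(planJobs(pvcs, true)))
}
```

With the sample data, classic mode yields three jobs (the RWX group, `node-a`, and the annotated `node-c`), while relaxed mode yields four, one per PVC, of which only `logs` carries a node name.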

docs/modules/ROOT/pages/explanations/backup.adoc

Lines changed: 2 additions & 2 deletions
@@ -21,9 +21,9 @@ It does not work in some cases though.
 More precise, it does not work when files are kept open for a long period of time, like databases do.
 It does also not work for content that is not stored in the cluster, like a managed database that is offered by a service provider.

-NOTE: If the PVC has the `RWO` access mode, the backup `Pod` needs to be scheduled onto the same node, on which the `Pod` (which uses the respective `PVC`) runs.
+NOTE: If the PVC has the `RWO` access mode, the backup `Pod` needs to be scheduled onto the same node, on which the `Pod` (which uses the respective `PVC`) runs. This happens explicitly or implicitly depending on the backup pod scheduling mode (see: xref:explanations/backup-pod-scheduling.adoc[Backup Pod Scheduling]).

-Read xref:references/annotations.adoc[] to learn more about how the backup process can be influenced.
+Read xref:references/annotations.adoc[] to learn more about how the backup process can be influenced and xref:explanations/backup-pod-scheduling.adoc[Backup Pod Scheduling] to learn more about the two backup pod scheduling modes.

 == Application-Aware Backups

docs/modules/ROOT/pages/explanations/rwo.adoc

Lines changed: 0 additions & 13 deletions
This file was deleted.

docs/modules/ROOT/pages/references/annotations.adoc

Lines changed: 6 additions & 0 deletions
@@ -36,4 +36,10 @@ See xref:references/operator-config-reference.adoc[Operator Configuration refere
 |A string which is valid pod name.
 |`Pod`
 |`BACKUP_CONTAINERANNOTATION`
+
+|`k8up.io/hostname`
+|If defined, the backup pod will be scheduled on this node
+|A string which is a valid node name.
+|`PersistentVolumeClaim`
+|n/a
 |===

docs/modules/ROOT/pages/references/api-reference.adoc

Lines changed: 12 additions & 9 deletions
@@ -246,13 +246,13 @@ This is for advanced use-cases only. Please only set this if you know what you'r
 Value must be positive integer if given.
 | *`keepJobs`* __integer__ | KeepJobs amount of jobs to keep for later analysis.

-
 Deprecated: Use FailedJobsHistoryLimit and SuccessfulJobsHistoryLimit respectively.
 | *`failedJobsHistoryLimit`* __integer__ | FailedJobsHistoryLimit amount of failed jobs to keep for later analysis.
 KeepJobs is used property is not specified.
 | *`successfulJobsHistoryLimit`* __integer__ | SuccessfulJobsHistoryLimit amount of successful jobs to keep for later analysis.
 KeepJobs is used property is not specified.
 | *`promURL`* __string__ | PromURL sets a prometheus push URL where the backup container send metrics to
+| *`clusterName`* __string__ | ClusterName sets the kubernetes cluster name to send to pushgateway for grouping metrics
 | *`statsURL`* __string__ | StatsURL sets an arbitrary URL where the restic container posts metrics and
 information about the snapshots to. This is in addition to the prometheus
 pushgateway.
@@ -290,13 +290,13 @@ This is for advanced use-cases only. Please only set this if you know what you'r
 Value must be positive integer if given.
 | *`keepJobs`* __integer__ | KeepJobs amount of jobs to keep for later analysis.

-
 Deprecated: Use FailedJobsHistoryLimit and SuccessfulJobsHistoryLimit respectively.
 | *`failedJobsHistoryLimit`* __integer__ | FailedJobsHistoryLimit amount of failed jobs to keep for later analysis.
 KeepJobs is used property is not specified.
 | *`successfulJobsHistoryLimit`* __integer__ | SuccessfulJobsHistoryLimit amount of successful jobs to keep for later analysis.
 KeepJobs is used property is not specified.
 | *`promURL`* __string__ | PromURL sets a prometheus push URL where the backup container send metrics to
+| *`clusterName`* __string__ | ClusterName sets the kubernetes cluster name to send to pushgateway for grouping metrics
 | *`statsURL`* __string__ | StatsURL sets an arbitrary URL where the restic container posts metrics and
 information about the snapshots to. This is in addition to the prometheus
 pushgateway.
@@ -371,9 +371,9 @@ This is for advanced use-cases only. Please only set this if you know what you'r
 | *`activeDeadlineSeconds`* __integer__ | ActiveDeadlineSeconds specifies the duration in seconds relative to the startTime that the job may be continuously active before the system tries to terminate it.
 Value must be positive integer if given.
 | *`promURL`* __string__ | PromURL sets a prometheus push URL where the backup container send metrics to
+| *`clusterName`* __string__ | ClusterName sets the kubernetes cluster name to send to pushgateway for grouping metrics
 | *`keepJobs`* __integer__ | KeepJobs amount of jobs to keep for later analysis.

-
 Deprecated: Use FailedJobsHistoryLimit and SuccessfulJobsHistoryLimit respectively.
 | *`failedJobsHistoryLimit`* __integer__ | FailedJobsHistoryLimit amount of failed jobs to keep for later analysis.
 KeepJobs is used property is not specified.
@@ -408,9 +408,9 @@ This is for advanced use-cases only. Please only set this if you know what you'r
 | *`activeDeadlineSeconds`* __integer__ | ActiveDeadlineSeconds specifies the duration in seconds relative to the startTime that the job may be continuously active before the system tries to terminate it.
 Value must be positive integer if given.
 | *`promURL`* __string__ | PromURL sets a prometheus push URL where the backup container send metrics to
+| *`clusterName`* __string__ | ClusterName sets the kubernetes cluster name to send to pushgateway for grouping metrics
 | *`keepJobs`* __integer__ | KeepJobs amount of jobs to keep for later analysis.

-
 Deprecated: Use FailedJobsHistoryLimit and SuccessfulJobsHistoryLimit respectively.
 | *`failedJobsHistoryLimit`* __integer__ | FailedJobsHistoryLimit amount of failed jobs to keep for later analysis.
 KeepJobs is used property is not specified.
@@ -738,7 +738,6 @@ Value must be positive integer if given.
 | *`retention`* __xref:{anchor_prefix}-github.com-k8up-io-k8up-v2-api-v1-retentionpolicy[$$RetentionPolicy$$]__ | Retention sets how many backups should be kept after a forget and prune
 | *`keepJobs`* __integer__ | KeepJobs amount of jobs to keep for later analysis.

-
 Deprecated: Use FailedJobsHistoryLimit and SuccessfulJobsHistoryLimit respectively.
 | *`failedJobsHistoryLimit`* __integer__ | FailedJobsHistoryLimit amount of failed jobs to keep for later analysis.
 KeepJobs is used property is not specified.
@@ -775,7 +774,6 @@ Value must be positive integer if given.
 | *`retention`* __xref:{anchor_prefix}-github.com-k8up-io-k8up-v2-api-v1-retentionpolicy[$$RetentionPolicy$$]__ | Retention sets how many backups should be kept after a forget and prune
 | *`keepJobs`* __integer__ | KeepJobs amount of jobs to keep for later analysis.

-
 Deprecated: Use FailedJobsHistoryLimit and SuccessfulJobsHistoryLimit respectively.
 | *`failedJobsHistoryLimit`* __integer__ | FailedJobsHistoryLimit amount of failed jobs to keep for later analysis.
 KeepJobs is used property is not specified.
@@ -889,16 +887,19 @@ This is for advanced use-cases only. Please only set this if you know what you'r
 Value must be positive integer if given.
 | *`restoreMethod`* __xref:{anchor_prefix}-github.com-k8up-io-k8up-v2-api-v1-restoremethod[$$RestoreMethod$$]__ |
 | *`restoreFilter`* __string__ |
+| *`restoreTimeFilter`* __string__ | Simple filter to define a timestamp (prefix, YYYY-MM-DD hh:mm:ss) for snapshot selection instead of latest (or latest if nothing matches)
 | *`snapshot`* __string__ |
 | *`keepJobs`* __integer__ | KeepJobs amount of jobs to keep for later analysis.

-
 Deprecated: Use FailedJobsHistoryLimit and SuccessfulJobsHistoryLimit respectively.
 | *`failedJobsHistoryLimit`* __integer__ | FailedJobsHistoryLimit amount of failed jobs to keep for later analysis.
 KeepJobs is used property is not specified.
 | *`successfulJobsHistoryLimit`* __integer__ | SuccessfulJobsHistoryLimit amount of successful jobs to keep for later analysis.
 KeepJobs is used property is not specified.
 | *`tags`* __string array__ | Tags is a list of arbitrary tags that get added to the backup via Restic's tagging system
+| *`paths`* __string array__ | Paths is a list of paths that are contained with in a snapshot and can be filtered by
+| *`delete`* __boolean__ | Delete ensures the state after restoring a snapshot is identical to the snapshot
+Deletes files from target if they do not exist in snapshot
 |===


@@ -931,16 +932,19 @@ This is for advanced use-cases only. Please only set this if you know what you'r
 Value must be positive integer if given.
 | *`restoreMethod`* __xref:{anchor_prefix}-github.com-k8up-io-k8up-v2-api-v1-restoremethod[$$RestoreMethod$$]__ |
 | *`restoreFilter`* __string__ |
+| *`restoreTimeFilter`* __string__ | Simple filter to define a timestamp (prefix, YYYY-MM-DD hh:mm:ss) for snapshot selection instead of latest (or latest if nothing matches)
 | *`snapshot`* __string__ |
 | *`keepJobs`* __integer__ | KeepJobs amount of jobs to keep for later analysis.

-
 Deprecated: Use FailedJobsHistoryLimit and SuccessfulJobsHistoryLimit respectively.
 | *`failedJobsHistoryLimit`* __integer__ | FailedJobsHistoryLimit amount of failed jobs to keep for later analysis.
 KeepJobs is used property is not specified.
 | *`successfulJobsHistoryLimit`* __integer__ | SuccessfulJobsHistoryLimit amount of successful jobs to keep for later analysis.
 KeepJobs is used property is not specified.
 | *`tags`* __string array__ | Tags is a list of arbitrary tags that get added to the backup via Restic's tagging system
+| *`paths`* __string array__ | Paths is a list of paths that are contained with in a snapshot and can be filtered by
+| *`delete`* __boolean__ | Delete ensures the state after restoring a snapshot is identical to the snapshot
+Deletes files from target if they do not exist in snapshot
 |===


@@ -1153,7 +1157,6 @@ ScheduleSpec defines the schedules for the various job types.
 | *`backend`* __xref:{anchor_prefix}-github.com-k8up-io-k8up-v2-api-v1-backend[$$Backend$$]__ |
 | *`keepJobs`* __integer__ | KeepJobs amount of jobs to keep for later analysis.

-
 Deprecated: Use FailedJobsHistoryLimit and SuccessfulJobsHistoryLimit respectively.
 | *`failedJobsHistoryLimit`* __integer__ | FailedJobsHistoryLimit amount of failed jobs to keep for later analysis.
 KeepJobs is used property is not specified.
