-
Let's state the benefit of doing so: AFAIU we assume replication will take more time than local backup recovery.
-
It seems that this will be the main RFC, which unites the replicaset and cluster backup. Currently it's very unclear to me what we're doing, way too many questions.

**Strict overview of motivation?** Let's first figure out why we implement this and what we want to achieve in the end. We must strictly describe the goals of the RFC (e.g. do users want to see Point-In-Time recovery or not; according to https://jira.vk.team/browse/TNTP-2825 they do) and the guarantees we give to users. For guarantees, we can check backup tools for other databases:
And please include the links to the associated GitHub and Jira tickets, it's very difficult to find them now.

**How will the process look for the end user?** We must determine what the backup process will look like for the end user. Is he going to take the tool from SDK, configure it and start the backup? In that case the tool should automatically move the needed files to the configured servers. Then the user just calls the tool one more time and it restores the cluster from a backup? Or do we expect the user to call some vshard/aeon function that will return which files should be copied and from which server, then the user manually goes to every instance, copies files to some servers and then uses the tool to restore the cluster? Or is it going to be TCM and/or ATE? At first glance, it looks like we need all of them.

**API of replicaset/cluster** Then we should define what the API of the replicaset/cluster will look like; this will be called by a user or our tool. Will we use …? Will writing the metadata (e.g. timestamp, instance info) of the backup to a file be a separate API? Or will we have …?

Review
It may happen in VShard if it's done without any protections:

For that we could use the already existing …

It's not possible in VShard now and I'm not sure it's possible to implement that at all while preserving the safety. If that's needed, we'll have to write a careful RFC for that to investigate.
-
Improved this part, hopefully addressed all the questions.
The high level API is outside the scope of this RFC. Here we only describe the Tarantool API that can be used for backup/restore by a backup agent. At this point it is not clear why we should add an extra API to fetch the instance config. It should already be known due to the config in Tarantool 3, or one can call …
Thanks for the suggestion! This part was abstract, so added concrete steps for vshard (using …).
-
Is it necessary? Can't we recreate a replicaset with different UUIDs, maybe even with a different replication factor?
-
Do we actually support multimaster setups? Is it possible to configure one with Tarantool 3.0 config?
-
Reviewers:
Changelog
v2 04/12/2025: Described steps to backup/restore explicitly. Added technical details to make sure xlogs have only committed data in case of a synchronous replicaset. Added multimaster/asynchronous master-replica replicaset backup/restore. Described backup in case of vshard. Made misc changes to improve document structure.

v1: Added initial version.

Links
GitHub issue #11729 (which also holds further references).
Document aims
The purpose of the document is to describe how to back up and restore Tarantool at the instance, replicaset and cluster level. We describe the API only at the instance level (vshard is an exception, it is an example of cluster backup); all other steps should be done by the backup agent.
Not everything described below is how Tarantool currently works; rather, it is how we plan to make it work in terms of backup/restore.
Use cases
Backup is done to restore after all data is lost.
Other known use cases:
Backup consistency
We do not elaborate here on making a replicaset/cluster backup consistent in the sense that it represents the state at some moment in global time, as there is no such notion yet. However, every replica/shard has the data as of the "moment" of backup start. This moment may differ from replica to replica and from shard to shard due to network latencies, replica failures and internal events which may delay the backup start (see technical details for asynchronous replicaset backup).
Points of #11729 not addressed
`box.backup.info()` API. It seems not to be immediately required for backup.

Instance
Memtx backup
When `box.backup.start()` is called, the WAL is rotated and the function returns a list of files required to restore to the current point. It is the last snapshot and all WAL files after it, up to the rotated one. The backup agent is supposed to copy the listed files. It may optimize backup storage and copy only new WAL files if there was no new snapshot since the last backup. After the files are copied, call `box.backup.stop()`.

Example:
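A minimal sketch of the call sequence; the listed file paths are purely illustrative:

```lua
-- On the instance: pause garbage collection of old files and
-- get the list of files needed to restore to the current point.
local files = box.backup.start()
-- `files` is an array of paths, for example:
-- {'/var/lib/tarantool/00000000000000000042.snap',
--  '/var/lib/tarantool/00000000000000000042.xlog',
--  '/var/lib/tarantool/00000000000000000057.xlog'}

-- The backup agent copies every listed file to backup storage here.

-- Release the files so garbage collection can proceed again.
box.backup.stop()
```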
So to back up an instance we need the next steps:

1. Call `box.backup.start()` on the instance.
2. Copy the listed files to backup storage.
3. Call `box.backup.stop()`.

Vinyl backup
The same goes for vinyl backup, except that the list of data files also includes `*.vylog`, `*.index` and `*.run` files.

Recovery
To recover an instance one needs to put all the data files (listed on backup by `box.backup.start()`) into the working directory of the instance before start.

Replicaset
There are 2 cases: a synchronous replicaset and a non-synchronous replicaset; the latter is, for example, an asynchronous replicaset or a multimaster replicaset.
Synchronous replicaset
In this case it is enough to back up only the master. We cannot have too outdated data in this case. The master can be up-to-date or not. The latter case is when there is a new term and a new master in this term, and this instance does not know it yet and considers itself a master. If the master is up-to-date then it holds all the committed data up to now. If the master is not up-to-date then the replicaset can hold newer committed data, but since the master can continue to consider itself a master only for the election timeout, the amount of this data is limited.
So to back up such a replicaset we need the next steps (a sketch follows after the restore steps):

1. Back up the master as described in the section Instance and record which instance the backup was taken from (its `box.info.uuid` for example).

To restore such a replicaset we need the next steps:
1. Recover the instance from the backup as described in the section Instance.

More technical details on replicaset recovery from single instance backup are in #12039.
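A rough sketch of how a backup agent could drive the backup steps remotely, assuming it reaches the master over `net.box`; the `copy_from_instance()` helper is hypothetical and stands for whatever file transfer mechanism the agent uses:

```lua
local netbox = require('net.box')

-- Hypothetical agent helper: copy a file from the given instance
-- into backup storage (scp, object storage, etc.).
local function copy_from_instance(uri, path)
    -- ...
end

local function backup_sync_replicaset(master_uri)
    local conn = netbox.connect(master_uri)
    -- Record which instance the backup is taken from.
    local uuid = conn:eval('return box.info.uuid')
    -- Pause GC on the master and get the list of files to copy.
    local files = conn:eval('return box.backup.start()')
    for _, path in ipairs(files) do
        copy_from_instance(master_uri, path)
    end
    conn:eval('box.backup.stop()')
    conn:close()
    return uuid
end
```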
Technical details
Without extra precautions the xlogs can contain uncommitted transactions. These transactions can be rolled back later in the replicaset history, but on restore they could be applied. So after restore we may have statements that were never visible in the replicaset history. We can avoid that if we wait for all uncommitted transactions that got into the backup xlog to be committed. If they get rolled back then `box.backup.start()` should raise an error. There should be a special error code, so the client can retry starting the backup on this error since the error is transient.

Non synchronous replicaset
This can be a multimaster replicaset or an asynchronous master-replica replicaset. In both cases making a backup of only a single instance of the replicaset as described above can miss some data. For example, in the multimaster case replication can be paused due to a long-standing conflict, so the instances can have different statements. If we back up only one of the instances, we miss the statements from the other that are not replicated. As the conflict can exist for a long period of time, we can miss data in backups for this period.
So to back up such a replicaset we need the next steps:

1. On every instance call `box.backup.start({mode='replicaset'})`. Backup start will return `vclock_start` and `vclock_end` in this mode. The backup agent should check that the intervals of all replicas overlap for each vclock component. In this case there will be no rebootstrap after restore. In case the condition is not met, the backup should be restarted (`box.backup.stop()` / `box.backup.start({mode='replicaset'})`).

Example:
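A sketch of the check the backup agent could run over the per-instance results. Note that `mode='replicaset'`, `vclock_start` and `vclock_end` are the API proposed by this RFC, so the exact result shape is an assumption:

```lua
-- `backups` is an array of the per-instance results of
-- box.backup.start({mode = 'replicaset'}), each assumed to contain
-- `vclock_start` and `vclock_end` tables keyed by replica id.
local function vclock_intervals_overlap(backups)
    -- Collect every replica id seen in any vclock.
    local ids = {}
    for _, b in ipairs(backups) do
        for id in pairs(b.vclock_start) do ids[id] = true end
        for id in pairs(b.vclock_end) do ids[id] = true end
    end
    -- For each component the intersection of all
    -- [vclock_start[id], vclock_end[id]] intervals must be non-empty.
    for id in pairs(ids) do
        local max_start, min_end = 0, math.huge
        for _, b in ipairs(backups) do
            max_start = math.max(max_start, b.vclock_start[id] or 0)
            min_end = math.min(min_end, b.vclock_end[id] or 0)
        end
        if max_start > min_end then
            -- Intervals do not overlap: stop and restart the backup.
            return false
        end
    end
    return true
end
```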
To restore such a replicaset we need the next steps:

1. Recover every instance from its own backup as described in the section Instance.
Technical details
In backup mode `'replicaset'` we list all the extra xlogs the other replicas need to connect without rebootstrap, besides the last snapshot and the xlogs after it.

There is still a chance that rebootstrap will be required. This can happen due to a race. We make a backup of instance `A`, then we make a backup of instance `B`. Before that, `B` advances its GC vclock for `A`. So the backup of instance `B` can miss some statements required for `A`. We can deal with that by inspecting the vclock intervals present in the `box.backup.start()` output. We add `vclock_start` and `vclock_end` to the `box.backup.start()` output in `mode='replicaset'`. There will be no rebootstrap if the intervals of all replicas overlap for each vclock component. This check should be done by the backup agent.

Cluster
A mere backup of every replicaset in the cluster without extra coordination may be inconsistent for 2 reasons:

1. Data migrations between replicasets (e.g. rebalancing) can be in progress during the backup.
2. The application may maintain its own consistency across replicasets, which independent per-replicaset backups do not preserve.
We can take a full cluster write lock during the backup to exclude both cases, but this way the backup may impact cluster performance significantly. At the replicaset level the backup is lightweight.
Another approach can handle issue 1 but not issue 2. We can abort/finish in-progress data migrations and disable new ones before starting the replicaset backups. After the backup is started, data migrations are enabled again. This can be done fast and does not reduce cluster performance. As to issue 2, in this approach we can only rely on the application being able to restore consistency by itself somehow.
vshard
In case of vshard we can use `vshard.router.map_callrw()` to start the backup on every shard. This way all in-progress rebalancing will be finished before starting the backup. vshard consists of synchronous replicasets, so we need a synchronous replicaset backup (as described in the section above) for every shard.

So to back up a vshard cluster we need the next steps:
1. Call `vshard.router.map_callrw()` with the function `box.backup.start()` (see the sketch below). Make each shard backup as described in the section for synchronous replicaset backup.

To restore the cluster we need the next steps:

1. Restore every shard as described in the section for synchronous replicaset restore.
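For the backup-start step above, a rough sketch of driving it through the router; the timeout value is arbitrary and copying the files and stopping the backup on each shard is left to the backup agent:

```lua
local vshard = require('vshard')

-- On the router: start the backup on the master of every shard.
-- map_callrw() takes storage refs first, so in-progress rebalancing
-- is finished before box.backup.start() runs anywhere.
local res, err = vshard.router.map_callrw('box.backup.start', {}, {timeout = 30})
if res == nil then
    error(err)
end
-- `res` maps each replicaset to the return values of its master's call;
-- the file list is the first returned value.
for _, ret in pairs(res) do
    local files = ret[1]
    -- Copy `files` from that shard's master, then call
    -- box.backup.stop() on it, as in the synchronous replicaset case.
end
```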