-
Let's state the benefit of doing so: AFAIU we assume replication will take more time than local backup recovery.
-
It seems that this will be the main RFC, which unites the replicaset and cluster backup. Currently it's very unclear to me what we're doing, way too many questions.

**Strict overview of motivation?** Let's first figure out why we implement this and what we want to achieve in the end. We must strictly describe the goals of the RFC (e.g. do users want to see Point-In-Time recovery or not; according to https://jira.vk.team/browse/TNTP-2825 they do) and the guarantees we give to users. For guarantees, we can check backup tools for other databases:
And please include the links to the associated GitHub and Jira tickets, it's very difficult to find them now.

**How will the process look for the end user?** We must determine what the backup process will look like for the end user. Is he going to take the tool from SDK, configure it and start the backup? In that case the tool should automatically move the needed files to the configured servers. Then the user just calls the tool one more time and it restores the cluster from a backup? Or do we expect the user to call some vshard/aeon function that will return which files should be copied and from which server, then the user manually goes to every instance, copies files to some servers and then uses the tool to restore the cluster? Or is it going to be TCM and/or ATE? At first glance, it looks like we need all of them.

**API of replicaset/cluster** Then we should define what the API of the replicaset/cluster will look like; this will be called by a user or our tool. Will we use …? Will writing the metadata (e.g. timestamp, instance info) of the backup to a file be a separate API? Or will we have …?

Review
It may happen in VShard if it's done without any protections:

For that we could use the already existing …

It's not possible in VShard now and I'm not sure it's possible to implement that at all while preserving the safety. If that's needed, we'll have to write a careful RFC for that to investigate.
-
Improved this part, hopefully addressed all the questions.
The high level API is outside the scope of this RFC. Here we only describe the Tarantool API that can be used for backup/restore by a backup agent. At this point it is not clear why we should add an extra API to fetch the instance config. It should already be known due to the config in Tarantool 3, or one can call …
Thanks for the suggestion! This part was abstract, so added concrete steps for vshard (using …).
-
Is it necessary? Can't we recreate a replicaset with different UUIDs, maybe even with a different replication factor?
-
Do we actually support multimaster setups? Is it possible to configure one with Tarantool 3.0 config?
-
Reviewers:
Changelog
v2 04/12/2025: Described steps to backup/restore explicitly. Added technical details to make sure xlogs have only committed data in case of a synchronous replicaset. Added multimaster/asynchronous master-replica replicaset backup/restore. Described backup in case of vshard. Made misc changes to improve document structure.

v1: Added initial version.

Links
GitHub issue #11729 (which also holds further references).
Document aims
The purpose of the document is to describe how to back up and restore Tarantool at the instance, replicaset and cluster level. We describe the API only at the instance level (vshard is an exception, it is an example of cluster backup); all other steps should be done by the backup agent.
Not everything described below is how Tarantool currently works; rather, it is how we plan to make it work in terms of backup/restore.
Use cases
Backup is done to restore after all data is lost.
Other known use cases:
Backup consistency
We do not elaborate here on making a replicaset/cluster backup consistent in the sense that it represents the state at some moment in global time, as there is no such notion yet. However, every replica/shard has the data as of the "moment" of backup start. This moment may differ from replica to replica and from shard to shard due to network latencies, replica failures and internal events which may delay the backup start (see technical details for asynchronous replicaset backup).
Points of #11729 not addressed
`box.backup.info()` API. It seems not to be immediately required for backup.

Instance
Memtx backup
When `box.backup.start()` is called, the WAL is rotated and the function returns a list of files required to restore to the current point. It is the last snapshot and all WAL files after it, up to the rotated one. The backup agent is supposed to copy the listed files. It may optimize backup storage and copy only new WAL files if there was no new snapshot since the last backup. After the files are copied, call `box.backup.stop()`.

Example:
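A minimal sketch of the call sequence; the listed file paths are purely illustrative:

```lua
-- On the instance: pause garbage collection of old files and
-- get the list of files needed to restore to the current point.
local files = box.backup.start()
-- `files` is an array of paths, for example:
-- {'/var/lib/tarantool/00000000000000000042.snap',
--  '/var/lib/tarantool/00000000000000000042.xlog',
--  '/var/lib/tarantool/00000000000000000057.xlog'}

-- The backup agent copies every listed file to backup storage here.

-- Release the files so garbage collection can proceed again.
box.backup.stop()
```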
So to back up an instance we need the next steps:

1. Call `box.backup.start()` on the instance.
2. Copy the listed files to backup storage.
3. Call `box.backup.stop()`.

Vinyl backup
The same goes for vinyl backup, except that the list of data files also includes `*.vylog`, `*.index` and `*.run` files.

Recovery
To recover an instance one needs to put all the data files (listed on backup by `box.backup.start()`) into the working directory of the instance before start.

Replicaset
There are 2 cases: a synchronous replicaset and a non-synchronous replicaset; the latter is, for example, an asynchronous replicaset or a multimaster replicaset.
Synchronous replicaset
In this case it is enough to back up only the master. We cannot have too outdated data in this case. The master can be up-to-date or not. The latter case is when there is a new term and a new master in this term, and this instance does not know it yet and considers itself a master. If the master is up-to-date then it holds all the committed data up to now. If the master is not up-to-date then the replicaset can hold newer committed data, but since the master can continue to consider itself a master only for the election timeout, the amount of this data is limited.
So to back up such a replicaset we need the next steps (a sketch follows after the restore steps):

1. Back up the master as described in the section Instance and record which instance the backup was taken from (its `box.info.uuid` for example).

To restore such a replicaset we need the next steps:
1. Recover the instance from the backup as described in the section Instance.

More technical details on replicaset recovery from single instance backup are in #12039.
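A rough sketch of how a backup agent could drive the backup steps remotely, assuming it reaches the master over `net.box`; the `copy_from_instance()` helper is hypothetical and stands for whatever file transfer mechanism the agent uses:

```lua
local netbox = require('net.box')

-- Hypothetical agent helper: copy a file from the given instance
-- into backup storage (scp, object storage, etc.).
local function copy_from_instance(uri, path)
    -- ...
end

local function backup_sync_replicaset(master_uri)
    local conn = netbox.connect(master_uri)
    -- Record which instance the backup is taken from.
    local uuid = conn:eval('return box.info.uuid')
    -- Pause GC on the master and get the list of files to copy.
    local files = conn:eval('return box.backup.start()')
    for _, path in ipairs(files) do
        copy_from_instance(master_uri, path)
    end
    conn:eval('box.backup.stop()')
    conn:close()
    return uuid
end
```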
Technical details
Without extra precautions the xlogs can contain uncommitted transactions. These transactions can be rolled back later in the replicaset history, but on restore they could be applied. So after restore we may have statements that were never visible in the replicaset history. We can avoid that if we wait for all uncommitted transactions that got into the backup xlog to be committed. If they get rolled back then `box.backup.start()` should raise an error. There should be a special error code, so the client can retry starting the backup on this error since the error is transient.

Non synchronous replicaset
This can be a multimaster replicaset or an asynchronous master-replica replicaset. In both cases making a backup of only a single instance of the replicaset as described above can miss some data. For example, in the multimaster case replication can be paused due to a long-standing conflict, so the instances can have different statements. If we back up only one of the instances, we miss the statements from the other that are not replicated. As the conflict can exist for a long period of time, we can miss data in backups for this period.
So to back up such a replicaset we need the next steps:

1. On every instance call `box.backup.start({mode='replicaset'})`. Backup start will return `vclock_start` and `vclock_end` in this mode. The backup agent should check that the intervals of all replicas overlap for each vclock component. In this case there will be no rebootstrap after restore. In case the condition is not met, the backup should be restarted (`box.backup.stop()` / `box.backup.start({mode='replicaset'})`).

Example:
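A sketch of the check the backup agent could run over the per-instance results. Note that `mode='replicaset'`, `vclock_start` and `vclock_end` are the API proposed by this RFC, so the exact result shape is an assumption:

```lua
-- `backups` is an array of the per-instance results of
-- box.backup.start({mode = 'replicaset'}), each assumed to contain
-- `vclock_start` and `vclock_end` tables keyed by replica id.
local function vclock_intervals_overlap(backups)
    -- Collect every replica id seen in any vclock.
    local ids = {}
    for _, b in ipairs(backups) do
        for id in pairs(b.vclock_start) do ids[id] = true end
        for id in pairs(b.vclock_end) do ids[id] = true end
    end
    -- For each component the intersection of all
    -- [vclock_start[id], vclock_end[id]] intervals must be non-empty.
    for id in pairs(ids) do
        local max_start, min_end = 0, math.huge
        for _, b in ipairs(backups) do
            max_start = math.max(max_start, b.vclock_start[id] or 0)
            min_end = math.min(min_end, b.vclock_end[id] or 0)
        end
        if max_start > min_end then
            -- Intervals do not overlap: stop and restart the backup.
            return false
        end
    end
    return true
end
```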
To restore such a replicaset we need the next steps:

1. Recover every instance from its own backup as described in the section Instance.
Technical details
In backup mode `'replicaset'` we list all the extra xlogs the other replicas need to connect without rebootstrap, besides the last snapshot and the xlogs after it.

There is still a chance that rebootstrap will be required. This can happen due to a race. We make a backup of instance `A`, then we make a backup of instance `B`. Before that, `B` advances its GC vclock for `A`. So the backup of instance `B` can miss some statements required for `A`. We can deal with that by inspecting the vclock intervals present in the `box.backup.start()` output. We add `vclock_start` and `vclock_end` to the `box.backup.start()` output in `mode='replicaset'`. There will be no rebootstrap if the intervals of all replicas overlap for each vclock component. This check should be done by the backup agent.

Cluster
A mere backup of every replicaset in the cluster without extra coordination may be inconsistent for 2 reasons:

1. Data migrations between replicasets (e.g. rebalancing) can be in progress during the backup.
2. The application may maintain its own consistency across replicasets, which independent per-replicaset backups do not preserve.
We can take a full cluster write lock during the backup to exclude both cases, but this way the backup may impact cluster performance significantly. At the replicaset level the backup is lightweight.
Another approach can handle issue 1 but not issue 2. We can abort/finish in-progress data migrations and disable new ones before starting the replicaset backups. After the backup is started, data migrations are enabled again. This can be done fast and does not reduce cluster performance. As to issue 2, in this approach we can only rely on the application being able to restore consistency by itself somehow.
vshard
In case of vshard we can use `vshard.router.map_callrw()` to start the backup on every shard. This way all in-progress rebalancing will be finished before starting the backup. vshard consists of synchronous replicasets, so we need a synchronous replicaset backup (as described in the section above) for every shard.

So to back up a vshard cluster we need the next steps:
1. Call `vshard.router.map_callrw()` with the function `box.backup.start()` (see the sketch below). Make each shard backup as described in the section for synchronous replicaset backup.

To restore the cluster we need the next steps:

1. Restore every shard as described in the section for synchronous replicaset restore.
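For the backup-start step above, a rough sketch of driving it through the router; the timeout value is arbitrary and copying the files and stopping the backup on each shard is left to the backup agent:

```lua
local vshard = require('vshard')

-- On the router: start the backup on the master of every shard.
-- map_callrw() takes storage refs first, so in-progress rebalancing
-- is finished before box.backup.start() runs anywhere.
local res, err = vshard.router.map_callrw('box.backup.start', {}, {timeout = 30})
if res == nil then
    error(err)
end
-- `res` maps each replicaset to the return values of its master's call;
-- the file list is the first returned value.
for _, ret in pairs(res) do
    local files = ret[1]
    -- Copy `files` from that shard's master, then call
    -- box.backup.stop() on it, as in the synchronous replicaset case.
end
```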