Interesting. Is there a reason why we try to solve that on the core level and not in some external module? The main problem is that no matter how you try to solve it on the server side, a proper solution needs too much context from the application logic. See some notes below about that.
You have it in the table where you say which requests can lead to "duplicates", and you say that "replaces" don't have that. The problem isn't just an actual duplicate key. It is the same operation applied twice. The same replace sent 2 times can also mess things up if between them the data was changed to some new state, and then the second replace reverted it back.
UUIDs can clash. It is just a hash function, in general. If you have many clients and enough time, they will clash eventually. Even if you use a time-based UUID, you can't even guarantee that it will be monotonic, because 1) real time can literally be adjusted backwards, and 2) true monotonic time is lost on restart. Tbh, I don't see how this can possibly be solved without being aware of the application logic. Any sort of random/time hashing solution I wouldn't consider safe, as it would be just a time bomb which can suddenly consider a request as "already done", even though it wasn't.
So if this one is specified, the request will be retried potentially forever? Even when a timeout is given?
---
I'll ask the tarantool queue guys whether they would like this feature in core or not, and if yes, which parts of it would be interesting to them (they have deduplication keys there already, implemented on their level). Speaking of UUID not being optimal for deduplication - well, I don't see why we should limit the users to one particular type. They could use literally anything that can be encoded to msgpack and stored into a space (float, double, decimal, uuid, string, unsigned, ...). Also, in my thinking, the ...
---
1. Review

1.1 Retry option in connectors

Firstly) I don't think we should introduce such a high-level API; the responsibility for writing retry logic should be on the client's code, in connectors we just have to give the user a convenient way to implement it. Consider the case where the user wants to execute some custom application logic between retries: they won't have an opportunity to do that with such options. For this I suppose we should allow passing the idempotency token explicitly, e.g.:

```lua
local uuid = require('uuid')

local token = uuid.str()
local ok, err
repeat
    ok, err = pcall(conn.space.bands.insert, conn.space.bands,
        {1, 'Roxette', 1986}, {timeout = 1, idempotency_token = token})
    -- Some custom logic, e.g. how many retries were done, the timeout for
    -- the whole loop and so on...
until ok or err.code ~= 'TimedOut'
```

VShard is one of Tarantool's users, which lacks the retry logic for RW requests. And the options you propose look like the ones we already use there:
P.S. Below you write about net.box, which is actually just a Lua connector to Tarantool. The Python and Lua connectors should have a similar interface, so I didn't understand why they differ. I'm ok with ...

1.2 New IPROTO interface

UUID in the header.
We should allow specifying any kind of idempotency token; the user should decide. As Vlad said, UUIDs may clash, and even if the probability of such a clash is near 0, it can still happen. The result of such a clash would be too serious: the DB will say that the transaction was applied when it actually wasn't. Only the user can decide which data they want to use as the token, and it really depends on the app. UUID looks like a good general solution, but if the user chooses it, then they understand that it could clash. What I'm trying to say is that the token should be completely transparent to the user, and they should decide what to use. For that we should think about using some general type for the ...
Firstly) We cannot do that, it will hit the performance of all requests. I agree that if it's implemented, the idempotency token should be in the HEADER of the xrow, since then we don't need to decode the body (which may be costly). But by default it should be Nil, not encoded at all, not 16 bytes of UUID or a string to decode on every request. A user who doesn't use the feature should not pay with performance, no matter where it's implemented (in Tarantool or VShard).

Secondly) I'm not sure we should limit where this key may exist, let the user decide whether they want ...

1.3 Server side

This one is much more difficult, since it should meet the requirements of any client.
Consider a user who wants to deduplicate the following function (as I said, we need a way to deduplicate calls too):

```lua
function increment()
    _G.counter = _G.counter + 1
end
```

This user doesn't need that value to be saved in the global `_idempotency` space. In order to allow that, I propose to introduce a corresponding option. In addition to that, in order not to make all users pay with perf, I see two options:
Firstly) How will you know when the key was written? We'll probably need a timestamp in the schema of the `_idempotency` space. And I don't really like that we try to do the logic of an above-standing module in the core, we already have the ...

Secondly) This is not enough. The problem here is that it may happen that millions of requests per second will come, and we'll write so many entries to our space that the memory will run out. Users will want to have two options for that:
I'd consider replying with an error when the request was already applied, according to the whole text I've written above.

2. Potential scenarios

Now, let's check out which options we have.
In that variant, we should implement the config options
The problem with the second solution is that users who don't need VShard won't have the deduplication logic. Every user or above-standing Tarantool project (e.g. AEON) will have to implement the deduplication itself, and the API may differ. The third solution propagates that inconsistency even further, everyone has to implement everything from the ground up. The deduplication is possible only via ...

The problem of the first solution is that it's difficult to satisfy the requirements of all users, but it's still possible. It's a step towards unification of the deduplication in Tarantool, we'll have to do that only once and none of the above-standing projects will have to do that themselves. Plus, we get deduplication in all requests, simplifying the users' life (and probably making ...).

I'm here on the side of the first solution, but firstly we should indeed ask users whether they actually need that. But according to the JIRA ticket, some users may want that (but, AFAIU, we cannot ask product managers anymore here). @mrForza, @Gerold103 WDYT?
---
Reviewers
- [ ] Main Reviewer: @Serpentian
- [ ] Second Reviewer: @Gerold103
- [ ] TeamLead: @sergepetrenko
- [ ] CTO: @sergos
Tickets
JIRA: https://jira.vk.team/browse/TNTP-3265
JIRA: https://jira.vk.team/browse/TNT-1374?focusedCommentId=37716675
Changelogs
Summary
Problem
Suppose our system has two main components:
A client (library / driver / connector) to a distributed database.
A distributed database consisting of one master and n-1 replicas.
The client creates a synchronous space consisting, for example, of several integer fields:
`[id, value]`. It then performs a non-idempotent operation, e.g. incrementing the `value` field by 1. Since no existing Tarantool connector implements safe retry logic for non-idempotent requests, and the server side does not provide deduplication mechanisms, this responsibility falls entirely on the client's application code. As a result, the client does not always ensure idempotency of its operations or deduplicate non-idempotent requests, which can lead to data duplication and an inconsistent state both on the client side and in the distributed database.

During the execution of this non-idempotent request, something unexpected may occur between the client service and the distributed database: the network disconnects before the request reaches the database, the database crashes or hangs, or the network fails before the response is sent back. As a result, the client waits for `n` seconds and then receives a `TimedOut` error within its service. It subsequently retries by sending an identical non-idempotent request to the distributed database. By the time the second request is sent, the original issue that caused the `TimedOut` error on the client side has suddenly been resolved: e.g. the network connection is restored, the master instance has restarted, or a heavy transaction has finally completed. Consequently, the master node in the distributed database receives two identical non-idempotent requests, even though the client intended to send only one. Thus, the client expects the `value` field to be incremented by 1, but upon reading the data, it discovers that the `value` field has actually been incremented by 2. This issue is common in distributed databases and has a well-known name: "duplicate requests."

Causes
To begin with, it is worth outlining the main causes that can, to a greater or lesser extent, lead to the problem of duplicate requests and, consequently, to an inconsistent state on both the client and the distributed database sides:
Use of unsafe retry mechanisms
CLIENT side. Most retry mechanisms implemented in connectors or client code do not verify the state of the distributed database after receiving a `TimedOut` error. As a result, a retry request is sent even in cases where the previous request was successfully executed, but its response never reached the client.

Use of non-idempotent requests

CLIENT side. Applying the same non-idempotent request twice leads to a different database state than applying it once. Because connectors do not provide any interface to manually mark requests as idempotent, users may write incorrect business logic, overlooking whether their requests are idempotent or non-idempotent.

No server-side mechanisms for detecting and handling duplicate identical requests

SERVER side. Because the server lacks logic to distinguish between two identical non-idempotent requests that may belong to the same retry session, it ends up executing all of them. This is the final cause leading to the problem, especially when the two previously mentioned issues remain unaddressed.

Preferred solution
Client side (driver)
As an example of a client driver, the following repository will be considered: tarantool-python.
In this connector, the primary entity used to interact with a Tarantool instance is the
`Connection` class object. This object provides the following methods, which enable the execution of Query, DML, and DDL requests; their descriptions are provided below.

First, it is necessary to modify the interface of these methods. From now on, they will accept the following parameters:

- `retry`: whether the request should be automatically retried after a `TimedOut` error.
- `timeout`: the number of seconds to wait before a `TimedOut` error is thrown. If the `retry` parameter is enabled, the request will be retried once this timeout period elapses.

The following algorithms will only take effect when the `retry` and `timeout` parameters of the corresponding `Connection` object methods are set to values different from their defaults.

Description of the retry algorithm for READ requests (`get`, `select`):

Description of the retry algorithm for WRITE requests (`eval`, `execute`, `insert`, `replace`, `delete`, `space`, `update`, `upsert`):

1. A unique 16-byte binary key of type UUID is generated.
2. All request objects (e.g., `RequestInsert`, `RequestUpdate`, etc.) must include an additional attribute, `idempotency_token`, in accordance with the updated IPROTO protocol. The generated unique key is attached to each request object.
3. The request, now containing the token, is sent to the master node. A sketch of this flow is given below.
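A rough sketch in Lua terms of what the connector would do internally for a write request; the helper `send_request` and the exact option name are illustrative assumptions, not an existing API:

```lua
local uuid = require('uuid')

-- Hypothetical driver-internal retry loop for one write request.
local function write_with_retry(conn, request, timeout)
    local token = uuid.bin()            -- 16-byte token, generated once per logical request
    request.idempotency_token = token   -- reused verbatim on every retry
    while true do
        local ok, res = pcall(send_request, conn, request, timeout)
        if ok then
            return res
        end
        if type(res) ~= 'table' or res.code ~= 'TimedOut' then
            error(res)                  -- only timeouts are retried
        end
        -- TimedOut: resend with the same token; the server-side lookup
        -- prevents the write from being applied twice.
    end
end
```

The key property is that the token is generated once and reused, so no matter how many times the request is resent, the server can recognize it as the same logical operation.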
Below is a diagram of the client-side algorithm for clarity:
New IPROTO interface
All IPROTO requests listed below must have an updated header containing an additional binary field,
IPROTO_IDEMPOTENT_TOKEN, of 16 bytes in size (used to encode a UUID value). Since not all requests need to be sent with retry semantics, when a zero UUID (00000000-0000-0000-0000-000000000000) is provided, Tarantool will process the request using the standard algorithm.IPROTO WRITE REQUESTS:
IPROTO WRITE RESPONSE:
If an IPROTO request was sent with retry semantics, the response to the user may take one of the following forms:
Successful execution:
`IPROTO_OK` with a boolean field `IPROTO_DATA` set to `true`.

Failed execution: `IPROTO_ERROR`.

Successful response header:

| Field | Type |
| --- | --- |
| IPROTO_SCHEMA_VERSION | MP_UINT |

Error response header:

| Field | Type |
| --- | --- |
| IPROTO_SCHEMA_VERSION | MP_UINT |
| IPROTO_ERROR_24 | MP_STR |
New Lua interface
To enable users to retry not only write requests but entire transactions, we will extend the interface of the
`box.begin` and `box.atomic` functions by adding an additional parameter, `idempotent_token`, to the `opts` options table.

`box.begin`:
`box.atomic`:
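A minimal sketch of both calls with the proposed option (the `bands` space and its `plays` field are invented for illustration; the option itself does not exist in current Tarantool):

```lua
local uuid = require('uuid')
local token = uuid.str()

-- Explicit transaction carrying an idempotency token.
box.begin({idempotent_token = token})
box.space.bands:update({1}, {{'+', 'plays', 1}})
box.commit()

-- The same with box.atomic: the opts table first, then the function to run.
box.atomic({idempotent_token = token}, function()
    box.space.bands:update({1}, {{'+', 'plays', 1}})
end)
```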
New net.box interface
Similarly, we will extend the interface of all write functions in the
`net.box` API by adding the `idempotent_token` parameter to the `opts` options table.

Server side
Where idempotency tokens are stored:
To enable the database to distinguish identical requests within the same retry session, the idempotency keys provided in request headers must be stored somewhere. It is proposed to store these keys in a global system space named
_idempotency. The structure of this space is shown below.How long idempotency tokens are stored:
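A rough sketch of what such a space could look like; the field names and types below are assumptions for illustration, not the final schema:

```lua
-- Illustrative only: a possible layout for the _idempotency space.
box.schema.space.create('_idempotency', {format = {
    {name = 'token',      type = 'varbinary'},  -- idempotency token from the request header
    {name = 'success',    type = 'boolean'},    -- whether the original execution succeeded
    {name = 'result',     type = 'any', is_nullable = true},  -- saved response payload
    {name = 'created_at', type = 'double'},     -- creation time, used for lifetime-based cleanup
}})
box.space._idempotency:create_index('pk', {
    parts = {{field = 'token', type = 'varbinary'}},
})
```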
It is also important to consider how we will clean up idempotency tokens from the database. To this end, we will introduce a new parameter in
`box.cfg` called `idempotency_lifetime`, which defines the lifetime of idempotency tokens (in seconds). Once this time has elapsed since the creation of a tuple in the `_idempotency` space, triggers will automatically remove the expired tuple from this space (`box.cfg.idempotency_lifetime`).
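For example (a sketch; the option is part of this proposal and is not available in current Tarantool):

```lua
-- Keep idempotency tokens for one hour after they are written.
box.cfg{
    idempotency_lifetime = 3600,
}
```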
Generalized algorithm for handling idempotency keys on the master node:

The process begins by looking up a tuple in the global system space
_idempotencythat matches the current request’s idempotency key. This determines whether the request has already been executed previously.Was a tuple found in the space?
Did the previous execution succeed?
If it did, the saved result is returned to the client (via iproto/net.box/lua).

Execute the current request:
On success, a tuple is written (with `success = true`) in the `_idempotency` space; on failure, a tuple is written (with `success = false`) in the `_idempotency` space.

Next, we replicate all spaces modified as a result of the request or transaction, including
`_idempotency`. If `is_sync` is set to `true`, we return the query result to the client only after a quorum of acknowledgments has been collected. For asynchronous requests, we return the response to the client immediately.

Below is a diagram of the server-side algorithm for clarity:
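In code terms, the same flow could look roughly like the sketch below; `execute` stands in for the actual request execution, transactional and replication details are omitted, and the handling of a previously failed attempt is an assumption:

```lua
local fiber = require('fiber')

-- Sketch of the master-side deduplication around one request,
-- assuming the _idempotency layout sketched earlier.
local function handle_request(request)
    local token = request.idempotency_token
    if token == nil then
        return execute(request)      -- no retry semantics: the usual path
    end
    local prev = box.space._idempotency:get{token}
    if prev ~= nil and prev.success then
        return prev.result           -- duplicate request: return the saved result
    end
    -- First attempt with this token (or the previous attempt failed):
    -- execute and record the outcome so that a later retry can be recognized.
    local ok, result = pcall(execute, request)
    local saved = box.NULL
    if ok and result ~= nil then
        saved = result
    end
    box.space._idempotency:replace{token, ok, saved, fiber.time()}
    if not ok then
        error(result)
    end
    return result
end
```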
Other solutions
Below are various methods and patterns that provide partial or complete solutions to the duplicate request problem, implemented on the client (driver) and server (distributed database) sides in different databases and their SDKs. However, not all of these methods are suitable for Tarantool specifically.
Client side solutions
No Retry mechanisms
PARTIAL solution 🔴

Description: The absence of retry mechanisms is the most primitive and straightforward way to prevent duplicate requests at the driver level. However, this does not provide a 100% guarantee, as the client may implement retry-like patterns in its own application code.
Pros:
Cons:
Places full responsibility for implementing this pattern on the client. There is a high risk that the client will implement it incorrectly and encounter the aforementioned problem again.
Severely limits the functionality of the client driver.
Automatic retry only for READ/idempotent operations
PARTIAL solution 🔴

Description: This approach allows retry logic to be applied only to operations whose semantics guarantee idempotency to the user.
It is important to note that this method is suitable only for drivers that provide a NoSQL-style API with a clear distinction between READ and idempotent operations. This characteristic of the driver ensures a well-defined and finite set of operations that are safe to retry.
However, if the client library exposes only an SQL-like API (i.e., sending textual SQL queries through the driver), we face the challenge that determining the idempotency of an SQL query would require syntactic and semantic analysis, significantly complicating the driver’s logic.
Pros:
Cons:
Does not fully eliminate the problem of data duplication, as users will still implement their own retry logic specifically for non-idempotent operations.
Requires a NoSQL-style API with a clear distinction between operations that are safe to retry and those that are not.
If only an SQL-like API is available, it necessitates syntactic and semantic analysis of queries.
Idempotency flag in retry mechanism
PARTIAL solution 🟠

Description: With this approach, the user can apply retry mechanisms to absolutely any operation (an improvement over Solution 2). However, the responsibility for determining whether an operation is idempotent lies entirely with the user. If the user is confident that the operation is safe to retry, they set the
`idempotency = True` flag, and the request will then be automatically retried internally until it succeeds. However, if the user misjudges the idempotency of the operation, they will encounter the aforementioned problem.

Pros:
A fully implemented retry pattern supporting all operations.
Lower risk of data duplication compared to Solutions 1 and 2.
Cons:
Server side solutions
Conditional Writes with idempotency keys at the data schema level 🟠
Description: This pattern is a special case of "idempotency keys." Its core idea is that the user explicitly designates which tables or spaces should support idempotency tokens. Such tables must include a mandatory unique field that stores these keys at the schema level of the space. In the event of an idempotency key conflict, the user defines their own business logic to resolve it. All write operations (`insert`, `delete`, `replace`, `upsert`, `update`, etc.) must use a non-standard interface or SQL syntax. Example:
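A hypothetical sketch of the idea; the `accounts` space, its fields, and the conflict handling are invented for illustration and are not an interface defined by this RFC:

```lua
local uuid = require('uuid')

-- A user space that carries its own idempotency key as a regular field
-- with a unique secondary index.
box.schema.space.create('accounts', {format = {
    {name = 'id',              type = 'unsigned'},
    {name = 'balance',         type = 'unsigned'},
    {name = 'idempotency_key', type = 'string', is_nullable = true},
}})
box.space.accounts:create_index('pk', {parts = {{field = 'id', type = 'unsigned'}}})
box.space.accounts:create_index('idem', {
    unique = true,
    parts = {{field = 'idempotency_key', type = 'string', is_nullable = true}},
})

-- A retried insert carries the same key; the duplicate-key error signals
-- that the operation has already been applied, and the application decides
-- how to resolve the conflict.
local token = uuid.str()
local ok, err = pcall(function()
    box.space.accounts:insert{1, 100, token}
end)
if not ok then
    -- conflict resolution: user-defined business logic goes here
end
```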
insert,delete,replace,upsert,update, etc.) must use a non-standard interface or SQL syntax. Example:The logic for adding/removing idempotency keys from user-defined spaces differs from the standard approach. In this method, immediately after a retried operation succeeds, the idempotency key field must be cleared from the corresponding tuple to avoid conflicts between requests and transactions belonging to different retry sessions. This cleanup occurs precisely when the client sends an acknowledgment confirming receipt of the database's response.
Pros:
Eliminates the need to create a separate space: idempotency keys are stored directly within the same spaces and tuples that require deduplication (keys are physically co-located with the data).
Enables embedding more sophisticated business logic directly into queries to handle conflicts during retries.
Cons:
Requires modifications or extensions to the SQL or Lua interface so that write operations can support conditional writes and conflict resolution logic.
Business requirements do not always permit schema changes to user-defined spaces at the system level.
External idempotency keys 🔴
Description: This approach involves storing idempotency keys along with operation results not in the database itself, but in an external service, e.g. a cache such as
`Redis` or `Memcached`. Now, before executing a write request or transaction, the master node checks for the idempotency key not in a database space, but in the external cache. If the cache already contains the required token, the database can return the result of the previous request execution, the one whose response never reached the client. If the token is not present in the cache, the master node executes the request and sends two responses: one to the client and another to the caching service.

Pros:
Cons:
Not suitable for Tarantool, since we are not allowed to use third-party products for storing system or user data within it.
Introduces additional failure points due to the inclusion of a new external service in the system.
May cause noticeable performance degradation due to the extra network hop required to communicate with another component in the distributed system.