This repository was archived by the owner on Jul 26, 2022. It is now read-only.

Transaction Isolation

Sergei Petrunia edited this page Oct 2, 2015 · 4 revisions

Background info

Transaction isolation in MyRocks

Reads

Reads are done using RocksDB snapshots. This provides Snapshot Isolation: a transaction doesn't see changes (inserts, updates, deletes) that were made after the transaction started.
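The visibility rule can be illustrated with a toy multi-version store. This is a minimal sketch in Python, not the RocksDB API: the class `VersionedStore` and its methods are hypothetical names, and a snapshot is modelled as nothing more than a sequence number.

```python
class VersionedStore:
    """Toy multi-version key-value store (illustration only, not RocksDB)."""

    def __init__(self):
        self.seq = 0          # global sequence number, bumped on every write
        self.versions = {}    # key -> list of (seq, value); value None = deleted

    def put(self, key, value):
        self.seq += 1
        self.versions.setdefault(key, []).append((self.seq, value))

    def delete(self, key):
        # A delete is recorded as a new version holding None.
        self.put(key, None)

    def snapshot(self):
        # In this toy, a snapshot is just the sequence number when it was taken.
        return self.seq

    def get(self, key, snapshot=None):
        # Return the newest version visible at `snapshot` (or the latest one).
        limit = self.seq if snapshot is None else snapshot
        visible = [v for s, v in self.versions.get(key, []) if s <= limit]
        return visible[-1] if visible else None


store = VersionedStore()
store.put("a", 1)
snap = store.snapshot()      # the transaction starts here
store.put("a", 2)            # update made afterwards by someone else
store.put("b", 10)           # insert made afterwards
store.delete("a")            # delete made afterwards
```

A read through `snap` still returns the old value of "a" and does not see "b", while a read without a snapshot sees the latest state.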

Writes

Terminology: a transaction does a number of writes (Put or Delete calls) and then commits.

When a transaction does a write, it acquires an exclusive lock on the key being written. The lock is held until the transaction commits. This prevents any other transaction from modifying the same data.
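The locking rule can be sketched with a toy lock table. This is an assumption-laden model, not TransactionDB code: `LockTable`, `try_lock` and `release_all` are hypothetical names, and a busy lock simply returns False here, whereas a real engine would typically wait or time out.

```python
class LockTable:
    """Toy exclusive lock table keyed by row key (hypothetical, not TransactionDB)."""

    def __init__(self):
        self.owner = {}   # key -> id of the transaction holding the lock

    def try_lock(self, trx, key):
        # Take the lock if it is free; re-locking by the owner also succeeds.
        holder = self.owner.setdefault(key, trx)
        return holder == trx

    def release_all(self, trx):
        # Called at commit: drop every lock held by this transaction.
        self.owner = {k: t for k, t in self.owner.items() if t != trx}


locks = LockTable()
got_trx1 = locks.try_lock("trx1", "key_val")        # trx1 writes key_val
got_trx2 = locks.try_lock("trx2", "key_val")        # refused while trx1 is open
locks.release_all("trx1")                           # trx1 commits
got_trx2_after = locks.try_lock("trx2", "key_val")  # now succeeds
```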

Locking alone does not prevent all conflicts. Another possible failure scenario is as follows:

  1. trx1> start execution, take a snapshot.
  2. trx2> modify key_val (this acquires a lock on key_val)
  3. trx2> commit (this releases the lock on key_val)
  4. trx1> attempt to write key_val

To avoid this, a transaction that does a write also checks whether any changes were made to key_val since its snapshot was taken (in the above example, step #4 checks whether there were any changes since step #1).
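The four steps and the check can be replayed in a toy model. This is only a sketch: `ToyDB`, `ToyTrx` and `WriteConflict` are hypothetical names, locking is left out, and each write is committed immediately to keep the example short.

```python
class WriteConflict(Exception):
    """Raised when a key changed after the writer's snapshot was taken."""


class ToyDB:
    def __init__(self):
        self.seq = 0           # global sequence number
        self.data = {}         # key -> latest committed value
        self.last_write = {}   # key -> sequence number of the latest committed write

    def commit_write(self, key, value):
        self.seq += 1
        self.data[key] = value
        self.last_write[key] = self.seq


class ToyTrx:
    def __init__(self, db):
        self.db = db
        self.snapshot = db.seq   # take a snapshot when the transaction starts

    def write(self, key, value):
        # The check from the text: was the key modified since our snapshot?
        if self.db.last_write.get(key, 0) > self.snapshot:
            raise WriteConflict(key)
        self.db.commit_write(key, value)   # simplification: commit right away


db = ToyDB()
db.commit_write("key_val", "old")
trx1 = ToyTrx(db)                    # step 1: trx1 takes a snapshot
trx2 = ToyTrx(db)
trx2.write("key_val", "new")         # steps 2-3: trx2 modifies and commits
try:
    trx1.write("key_val", "mine")    # step 4: trx1 attempts to write
    conflict = False
except WriteConflict:
    conflict = True
```

In this model the write at step 4 is rejected, so trx1 cannot silently overwrite a change it never saw.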

SELECT ... FOR UPDATE

This uses Transaction::GetForUpdate, which reads using the snapshot but also does what a write would do:

  • acquires an exclusive lock on the row
  • checks whether the row that it read from the snapshot has been modified since the snapshot was taken.

(Note: locks are also taken for PK values that would match but were not found. TODO: elaborate on what this means)
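A toy version of this read-and-lock behaviour is sketched below. It is not the real Transaction::GetForUpdate: the function `get_for_update`, the `ToyDB` class and the `Busy` and `Conflict` exceptions are hypothetical, and the toy keeps a lock even when the check fails, which is a choice of this sketch rather than a statement about RocksDB.

```python
class Busy(Exception):
    """Another transaction holds the lock."""


class Conflict(Exception):
    """The key was modified after the snapshot was taken."""


class ToyDB:
    def __init__(self):
        self.seq = 0
        self.versions = {}   # key -> list of (seq, value)
        self.locks = {}      # key -> name of the transaction holding the lock

    def put(self, key, value):
        self.seq += 1
        self.versions.setdefault(key, []).append((self.seq, value))


def get_for_update(db, trx, snapshot, key):
    # 1. Take an exclusive lock on the key, whether or not a row exists for it.
    if db.locks.setdefault(key, trx) != trx:
        raise Busy(key)
    history = db.versions.get(key, [])
    # 2. Check that nothing was written to the key after the snapshot.
    if history and history[-1][0] > snapshot:
        raise Conflict(key)
    # 3. Read the value as of the snapshot.
    visible = [v for s, v in history if s <= snapshot]
    return visible[-1] if visible else None


db = ToyDB()
db.put("row1", "v1")
snap = db.seq
value = get_for_update(db, "trx1", snap, "row1")     # existing row
missing = get_for_update(db, "trx1", snap, "row9")   # no such row, still locked
try:
    get_for_update(db, "trx2", db.seq, "row1")
    blocked = False
except Busy:
    blocked = True
```

Note that the lookup of "row9" finds nothing but still leaves a lock behind, which mirrors the remark about keys that were not found.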

Unique secondary indexes

Consider a table:

CREATE TABLE t1 (
  pk INT PRIMARY KEY,
  col1 VARCHAR(32),
  UNIQUE INDEX(col1)
) engine=rocksdb;

The secondary index is stored in RocksDB as:

 key= {index_nr, col1, pk}
 value= {empty}

Note that the key includes the PK.
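The effect of this layout on lookups can be sketched with tuples standing in for the stored keys. Real keys are byte strings; the tuple representation, the index number 260 and the helper names `sk_key` and `prefix_scan` are assumptions made for illustration.

```python
import bisect

INDEX_NR = 260   # hypothetical index number, chosen only for this example


def sk_key(col1, pk):
    # Key layout from the text: {index_nr, col1, pk}; the value is empty.
    return (INDEX_NR, col1, pk)


def prefix_scan(sorted_keys, col1):
    # Return every key that starts with {index_nr, col1}.
    prefix = (INDEX_NR, col1)
    start = bisect.bisect_left(sorted_keys, prefix)
    out = []
    for key in sorted_keys[start:]:
        if key[:2] != prefix:
            break
        out.append(key)
    return out


keys = sorted([sk_key("foo", 1), sk_key("bar", 7), sk_key("fop", 3)])
matches = prefix_scan(keys, "foo")
no_matches = prefix_scan(keys, "baz")
```

Because the PK is the last key part, finding all entries for a given col1 is a range scan over a key prefix rather than a point lookup.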

Consider an example:

1. trx1> BEGIN;
2. trx2> BEGIN;

3. trx1> INSERT INTO t1 VALUES (1,'foo');
4. trx1> COMMIT;
5. trx2> INSERT INTO t1 VALUES (2,'foo');

At step #5, trx2 inserts a row with col1='foo'. Before writing the record (2,'foo'), it does a range scan starting at {index_nr, 'foo'} to check for an existing entry. If the scan used trx2's snapshot, it would see only the data that was present when the snapshot was taken and would miss the record inserted at step #3, so the duplicate would go undetected. To avoid that, the range scan is done without using the snapshot.
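The difference between the two scans can be shown with a toy index. This is a sketch under simplifying assumptions: `ToyIndex` is a hypothetical class, entries are committed as soon as they are inserted, and the snapshot is modelled as a sequence number taken when trx2 begins.

```python
class ToyIndex:
    """Toy unique secondary index with versioned entries (illustration only)."""

    def __init__(self):
        self.seq = 0
        self.entries = []   # list of (seq, col1, pk) for committed index records

    def insert(self, col1, pk):
        self.seq += 1
        self.entries.append((self.seq, col1, pk))

    def scan(self, col1, snapshot=None):
        # Range scan over {index_nr, col1, *}; snapshot=None reads the latest data.
        limit = self.seq if snapshot is None else snapshot
        return [pk for s, c, pk in self.entries if c == col1 and s <= limit]


idx = ToyIndex()
trx2_snapshot = idx.seq                               # step 2: trx2 begins
idx.insert("foo", 1)                                  # steps 3-4: trx1 inserts and commits
seen_with_snapshot = idx.scan("foo", trx2_snapshot)   # misses the duplicate
seen_latest = idx.scan("foo")                         # finds the row with pk=1
```

Only the scan over the latest data lets trx2 detect that 'foo' is already taken.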

Another peculiarity concerns locks. When a transaction does a write, TransactionDB takes a lock on the key being written. For a unique secondary index this would be {index_nr, col1, pk}. However, to prevent unique key violations, the lock needs to be on {index_nr, col1}: two rows with the same col1 always have different PKs, so locks on the full keys would never conflict.

The current code solves this by calling GetForUpdate({index_nr, col1}). The call never finds a record, but has the side effect of obtaining a lock on this value. There is a slight inefficiency here: no actual lookup on this key is needed, only the lock.
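Why the lock has to be on the prefix can be seen in a small model. This is an illustrative sketch only: the dictionary `locks`, the function `lock_unique_value` and the index number 260 are hypothetical, and a refused lock returns False instead of waiting.

```python
locks = {}   # lock key -> owning transaction (toy lock table)


def lock_unique_value(trx, index_nr, col1):
    # Lock {index_nr, col1}, not the full {index_nr, col1, pk} key.
    holder = locks.setdefault((index_nr, col1), trx)
    return holder == trx


# Full keys for (1,'foo') and (2,'foo') differ, so locks on them never collide.
full_keys_collide = (260, "foo", 1) == (260, "foo", 2)

a = lock_unique_value("trx1", 260, "foo")   # trx1 inserting (1,'foo')
b = lock_unique_value("trx2", 260, "foo")   # trx2 inserting (2,'foo'): refused
c = lock_unique_value("trx2", 260, "bar")   # different value: no conflict
```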
