
on $S_1$, $T_2$ returns immediately on $S_1$ without being blocked. However, $T_2$ is blocked on $S_2$ until $T_1$ commits, because write locks and read locks are incompatible. Therefore, the data read by $T_2$ on $S_1$ and on $S_2$ have both been modified by $T_1$; that is, the data read by $T_2$ are consistent. On the other hand, when $T_2$ arrives ahead of $T_1$ on partitions $S_1$ and $S_2$, none of the data read by $T_2$ have been modified by $T_1$, which is also consistent. In this case, transaction $T_1$ may be blocked until $T_2$ releases its read locks.
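The blocking behaviour in the 2PL case follows from the standard shared/exclusive lock compatibility rule. The minimal Python sketch below models one lock table per partition and replays the scenario above. The class `LockTable`, the item names `a` (on $S_1$) and `b` (on $S_2$), and the absence of wait queues and deadlock detection are simplifying assumptions for illustration only.

```python
from enum import Enum


class Mode(Enum):
    SHARED = "S"     # read lock
    EXCLUSIVE = "X"  # write lock


# Standard 2PL compatibility: only shared/shared locks can coexist.
COMPATIBLE = {
    (Mode.SHARED, Mode.SHARED): True,
    (Mode.SHARED, Mode.EXCLUSIVE): False,
    (Mode.EXCLUSIVE, Mode.SHARED): False,
    (Mode.EXCLUSIVE, Mode.EXCLUSIVE): False,
}


class LockTable:
    """Hypothetical per-partition lock table (no wait queue, no deadlock detection)."""

    def __init__(self):
        self.holders = {}  # item -> {txn: Mode}

    def try_lock(self, txn, item, mode):
        held = self.holders.setdefault(item, {})
        if held.get(txn) == Mode.EXCLUSIVE:
            return True  # already holds the strongest lock
        for other, other_mode in held.items():
            if other != txn and not COMPATIBLE[(other_mode, mode)]:
                return False  # caller would have to block until `other` releases
        held[txn] = mode
        return True

    def release_all(self, txn):
        # 2PL shrinking phase: all locks are released at commit.
        for held in self.holders.values():
            held.pop(txn, None)


s1, s2 = LockTable(), LockTable()
s2.try_lock("T1", "b", Mode.EXCLUSIVE)        # T1 not yet committed on S2
print(s1.try_lock("T2", "a", Mode.SHARED))    # True: T1 already committed on S1
print(s2.try_lock("T2", "b", Mode.SHARED))    # False: T2 must wait on S2
s2.release_all("T1")                          # T1 commits on S2
print(s2.try_lock("T2", "b", Mode.SHARED))    # True: T2 now reads T1's value
```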
Now we consider another condition. If the partitions use MVCC for concurrency control, a read operation sees a snapshot containing the data that transactions have committed on that partition. Figure 2(b) shows how transaction $T_2$ is executed on the partitions with MVCC. When $T_2$ arrives at partition $S_1$ after $T_1$ has committed there, $T_2$ reads the latest version of the data, which includes $T_1$'s modifications. At the same time, $T_2$ arrives at $S_2$. Since $T_1$ has not yet committed on $S_2$, $T_2$ reads an old version of data b. In short, $T_2$ obtains data on $S_1$ that have been modified by $T_1$ and data on $S_2$ that have not. This is where the inconsistency occurs.
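To make the anomaly concrete, the sketch below models two independent MVCC partitions, each with its own local version counter, and lets $T_2$ read the latest committed version on each. The item `a` on $S_1$, the values, and the assumption that $T_1$ moves 50 units from `a` to `b` (so that $a + b = 100$ should always hold) are hypothetical; the point is only that per-partition "latest committed" snapshots can mix pre- and post-$T_1$ states.

```python
class MVCCPartition:
    """Hypothetical single-node MVCC store: each commit bumps a local version."""

    def __init__(self, data):
        self.version = 0
        # item -> list of (commit_version, value), oldest first
        self.history = {k: [(0, v)] for k, v in data.items()}

    def commit(self, writes):
        self.version += 1
        for item, value in writes.items():
            self.history[item].append((self.version, value))

    def read(self, item, snapshot=None):
        # Without an explicit snapshot, read the latest locally committed version.
        snap = self.version if snapshot is None else snapshot
        for ver, value in reversed(self.history[item]):
            if ver <= snap:
                return value
        raise KeyError(item)


s1 = MVCCPartition({"a": 100})
s2 = MVCCPartition({"b": 0})
s1.commit({"a": 50})                  # T1 commits on S1 first
seen = (s1.read("a"), s2.read("b"))   # T2 reads before T1 commits on S2
s2.commit({"b": 50})                  # T1 then commits on S2
print(seen)  # (50, 0): a reflects T1 but b does not, so a + b = 50
```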
Based on the two situations discussed above, we conclude that: 1) on partitions with 2PL, conflicts increase because read locks and write locks are incompatible, which hurts performance; and 2) on partitions with MVCC, the data read by clients may be inconsistent. In this paper, we propose DMVCC, a distributed protocol that avoids distributed read inconsistency. With this protocol, read operations no longer need read locks and still obtain a consistent snapshot version of the distributed database, yielding an evident performance improvement.
B. Distributed Multi-Version Concurrency Control
This section describes how DMVCC guarantees its concurrency-control properties, and how those properties are used to implement features such as transaction consistency and lock-free reads.
Read-only transactions benefit in performance from snapshot reads [4]. A snapshot read is a read operation that reads historical versions of data items without locking. In our design, a client does not need to specify a timestamp or a version of data items for a snapshot read; it only needs to indicate whether the read operation is a snapshot read. If it is, the system assigns a globally consistent snapshot version to the operation. If not, the client should execute the operation with SELECT ... FOR UPDATE.
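The client-facing contract described above can be sketched as a small planning step: the client supplies only a flag, and the system attaches the version. In this sketch, `ReadRequest`, `PlannedRead`, `plan_read`, the representation of a global snapshot version as a map from partition name to local version, and the check that locking reads contain `FOR UPDATE` are all hypothetical illustrations, not the paper's interface.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ReadRequest:
    sql: str
    snapshot_read: bool  # the only choice the client makes


@dataclass
class PlannedRead:
    sql: str
    # Per-partition snapshot versions for a snapshot read; None for a locking read.
    snapshot: Optional[Dict[str, int]]


def plan_read(req: ReadRequest, global_snapshot: Dict[str, int]) -> PlannedRead:
    if req.snapshot_read:
        # The system, not the client, picks the version: attach a copy of the
        # current globally consistent snapshot version.
        return PlannedRead(req.sql, dict(global_snapshot))
    # A non-snapshot read takes locks, so it must be written as SELECT ... FOR UPDATE.
    if "FOR UPDATE" not in req.sql.upper():
        raise ValueError("non-snapshot reads must use SELECT ... FOR UPDATE")
    return PlannedRead(req.sql, None)


current = {"S1": 7, "S2": 4}  # hypothetical global snapshot version
print(plan_read(ReadRequest("SELECT bal FROM acct WHERE id = 1", True), current))
print(plan_read(ReadRequest("SELECT bal FROM acct WHERE id = 1 FOR UPDATE", False), current))
```

Copying the snapshot map keeps each planned read pinned to the version it was given, even if the system later publishes a newer one.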
To understand DMVCC, two key points need further elaboration: when each partition generates a snapshot version, and when the system uses that version.
• Generating a snapshot: Read and write operations in transactions use two-phase locking. As a result, the system can generate a snapshot at any time after all locks are acquired and before any lock is released. If a partition generated a snapshot before a subtransaction commits, the snapshot would not contain that subtransaction's own modifications. Therefore, for a given transaction, a partition generates a snapshot only when the system asks the partitions to commit the subtransaction. At the same time, a global snapshot version is generated.
• Executing reads with a version: When a transaction arrives, the system assigns it a globally consistent snapshot version that covers all partitions. To read data on a partition, a snapshot read in this transaction uses the version number associated with that partition.
Figure 3: System architecture
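The two rules above (snapshot at commit time; reads pinned to a per-partition version) can be sketched in Python as follows. This is one simple way to satisfy them, not necessarily how the paper's consistency coordinator computes versions. It assumes, loudly, that the coordinator commits one distributed transaction at a time, so every published global version covers whole transactions only; `Partition`, `Coordinator`, `snapshot_read`, and the data values are hypothetical.

```python
class Partition:
    """Hypothetical MVCC partition; a snapshot is identified by a local version."""

    def __init__(self, name, data):
        self.name = name
        self.local_version = 0
        self.history = {k: [(0, v)] for k, v in data.items()}
        self.pending = {}  # txn -> buffered writes (protected by 2PL locks, not modelled)

    def write(self, txn, item, value):
        self.pending.setdefault(txn, {})[item] = value

    def commit_and_snapshot(self, txn):
        # The snapshot is generated at commit time, so it includes this
        # subtransaction's own writes.
        self.local_version += 1
        for item, value in self.pending.pop(txn, {}).items():
            self.history.setdefault(item, []).append((self.local_version, value))
        return self.local_version

    def read_at(self, item, version):
        for ver, value in reversed(self.history[item]):
            if ver <= version:
                return value
        raise KeyError(item)


class Coordinator:
    """Hypothetical coordinator. ASSUMPTION: commits one distributed txn at a time."""

    def __init__(self, partitions):
        self.partitions = {p.name: p for p in partitions}
        self.global_snapshot = {name: 0 for name in self.partitions}

    def commit(self, txn, participants):
        new = dict(self.global_snapshot)
        for name in participants:
            new[name] = self.partitions[name].commit_and_snapshot(txn)
        # Publish only after every participant has committed and snapshotted.
        self.global_snapshot = new
        return new

    def begin_read_only(self):
        # An arriving transaction is assigned the current global snapshot version.
        return dict(self.global_snapshot)


def snapshot_read(coord, snapshot, partition, item):
    # Use the version number associated with this particular partition.
    return coord.partitions[partition].read_at(item, snapshot[partition])


s1, s2 = Partition("S1", {"a": 100}), Partition("S2", {"b": 0})
coord = Coordinator([s1, s2])
s1.write("T1", "a", 50)
s2.write("T1", "b", 50)
old = coord.begin_read_only()        # T2 arrives before T1 commits
coord.commit("T1", ["S1", "S2"])
new = coord.begin_read_only()        # a later transaction arrives after T1 commits
print(snapshot_read(coord, old, "S1", "a"), snapshot_read(coord, old, "S2", "b"))  # 100 0
print(snapshot_read(coord, new, "S1", "a"), snapshot_read(coord, new, "S2", "b"))  # 50 50
```

Even though $T_2$ reads after $T_1$ has committed on $S_1$, its pinned version keeps both reads on the pre-$T_1$ state, so it never observes the mixed result of the plain MVCC case.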
Furthermore, a system using DMVCC not only reads globally consistent snapshots but also reduces read-write conflicts and the likelihood of global deadlocks [6].
III. DESIGN
To realize the DMVCC protocol, we design a system that can read globally consistent snapshots. Figure 3 depicts the proposed architecture. The distributed database system consists of three main components: the partitions, the DTMs, and the consistency coordinator. The partitions are local databases, each storing a portion of the data items; they execute subtransactions from the DTMs and generate snapshots. Clients access the system through the DTMs. The DTMs break transactions down into subtransactions and assign snapshot versions to them accordingly. The consistency coordinator is the central node that calculates the globally consistent snapshot versions. Next, we describe the design of each component.
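The DTM's role of breaking a transaction into per-partition subtransactions and attaching snapshot versions can be sketched as follows. The sketch assumes hash partitioning by key (via `zlib.crc32`) and represents a global snapshot version as a map from partition index to local version; neither detail is specified in this section, and `Subtransaction`, `partition_of`, and `split` are hypothetical names.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Op = Tuple[str, str]  # (kind, key), where kind is "read" or "write"


@dataclass
class Subtransaction:
    partition: int
    ops: List[Op] = field(default_factory=list)
    snapshot_version: int = 0


def partition_of(key: str, n_partitions: int) -> int:
    # ASSUMPTION: hash partitioning; the partitioning scheme is not given here.
    return zlib.crc32(key.encode()) % n_partitions


def split(ops: List[Op], n_partitions: int,
          global_snapshot: Dict[int, int]) -> Dict[int, Subtransaction]:
    """Group a transaction's operations by partition and attach, to each
    subtransaction, the version number that the global snapshot assigns to
    that partition."""
    subs: Dict[int, Subtransaction] = {}
    for kind, key in ops:
        p = partition_of(key, n_partitions)
        if p not in subs:
            subs[p] = Subtransaction(p, snapshot_version=global_snapshot[p])
        subs[p].ops.append((kind, key))
    return subs


g = {0: 12, 1: 9, 2: 30}  # hypothetical global snapshot version
for p, sub in sorted(split([("read", "a"), ("write", "b"), ("read", "c")], 3, g).items()):
    print(p, sub.snapshot_version, sub.ops)
```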
A. Partitions
The whole database is partitioned: data items are stored across multiple servers, and each partition stores only a portion of them. The partitions are independent of each