Read Consistency in Distributed Database Based on DMVCC.pdf

章芋文

126

10页

2次

2022-04-21

免费下载

Read Consistency in Distributed Database Based on DMVCC

Jie Shao

†§

, Boxue Yin

, Bujiao Chen

, Guangshu Wang

, Lin Yang

Jianliang Yan

, Jianying Wang

, Weidong Liu

†

Tsinghua University

Baidu,Inc

†

shao-j14@mails.tsinghua.edu.cn

†

liuwd@mail.tsinghua.edu.cn

{yinboxue,chenbujiao,wangguangshu,yanglin05,yanjianliang,wangjianying}@baidu.com

Abstract—In a traditional distributed database system, the

partitions use two-phase locking (2PL) as the concurrency

control protocol to ensure distributed read consistency. But

the read-lock acquired by a read operation is incompatible

with a write-lock, which undermines the performance of the

system. While in a system at the snapshot isolation level, where

partitions use Multi-Version Concurrent Control (MVCC) as

the concurrent control protocol, distributed read inconsistency

may occur. To achieve read consistency and guarantee the

performance at the same time, we propose Distributed Multi-

Version Concurrent Control (DMVCC). With DMVCC, the

system can support snapshot reads, which do not block write

operations, and ensure distributed read consistency. In this

protocol, a transaction obtains a set of consistent snapshot

version numbers at the startup time. The transaction then uses

those numbers to read the corresponding data stored on each

partition. The correctness of the protocol is strictly proved.

We conduct a series of experiments to compare the perfor-

mance of the system when using and not using DMVCC with a

scaled TPC-C benchmark. We observe that our DMVCC based

system outperforms the system using 2PL at both medium (up

to 1.53x speed up) and high contention (up to 2.0x speed up)

levels. Furthermore, when read/write ratio goes up to 1:1, the

throughput of the DMVCC based system is 290% higher than

that of the system using 2PL. The scalability of the system is

also presented.

I. INTRODUCTION

As the data increase, many large-scale services in Baidu

such as Baidu Wallet can no longer store data in a single

database. Being a Chinese counterpart of PayPal, Baidu

Wallet relies on a distributed on-line transaction processing

(OLTP) system as its storage backend. OLTP systems require

concurrency control to guarantee consistency[6], [7], so that

services running on top of them can function correctly. With-

out right concurrency control, Baidu Wallet could transfer

more money than there is from the account, execute the

transfer twice, transfer the wrong amount of money, or

present the wrong balance after a transaction.

While concurrency control is a well-studied ﬁeld concern-

ing single databases, the performance of protocols such as

two-phase locking (2PL)[6] is limited with high-contention

workloads, especially when the database receives long read-

only transactions. To solve this problem, Multi-Version

Concurrency Control (MVCC)[7], [11], [14] is proposed.

For read operations, a client is allowed to read historical data

Figure 1: The throughput of a single database at different

isolation levels as the client number increases

to avoid read-write conﬂicts. This improve the performance

intensely[16].

To prove that, we set up a simple experiment that com-

pares the performances of a single database at different

isolation levels. In this experiment, we use MySQL[1] as our

database, which has different concurrency control methods.

2PL is used at the serializable isolation level, while MVCC

is used at the repeatable read level. TPC-C[2] is used as

our benchmark. The database contains 5 warehouses. The

experimental setup remains the same in Section V. Figure 1

shows the results:

• When the number of clients is less than 10, the perfor-

mance of the system remains almost unchanged at the

repeatable read level and at the serializable level since

there is little contention.

• As the number of clients increases, the throughput of

the database at the serializable isolation level drops

sharply. Meanwhile, the throughput of the system at the

repeatable read level almost remains the same. That is

to say the drop is caused by read and write conﬂicts

instead of resource limitation.

2016 IEEE 23rd International Conference on High Performance Computing

DOI 10.1109/HiPC.2016.11

142

(a) partitions using 2PL

(b) partitions using MVCC

Figure 2: How transactions read data using 2PL and using MVCC

As Figure 1 shows, if the distributed database uses MVCC

as the concurrency control protocol, the system may have

a better performance. Unfortunately, the distributed system

requires that the transactions use 2PL in every partition

[1].

A MVCC-based distributed database system at the non-

serializable level leads to distributed read inconsistency,

which refers to the circumstance where only some of the

updates of a distributed transaction are visible to other

transactions. We will explain that in detail in Section II.

Many transactions in Baidu Wallet such as account checking

and statistic analysis do not need to read the latest data. But a

global consistent snapshot is still needed. So in this paper we

propose DMVCC (Distributed Multi-Version Concurrency

Control), a distributed concurrency protocol that ensures

distributed read consistency and the performance at the same

time.

DMVCC is a two-phase protocol based on two-phase

commit (2PC)[6], [19]. A set of distributed transaction

managers (DTMs) run the protocol on behalf of clients. At

the beginning of a transaction, the DTM connects with the

consistency coordinator to obtain global consistent snapshot

version numbers. In the prepare phase, the transaction uses

the snapshot version number to read data items stored on

each partition. In the commit phase, the DTM collects the

snapshot versions from each partition and sends them to

the consistency coordinator. The consistency coordinator

then calculates the new global consistent snapshot version.

With DMVCC, the system guarantees that all or none of

the updates of a distributed transaction is visible to other

transactions.

The rest of the paper proceeds as follows. Section II

presents a case of distributed read inconsistency and the way

to avoid it. Section III elaborates the design of a DMVCC-

based system. In Section IV, we prove the correctness of

DMVCC. Section V sets up the experiments and presents

In our design, we call each database a partition.

the results of our experimental evaluation. Then we discuss

the related work in Section VI and present the conclusion

in section VII.

II.

OVERVIEW

This section begins with a review of why we can not use

MVCC in a distributed system. Then we will explain how

the distributed multi-version concurrency control works.

A. A case of distributed read inconsistency

Application programmers usually prefer the highest isola-

tion level to simplify the reasoning of correctness in the face

of concurrent transactions[6], [17]. To guarantee distributed

consistency, a distributed transaction runs standard concur-

rency control schemes such as standard 2PL combined with

two-phase commit (2PC)[6], [8], [9].

To give a more intuitive explanation, we demonstrate a

simpliﬁed process of money transferring in Baidu Wallet,

during which the system reads the balance data from the

account to ensure consistency. As is shown below, there are

two distributed transactions T

and T

and two partitions S

and S

, which contain data items a and b respectively. In

transaction T

, account A transfers 10 dollars to account B.

Transaction T

then reads the data of the current account

balance of A and B.

: UPDATAa=a-10 UPDATAb=b+10

: READ a READ b

Firstly, let us see how distributed transactions work on

each partition which uses standard 2PL as the concurrency

control protocol and 2PC as the distributed commit protocol.

Suppose partition S

and S

receive T

and T

’s subtransac-

tions sequentially. The process is shown in Figure 2(a). We

can see that T

arrives when T

is committed on partition S

but not yet committed on partition S

because of network

latency or thread scheduling on S

. Since T

is committed

143

on S

, T

returns directly on S

without being blocked.

However, T

will be blocked in S

until T

is committed

because of the incompatibility of write locks and read locks.

Therefore, the data read by T

on S

and S

have both

been modiﬁed by T

. That is to say, the data read by T

are

consistent. On the other hand, when T

arrives ahead of T

on partition S

and S

, neither of the data read by T

have

been modiﬁed by T

, which is also consistent. Transaction

may be blocked in this case.

Now we consider another condition. If the partitions use

MVCC as the concurrency control, a read operation sees a

snapshot containing the committed data of each transaction.

Figure 2(b) shows how transaction T

is executed on the

partitions with MVCC. When T

arrives at partition S

after

is committed, T

reads the latest version of data which

have been modiﬁed by T

. At the same time, T

arrives at

. Since T

is not yet committed on S

, T

reads an old

version of data b. In short, T

obtains the data on S

which

have been modiﬁed by T

and the data on S

which have

not been modiﬁed by T

. This is when inconsistency occurs.

Based on the two situations discussed above, we can

conclude that: 1) on partitions with 2PL, conﬂicts increase in

the system due to the incompatibility of read locks and write

locks, which impacts the performance; 2) on partitions with

MVCC, the data read by clients may not be consistent. In

this paper we propose DMVCC, a distributed protocol that

avoids distributed read inconsistency. Using this protocol,

we can free read locks for read operations and obtain a

consistent snapshot version in the distributed database with

evident performance improvement.

B. Distributed Multi-Version Concurrency Control

This section describes how DMVCC works to guarantee

the properties around concurrency control, and how those

properties are used to implement features such as transaction

consistency and lock-free reads.

A read-only transaction shares the beneﬁts of snapshot

reads in performance[4]. And a snapshot read is a read

operation that reads the historical data items without locking.

In our design, a client does not need to specify a timestamp

or a version of data items for a snapshot read. He only needs

to determine whether the read operation is a snapshot read or

not. If the read operation is a snapshot read, the system will

assign a global consistent snapshot version to the operation.

If not, the client should execute the operation with SELECT

... FOR UPDATE.

To understand the DMVCC, two key points need further

elaboration: when to generate a snapshot version on each

partition and when to use the version in the system.

• Generating a snapshot: Read and write operations

in transactions use two-phase locking. As a result, the

systems can generate a snapshot anytime after all locks

are acquired and before any lock is released. When

Figure 3: System architecture

the partition generates a snapshot before a subtransac-

tion is committed, the subtransaction can not see the

modiﬁcations by itself. So for a given transaction, the

partition generates a snapshot only when the system

requires the partitions to commit the subtransaction. At

the same time, a global snapshot version is generated.

• Executing Reads with a version: When a transaction

arrives, the system assigns to it a global consistent snap-

shot version which involves all the partitions. To read

data on a partition, a snapshot read in this transaction

needs to refer to the version number related to this

partition.

Furthermore, a system using DMVCC not only reads

a global consistent snapshot, but also reduces read-write

conﬂicts and the chance of global deadlocks[6].

III.

DESIGN

To realize DMVCC protocol, we design a system which

can read global consistent snapshots. Figure 3 depicts the

proposed architecture of our design. The distributed database

system consists of three main components: the partitions,

the DTMs and the consistency coordinator. The partitions

are a number of local databases which store a portion of

data items. They execute subtransactions from the DTMs

and generate snapshots. Clients access the system with

the DTMs. The DTMs break down the transactions into

subtransactions and assign snapshot versions to them ac-

cordingly. The consistency coordinator is the center node

that calculates the global consistent snapshot versions. Next

we will describe the design of each component.

A. Partitions

The whole database is partitioned, with items stored

across multiple servers and each partition storing only a

portion of items. The partitions are independent from each

144

of 10

免费下载

gaiadb paper

相关文档

评论