GeoGauss: Strongly Consistent and Light-Coordinated OLTP
for Geo-Replicated SQL Database
WEIXING ZHOU and QI PENG, Northeastern University, China
ZIJIE ZHANG, Huawei Technology Co., Ltd, China
YANFENG ZHANG, Northeastern University, China
YANG REN and SIHAO LI, Huawei Technology Co., Ltd, China
GUO FU and YULONG CUI, Northeastern University, China
QIANG LI, Huawei Technology Co., Ltd, China
CAIYI WU, SHANGJUN HAN, and SHENGYI WANG, Northeastern University, China
GUOLIANG LI, Tsinghua University, China
GE YU, Northeastern University, China
Multinational enterprises conduct global business that has a demand for geo-distributed transactional databases.
Existing state-of-the-art databases adopt a sharded master-follower replication architecture. However, the
single-master serving mode incurs massive cross-region writes from clients, and the sharded architecture
requires multiple round-trip acknowledgments (e.g., 2PC) to ensure atomicity for cross-shard transactions.
These limitations drive us to seek yet another design choice. In this paper, we propose a strongly consistent
OLTP database GeoGauss with full replica multi-master architecture. To efficiently merge the updates from
different master nodes, we propose a multi-master OCC that unifies data replication and concurrent transaction
processing. By leveraging an epoch-based delta state merge rule and the optimistic asynchronous execution,
GeoGauss ensures strong consistency with light-coordinated protocol and allows more concurrency with
weak isolation, which are sufficient to meet our needs. Our geo-distributed experimental results show that
GeoGauss achieves 7.06X higher throughput and 17.41X lower latency than the state-of-the-art geo-distributed
database CockroachDB on the TPC-C benchmark.
CCS Concepts: • Information systems → Relational parallel and distributed DBMSs.
Additional Key Words and Phrases: Geo-distributed; multi-master replication; replica consistency; transaction
processing; deterministic databases
ACM Reference Format:
Weixing Zhou, Qi Peng, Zijie Zhang, Yanfeng Zhang, Yang Ren, Sihao Li, Guo Fu, Yulong Cui, Qiang Li,
Caiyi Wu, Shangjun Han, Shengyi Wang, Guoliang Li, and Ge Yu. 2023. GeoGauss: Strongly Consistent and
Light-Coordinated OLTP for Geo-Replicated SQL Database. Proc. ACM Manag. Data 1, 1, Article 62 (May 2023),
27 pages. https://doi.org/10.1145/3588916
Yanfeng Zhang is the corresponding author.
Authors’ addresses: Weixing Zhou, Qi Peng, Yanfeng Zhang, Guo Fu, Yulong Cui, Caiyi Wu, Shangjun Han, Shengyi
Wang, Ge Yu, Northeastern University, No. 195, Chuangxin Road, Hunnan District, Shenyang, Liaoning, China, 110169,
{zhouwx@stumail, pengqi@stumail, Zhangyf@mail, yuge@mail}.neu.edu.cn; Zijie Zhang, Yang Ren, Sihao Li, Qiang Li,
Huawei Technology Co., Ltd, Xian, Shanxi, China, {zhangzijie9, renyang1, sean.lisihao, liqiang199}@huawei.com; Guoliang
Li, Tsinghua University, 30 Shuangqing Road, Haidian District, Beijing, China, liguoliang@tsinghua.edu.cn;
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the
full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
2836-6573/2023/5-ART62 $15.00
https://doi.org/10.1145/3588916
Proc. ACM Manag. Data, Vol. 1, No. 1, Article 62. Publication date: May 2023.
1 INTRODUCTION
Global companies have built data centers in many countries worldwide. To support their global
business, they need a geo-distributed transactional SQL database spread across multiple
geographically distinct locations; for example, many telecom service providers have deployed
their ICT databases in a geo-distributed setting. The design goals are high availability,
strong consistency, and high performance.
Fig. 1. Sharded master-follower replication vs. full replica multi-master replication. (a) Sharded master-follower replication. (b) Full replica multi-master replication.
High availability is usually achieved by redundant data replication, which is the process of
storing the same data copies in multiple geographic zones. Data replication facilitates not only
high availability but also geographic locality and read scalability, making data copies close to users
at dierent regions to reduce read latency and to further improve overall data access throughput.
Existing state-of-the-art geo-distributed transactional databases, e.g., Google Spanner [41], F1 [59],
CockroachDB [67], YugabyteDB [22], TiDB [55], SLOG [61] and ConfluxDB [38], adopt a sharded
master-follower replication architecture as shown in Figure 1a. Data are partitioned into multiple
shards according to the key range. Each shard is assigned to a single master node serving all
write/read requests, and it is replicated to multiple geo-distributed follower nodes
serving only read requests. Because each shard has a single master, write-write conflicts are gathered
at the same worker, where they are easy to resolve. In addition, sharding can disperse write requests to
increase write throughput.
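The routing behavior described above can be illustrated with a minimal sketch. It is not taken from any of the cited systems: the `Shard` and `ShardedCluster` classes, the region names, and the key ranges are all hypothetical and chosen only to show how key-range sharding sends every write to a shard's single master while reads may be served by a nearby follower.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    name: str
    start_key: str      # inclusive lower bound of this shard's key range
    master: str         # region of the single master (serves reads and writes)
    followers: tuple    # regions of the read-only followers


class ShardedCluster:
    """Hypothetical key-range router for a sharded master-follower layout."""

    def __init__(self, shards):
        self.shards = sorted(shards, key=lambda s: s.start_key)
        self.starts = [s.start_key for s in self.shards]

    def shard_for(self, key):
        # Pick the last shard whose start_key is <= key.
        idx = bisect_right(self.starts, key) - 1
        if idx < 0:
            raise KeyError(f"no shard covers key {key!r}")
        return self.shards[idx]

    def route_write(self, key, client_region):
        # Writes always go to the shard's single master.
        shard = self.shard_for(key)
        return shard.master, shard.master != client_region

    def route_read(self, key, client_region):
        # Reads can be served locally if the client's region holds a replica.
        shard = self.shard_for(key)
        if client_region == shard.master or client_region in shard.followers:
            return client_region
        return shard.master


# Illustrative layout: three key ranges, each mastered in a different region.
cluster = ShardedCluster([
    Shard("Range1", "a", master="us", followers=("eu", "asia")),
    Shard("Range2", "h", master="eu", followers=("us", "asia")),
    Shard("Range3", "p", master="asia", followers=("us", "eu")),
])

print(cluster.route_write("kiwi", "us"))   # ('eu', True): a cross-region write
print(cluster.route_read("kiwi", "us"))    # us: served by a local follower
```

In this toy layout a client in `us` reads `kiwi` locally but must send its write to the `eu` master, which is the asymmetry between reads and writes that the single-master mode creates.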
The sharded master-follower replication architecture is widely adopted [22, 41, 55, 61, 67], but it
suffers from two major drawbacks. 1) The single-master serving mode requires routing the write
requests from all clients to the single master node, which leads to cross-region writes and, as a result,
increases transaction latency. Though this drawback can be alleviated by geo-aware partitioning
and regional shard placement [67], it still hurts performance, especially for applications without
locality property. 2) The sharded architecture relies on the two-phase commit (2PC) protocol to
ensure atomicity for cross-shard transactions. This requires multiple round-trip acknowledgments
between the coordinator and the globally distributed workers, which further hurts performance.
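To make the cost of the second drawback concrete, the following back-of-the-envelope sketch estimates the network latency of a textbook 2PC. It assumes one round trip for the prepare phase and one for the commit phase, with each phase bounded by the slowest participant, and it ignores logging, intra-shard replication, and execution time. The round-trip times in `RTT_MS` are hypothetical values picked for illustration, not measurements from the paper.

```python
# Hypothetical inter-region round-trip times in milliseconds (illustrative only).
RTT_MS = {
    frozenset(("us", "eu")): 80,
    frozenset(("us", "asia")): 150,
    frozenset(("eu", "asia")): 200,
}


def rtt(a, b):
    """Round-trip time between two regions; zero within the same region."""
    return 0 if a == b else RTT_MS[frozenset((a, b))]


def two_phase_commit_latency(coordinator, participants):
    """Network latency of a textbook 2PC: prepare round + commit round.

    Each round waits for acknowledgments from all participants, so it costs
    one round trip to the slowest one.
    """
    slowest = max((rtt(coordinator, p) for p in participants), default=0)
    return 2 * slowest


# A transaction touching only the coordinator's region pays no WAN cost.
print(two_phase_commit_latency("us", ["us"]))                 # 0
# A cross-shard transaction spanning three regions pays two slow round trips.
print(two_phase_commit_latency("us", ["us", "eu", "asia"]))   # 300
```

Under these assumed numbers, a single cross-shard transaction coordinated from `us` spends 300 ms on commit coordination alone, before any cross-region write routing is counted.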
Yet another choice for data replication is the full replica multi-master architecture shown in Figure
1b, where each server maintains a full copy of the data and all server nodes serve both read and write
requests. By placing a full replica in each region, it can serve users with local writes/reads. With a
full replica, 2PC is unnecessary to ensure atomicity. A number of multi-master systems have emerged
in recent years, e.g., Aurora [71], Riak [19], Calvin [70], FaunaDB [9], Anna [73], Aria [51] and
Q-store [57]. However, to employ a multi-master architecture, there are three key challenges to be
addressed.