Persistent Memory Disaggregation for Cloud-Native Relational Databases
Chaoyi Ruan
rcy@mail.ustc.edu.cn
USTC, Alibaba Group
China
Yingqiang Zhang
yingqiang.zyq@alibaba-inc.com
Alibaba Group
China
Chao Bi
bc233333@mail.ustc.edu.cn
USTC, Alibaba Group
China
Xiaosong Ma
xma@hbku.edu.qa
QCRI, HBKU
Qatar
Hao Chen
ch341982@alibaba-inc.com
Alibaba Group
China
Feifei Li
lifeifei@alibaba-inc.com
Alibaba Group
China
Xinjun Yang
xinjun.y@alibaba-inc.com
Alibaba Group
China
Cheng Li
chengli7@ustc.edu.cn
USTC, Anhui Key HPC Lab
China
Ashraf Aboulnaga
aaboulnaga@hbku.edu.qa
QCRI, HBKU
Qatar
Yinlong Xu
ylxu@ustc.edu.cn
USTC, Anhui Key HPC Lab
China
ABSTRACT
The recent emergence of commodity persistent memory (PM) hardware has altered the landscape of the storage hierarchy. It brings multi-fold benefits to database systems, with its large capacity, low latency, byte addressability, and persistence. However, PM has not been incorporated into the popular disaggregated architecture of cloud-native databases.
In this paper, we present PilotDB, a cloud-native relational database designed to fully utilize disaggregated PM resources. PilotDB possesses a new disaggregated DB architecture that allows compute nodes to be computation-heavy yet data-light, as enabled by large buffer pools and fast data persistence offered by remote PMs. We then propose a suite of novel mechanisms to facilitate RDMA-friendly remote PM accesses and minimize operations involving CPUs on the computation-light PM nodes. In particular, PilotDB adopts a novel compute-node-driven log organization that reduces network/PM bandwidth consumption and a log-pull design that enables fast, optimistic remote PM reads aggressively bypassing the remote PM node CPUs. Evaluation with both standard SQL benchmarks and a real-world production workload demonstrates that PilotDB (1) achieves excellent performance as compared to the best-performing baseline using local, high-end resources, (2) significantly outperforms a state-of-the-art DRAM-disaggregation system and the PM-disaggregation solution adapted from it, (3) enables faster failure recovery and cache buffer warm-up, and (4) offers superior cost-effectiveness.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
ASPLOS ’23, March 25–29, 2023, Vancouver, BC, Canada
© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-9918-0/23/03...$15.00
https://doi.org/10.1145/3582016.3582055
CCS CONCEPTS
• Computer systems organization → Cloud computing; • Hardware → Non-volatile memory.
KEYWORDS
cloud-native database, persistent memory, memory disaggregation
ACM Reference Format:
Chaoyi Ruan, Yingqiang Zhang, Chao Bi, Xiaosong Ma, Hao Chen, Feifei Li, Xinjun Yang, Cheng Li, Ashraf Aboulnaga, and Yinlong Xu. 2023. Persistent Memory Disaggregation for Cloud-Native Relational Databases. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (ASPLOS ’23), March 25–29, 2023, Vancouver, BC, Canada. ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/3582016.3582055
1 INTRODUCTION
The past decade has witnessed the emergence and growth of disaggregated cloud-native databases, with successful systems such as Amazon Aurora [65], Alibaba PolarDB [14–16], and Microsoft Socrates [7], spanning their processing across multiple layers of network-connected resource pools (Figure 1). The compute nodes (CNs) host the computation logic, leveraging remote but shared DRAM space as an extension to the local buffer pool for memory capacity, and the replicated storage pool for data persistence and fault tolerance. For users, the rich, elastic, and on-demand configuration of disaggregated cloud-native databases caters to their diverse workload requirements and flexible scaling needs. For service providers, such services allow the reuse of database software infrastructures as well as the consolidation/sharing of hardware resources.
[Figure 1 diagram: a compute node (CN) running the SQL/TXN engine, a memory node (MN) holding a DRAM buffer pool, and a storage node (SN) holding the log and data pages]
Figure 1: Sample disaggregated cloud-native DB architecture spanning three layers: CPU, memory, and storage
Though this new architecture enables the independent scaling of resources, there remain major constraints impeding its adoption. First, DRAM disaggregation is held back by DRAM's limited per-machine density, high (and fluctuating) price [47], and volatility, making it a more costly and less reliable layer for hosting a cloud-native database’s working set. Second, writes remain slow, especially with transactions, as changes need to be persisted in time to the storage layer.
In this work, we argue that Persistent Memory (PM), also known as non-volatile memory (NVM), driven by diverse technologies such as 3D XPoint [20], BiCS Flash [21], and PCM [55], emerges as an appealing layer for resource disaggregation. Compared with DRAM, PM offers higher provisioning density (e.g., one DIMM slot can hold 512 GB Optane PM, but only 128 GB DDR4 DRAM). It simultaneously offers persistence, enabling fast writes and recovery. In addition, PM preserves ultra-low-latency remote access via RDMA, an advantage over fast SSDs. Such multi-fold capability makes PM an ideal candidate for disaggregated databases, as we can simultaneously cache hot pages and persist log data on a shared and distributed PM layer. This brings on-demand, cost-effective memory buffer expansion, fast data persistence, and enhanced availability.
However, existing PM disaggregation work has not fully considered database redesign to utilize the versatile PM units, focusing instead on supporting native data structures [45] or simple applications like KV stores [63]. Applying these solutions to cloud-native databases could easily lead to new bottlenecks on the shared, remote PM nodes (PMNs). The first is the tension between the limited PM write bandwidth [26, 32, 75] and the heavy bandwidth consumption of existing solutions. The latter is largely due to writing redundancy/amplification caused by logging and dirty data flushing. Offloading log management to the PM side would reduce the PM bandwidth pressure (by not sending dirty pages but reproducing them by PM-side log application). On the other hand, this comes at the price of heavy CPU involvement on the PM nodes, required to handle offloaded data (especially their updates) and coordinate concurrent data accesses. Finally, complex management logic on the PM side would complicate the critical-path reads and writes. Both these PM-side bottlenecks (write bandwidth and CPU), unfortunately, conflict directly with the main selling point of PM disaggregation for the cloud: having a shared PM node pool supporting many compute nodes running database instances.
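To give a feel for the write-bandwidth tension, the following sketch compares the bytes a compute node would send to a PM node when it ships both log records and whole dirty pages against shipping log records alone. The 16 KiB page size is InnoDB's default; the function names, record size, and update counts are illustrative assumptions and not measurements from the paper.

```python
PAGE_SIZE = 16 * 1024  # InnoDB's default page size, in bytes


def bytes_sent(num_updates: int, log_record_bytes: int,
               dirty_pages: int, ship_pages: bool) -> int:
    """Bytes pushed to the PM node for one batch of updates.

    All parameters are hypothetical workload knobs for this example.
    """
    log_bytes = num_updates * log_record_bytes
    page_bytes = dirty_pages * PAGE_SIZE if ship_pages else 0
    return log_bytes + page_bytes


def amplification(num_updates: int, log_record_bytes: int,
                  dirty_pages: int) -> float:
    """Ratio of (logs + dirty pages) traffic to logs-only traffic."""
    logs_only = bytes_sent(num_updates, log_record_bytes, dirty_pages, False)
    if logs_only == 0:
        raise ValueError("the batch must contain at least one log byte")
    with_pages = bytes_sent(num_updates, log_record_bytes, dirty_pages, True)
    return with_pages / logs_only
```

Under these assumed numbers, 1,000 updates of 128 bytes each that dirty 200 distinct pages send about 26.6 times as many bytes when the pages are shipped too, which is the kind of redundancy that PM-side log application avoids.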
To address these challenges, we propose PilotDB, a novel PM-disaggregated cloud-native database architecture featuring the following innovations.
First, PilotDB embodies CDLog (Compute-node-Driven Logging), a central logging mechanism that efficiently offloads bulk data to the PM layer as a large, fast page buffer, yet with light computation there to support speedy logging and update handling. While retaining the page-based data organization of relational databases, it discards the conventional page-based WAL organization and instead adopts fine-grained, physical logging, where data entries directly embed changes at a mini-page granularity as well as the remote PM memory addresses concerned. This allows compute nodes to flush only CDLog entries to remote PM via one-sided RDMA and enables light-weight, DMA-based log application on the PM nodes, simultaneously reducing PM nodes’ CPU and write bandwidth consumption.
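As a rough illustration of this idea, the sketch below models a log entry that carries the changed bytes together with the PM address they belong to, so that applying it is a plain copy with no page-level interpretation. The names (`LogEntry`, `apply_entry`, `apply_log`, `MINI_PAGE`), the 256-byte mini-page size, the LSN field, and the `bytearray` standing in for a PM region are all assumptions made for this example; the actual CDLog entry layout is not given in this section.

```python
from dataclasses import dataclass

MINI_PAGE = 256  # hypothetical mini-page size in bytes


@dataclass(frozen=True)
class LogEntry:
    lsn: int         # log sequence number (assumed to be monotonic)
    pm_addr: int     # byte offset of the target inside the PM region
    payload: bytes   # new contents, at most one mini-page long

    def __post_init__(self):
        if len(self.payload) > MINI_PAGE:
            raise ValueError("payload exceeds one mini-page")


def apply_entry(pm: bytearray, entry: LogEntry) -> None:
    """Apply one entry as a plain copy to its embedded address."""
    end = entry.pm_addr + len(entry.payload)
    if entry.pm_addr < 0 or end > len(pm):
        raise IndexError("entry falls outside the PM region")
    pm[entry.pm_addr:end] = entry.payload


def apply_log(pm: bytearray, entries) -> None:
    """Apply entries in LSN order so later changes win."""
    for entry in sorted(entries, key=lambda e: e.lsn):
        apply_entry(pm, entry)
```

Because each entry names its own destination, the applier needs no knowledge of page formats, which is what makes a copy-engine style (DMA-like) application plausible in this simplified model.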
Second, PilotDB is designed to be coordination-free, even in the presence of concurrent reads/writes to the PM log and buffers, further shaving CPU consumption on the PM nodes. This is enabled by (1) lock-free data structures designed to manage the PM log area, with light-weight conflict check mechanisms and (2) a novel log-pull mechanism that allows compute nodes’ query processing to perform remote reads optimistically, with logs “read back” from the PM side on the rare occasion that the retrieved PM-cached page is found stale, again enabled by our CDLog organization.
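The optimistic read path can be pictured as follows: the compute node fetches the cached page without involving the PM node's CPU, checks whether the page is as new as it needs, and only on a mismatch pulls the missing log entries and replays them locally. This is a minimal single-process sketch. The `Page` and `PMNode` classes, the LSN-based staleness check, the `required_lsn` argument, and the `(lsn, offset, payload)` entry format are assumptions for illustration, and real remote reads would be one-sided RDMA operations rather than method calls.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    lsn: int          # version of the page contents
    data: bytearray


@dataclass
class PMNode:
    """Stand-in for remote PM: cached pages plus per-page log entries."""
    pages: dict = field(default_factory=dict)  # page_id -> Page
    log: dict = field(default_factory=dict)    # page_id -> [(lsn, offset, payload)]

    def read_page(self, page_id: int) -> Page:
        cached = self.pages[page_id]
        # Return a copy, as a remote read would.
        return Page(cached.lsn, bytearray(cached.data))

    def pull_log(self, page_id: int, after_lsn: int) -> list:
        return [e for e in self.log.get(page_id, []) if e[0] > after_lsn]


def optimistic_read(pm: PMNode, page_id: int, required_lsn: int) -> Page:
    """Read a page optimistically; pull and replay logs only if it is stale."""
    page = pm.read_page(page_id)
    if page.lsn >= required_lsn:
        return page  # common case: the cached page is fresh enough
    missing = sorted(pm.pull_log(page_id, page.lsn), key=lambda e: e[0])
    for lsn, offset, payload in missing:
        page.data[offset:offset + len(payload)] = payload
        page.lsn = lsn
    return page
```

In this model the stale case costs one extra pull and a local replay, while the fresh case is a single read, which matches the intent of keeping the PM-side CPU off the common path.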
We implemented a PilotDB prototype atop MySQL [23] and evaluated it using both industry-standard benchmarks and a production workload. The results show that PilotDB achieves up to 98.0% of the throughput of a monolithic configuration (which is given sufficient local DRAM and PM-based storage), even with the vast majority of its data placed on remote, disaggregated PM. With most workloads, PilotDB significantly outperforms LegoBase [74], a state-of-the-art DRAM-disaggregated cloud-native database, and LegoPM, a solution incorporating PM disaggregation. In addition, we made a best-effort attempt to compare PilotDB with Aurora and PolarDB, two mainstream cloud-native database services on the market that adopt storage disaggregation, by allocating Aurora/PolarDB instances with sufficient local DRAM (and careful hardware alignment in other resource dimensions). Results show PilotDB achieves significantly better or comparable performance.
In addition to the above performance results, our multi-tenant tests show that PilotDB has strong service scalability, with a 4-node PM pool serving 32 concurrent DB instances at only a 10.8% performance loss against running each instance exclusively. Moreover, PilotDB brings instant failure recovery, up to 15.27× faster than the baselines, regardless of the crash site. Finally, our cost analysis further confirms the cost-effectiveness of PilotDB. Compared with its closest competitor in cost-effectiveness, the PilotDB configuration is 38.3% lower in hardware ownership cost and uses only 9.1% of the DRAM across CN and PMN and 12.5% of the PMN CPU core resources, while delivering 91.5% higher throughput per dollar.
To our knowledge, PilotDB is the only database design that leverages all major features of PM for disaggregation: capacity, persistence, and RDMA-based low-latency remote accesses. Our research contributions are as follows:
• We advocate a flexible 3-level cloud-native database architecture with aggressively disaggregated resources. It makes CNs