Persistent Memory Disaggregation for Cloud-Native Relational Databases

Chaoyi Ruan

rcy@mail.ustc.edu.cn

USTC, Alibaba Group

China

Yingqiang Zhang

yingqiang.zyq@alibaba-inc.com

Alibaba Group

China

Chao Bi

bc233333@mail.ustc.edu.cn

USTC, Alibaba Group

China

Xiaosong Ma

xma@hbku.edu.qa

QCRI, HBKU

Qatar

Hao Chen

ch341982@alibaba-inc.com

Alibaba Group

China

Feifei Li

lifeifei@alibaba-inc.com

Alibaba Group

China

Xinjun Yang

xinjun.y@alibaba-inc.com

Alibaba Group

China

Cheng Li

chengli7@ustc.edu.cn

USTC, Anhui Key HPC Lab

China

Ashraf Aboulnaga

aaboulnaga@hbku.edu.qa

QCRI, HBKU

Qatar

Yinlong Xu

ylxu@ustc.edu.cn

USTC, Anhui Key HPC Lab

China

ABSTRACT

The recent emergence of commodity persistent memory (PM) hardware has altered the landscape of the storage hierarchy. It brings multi-fold benefits to database systems, with its large capacity, low latency, byte addressability, and persistence. However, PM has not been incorporated into the popular disaggregated architecture of cloud-native databases.

In this paper, we present PilotDB, a cloud-native relational database designed to fully utilize disaggregated PM resources. PilotDB possesses a new disaggregated DB architecture that allows compute nodes to be computation-heavy yet data-light, as enabled by large buffer pools and fast data persistence offered by remote PMs. We then propose a suite of novel mechanisms to facilitate RDMA-friendly remote PM accesses and minimize operations involving CPUs on the computation-light PM nodes. In particular, PilotDB adopts a novel compute-node-driven log organization that reduces network/PM bandwidth consumption and a log-pull design that enables fast, optimistic remote PM reads aggressively bypassing the remote PM node CPUs. Evaluation with both standard SQL benchmarks and a real-world production workload demonstrates that PilotDB (1) achieves excellent performance as compared to the best-performing baseline using local, high-end resources, (2) significantly outperforms a state-of-the-art DRAM-disaggregation system and the PM-disaggregation solution adapted from it, (3) enables faster failure recovery and cache buffer warm-up, and (4) offers superior cost-effectiveness.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

ASPLOS ’23, March 25–29, 2023, Vancouver, BC, Canada

ACM ISBN 978-1-4503-9918-0/23/03...$15.00

https://doi.org/10.1145/3582016.3582055

CCS CONCEPTS

• Computer systems organization → Cloud computing; • Hardware → Non-volatile memory.

KEYWORDS

cloud-native database, persistent memory, memory disaggregation

ACM Reference Format:

Chaoyi Ruan, Yingqiang Zhang, Chao Bi, Xiaosong Ma, Hao Chen, Feifei Li, Xinjun Yang, Cheng Li, Ashraf Aboulnaga, and Yinlong Xu. 2023. Persistent Memory Disaggregation for Cloud-Native Relational Databases. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (ASPLOS ’23), March 25–29, 2023, Vancouver, BC, Canada. ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/3582016.3582055

1 INTRODUCTION

The past decade has witnessed the emergence and growth of disaggregated cloud-native databases, with successful systems such as Amazon Aurora [ ], Alibaba PolarDB [ – ], and Microsoft Socrates [ ] spanning their processing across multiple layers of network-connected resource pools (Figure 1). The compute nodes (CNs) host the computation logic, leveraging remote but shared DRAM space as an extension to the local buffer pool for memory capacity, and the replicated storage pool for data persistence and fault tolerance. For users, the rich, elastic, and on-demand configuration of disaggregated cloud-native databases caters to their diverse workload requirements and flexible scaling needs. For service providers, such services allow the reuse of database software infrastructures as well as the consolidation/sharing of hardware resources.

[Figure 1: Sample disaggregated cloud-native DB architecture spanning three layers: CPU, memory, and storage. Labeled components: SQL/TXN engine, compute node (CN), memory node (MN), storage node (SN), DRAM buffer pool, cache, log, data pages. Labeled operations: write, append, update, read.]

Though this new architecture enables the independent scaling of resources, major constraints still impede its adoption. First, DRAM disaggregation is held back by DRAM's limited per-machine density, high (and fluctuating) price [ ], and volatility, which make it a more costly and less reliable layer for hosting a cloud-native database’s working set. Second, writes remain slow, especially with transactions, as changes need to be persisted in time to the storage layer.

In this work, we argue that Persistent Memory (PM), also known as non-volatile memory (NVM), driven by diverse technologies such as 3D XPoint [ ], BiCS Flash [ ], and PCM [ ], emerges as an appealing layer for resource disaggregation. Compared with DRAM, PM offers higher provisioning density (e.g., one DIMM slot can hold 512 GB of Optane PM, but only 128 GB of DDR4 DRAM). It simultaneously offers persistence, enabling fast writes and recovery. In addition, PM preserves ultra-low-latency remote access via RDMA, an advantage over fast SSDs. Such multi-fold capability makes PM an ideal candidate for disaggregated databases, as we can simultaneously cache hot pages and persist log data on a shared and distributed PM layer. This brings on-demand, cost-effective memory buffer expansion, fast data persistence, and enhanced availability.

However, existing PM disaggregation work has not fully considered database redesign to utilize the versatile PM units, focusing instead on supporting native data structures [ ] or simple applications like KV stores [ ]. Applying these solutions to cloud-native databases could easily lead to new bottlenecks on the shared, remote PM nodes (PMNs). The first is the tension between the limited PM write bandwidth [ ] and the heavy bandwidth consumption of existing solutions. The latter is largely due to write redundancy/amplification caused by logging and dirty data flushing. Offloading log management to the PM side would reduce the PM bandwidth pressure (by not sending dirty pages but reproducing them by PM-side log application). On the other hand, this comes at the price of heavy CPU involvement on the PM nodes, required to handle offloaded data (especially their updates) and coordinate concurrent data accesses. Finally, complex management logic on the PM side would complicate the critical-path reads and writes. Both of these PM-side bottlenecks (write bandwidth and CPU), unfortunately, conflict directly with the main selling point of PM disaggregation for the cloud: having a shared PM node pool supporting many compute nodes running database instances.

To address these challenges, we propose PilotDB, a novel PM-disaggregated cloud-native database architecture featuring the following innovations.

First, PilotDB embodies CDLog (Compute-node-Driven Logging), a central logging mechanism that efficiently offloads bulk data to the PM layer as a large, fast page buffer, yet with light computation there to support speedy logging and update handling. While retaining the page-based data organization of relational databases, it discards the conventional page-based WAL organization and instead adopts fine-grained, physical logging, where log entries directly embed changes at a mini-page granularity as well as the concerned remote PM memory addresses. This allows compute nodes to flush only CDLog entries to remote PM via one-sided RDMA and enables light-weight, DMA-based log application on the PM nodes, simultaneously reducing PM nodes’ CPU and write bandwidth consumption.
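To make the idea of fine-grained physical logging more concrete, the following is a minimal Python sketch, not PilotDB's actual format. It assumes a hypothetical `LogEntry` that carries an LSN, an absolute PM address, and only the changed bytes; the 16 KB page size and the 20-byte header layout are illustrative assumptions that do not come from the text. Because each entry already names its target address, applying it reduces to a plain byte copy, which is what makes a DMA-style apply step plausible.

```python
from dataclasses import dataclass
from typing import List

PAGE_SIZE = 16 * 1024  # assumed page size for illustration; not stated in the text


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical physical log entry: where to write and which bytes."""
    lsn: int       # log sequence number, used only for ordering
    pm_addr: int   # absolute byte address inside the remote PM buffer
    data: bytes    # the changed bytes only (a "mini-page" worth of change)

    def wire_size(self) -> int:
        # Assumed layout: 8-byte LSN + 8-byte address + 4-byte length + payload.
        return 8 + 8 + 4 + len(self.data)


def apply_entries(pm: bytearray, entries: List[LogEntry]) -> None:
    """Apply entries in LSN order using byte copies only (no page parsing)."""
    for entry in sorted(entries, key=lambda e: e.lsn):
        pm[entry.pm_addr:entry.pm_addr + len(entry.data)] = entry.data


# A bytearray stands in for the PM-resident page buffer (four pages).
pm = bytearray(4 * PAGE_SIZE)
entries = [
    LogEntry(lsn=1, pm_addr=2 * PAGE_SIZE + 100, data=b"alice"),
    LogEntry(lsn=2, pm_addr=2 * PAGE_SIZE + 100, data=b"bob"),
]
apply_entries(pm, entries)

# Bytes shipped as log entries versus flushing the whole dirty page once.
log_bytes = sum(e.wire_size() for e in entries)
print(log_bytes, "bytes of log vs", PAGE_SIZE, "bytes for a full page flush")
```

Under these assumptions, two small updates cost a few dozen bytes on the wire rather than a full page, which illustrates why shipping only log entries can relieve both network and PM write bandwidth.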

Second, PilotDB is designed to be coordination-free, even in the presence of concurrent reads/writes to the PM log and buffers, further shaving CPU consumption on the PM nodes. This is enabled by (1) lock-free data structures designed to manage the PM log area, with light-weight conflict check mechanisms and (2) a novel log-pull mechanism that allows compute nodes’ query processing to perform remote reads optimistically, with logs “read back” from the PM side in the rare case that the retrieved PM-cached page is found stale, again enabled by our CDLog organization.
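The optimistic read path can be sketched as follows. This is an illustrative Python model, not the paper's protocol: the `RemotePM` class, the per-page `applied_lsn` tag, and the assumption that the compute node knows the LSN it requires (`required_lsn`) are all hypothetical stand-ins for whatever staleness check PilotDB actually uses. The point is only that the reader never asks the PM node's CPU for help: it fetches the page, checks freshness locally, and in the rare stale case pulls the missing log entries and patches its own copy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class RemotePM:
    """Stand-in for a PM node: cached pages plus log entries not yet applied."""
    # page_id -> (LSN up to which the page is current, page bytes)
    pages: Dict[int, Tuple[int, bytearray]] = field(default_factory=dict)
    # (lsn, page_id, offset within page, changed bytes)
    log: List[Tuple[int, int, int, bytes]] = field(default_factory=list)


def optimistic_read(pm: RemotePM, page_id: int, required_lsn: int) -> bytes:
    """Read a page without coordination; if stale, pull log and patch locally."""
    applied_lsn, remote_bytes = pm.pages[page_id]  # models a one-sided read
    page = bytearray(remote_bytes)                 # private copy on the compute node
    if applied_lsn >= required_lsn:
        return bytes(page)                         # common case: page is fresh
    # Rare case: read the missing entries back and replay them in LSN order.
    for lsn, pid, offset, data in sorted(pm.log):
        if pid == page_id and applied_lsn < lsn <= required_lsn:
            page[offset:offset + len(data)] = data
    return bytes(page)


pm_node = RemotePM()
pm_node.pages[7] = (1, bytearray(b"hello world"))
pm_node.log = [(2, 7, 0, b"HELLO"), (3, 8, 0, b"zzz"), (4, 7, 6, b"W")]

fresh = optimistic_read(pm_node, 7, required_lsn=1)
patched = optimistic_read(pm_node, 7, required_lsn=4)
print(fresh, patched)
```

Note that the remote copy is left untouched in this sketch; only the reader's private copy is brought up to date, so concurrent readers need no locks on the PM side.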

We implemented a PilotDB prototype atop MySQL [ ] and evaluated it using both industry-standard benchmarks and a production workload. The results show that PilotDB achieves up to 98.0% of the throughput of a monolithic configuration (which is given sufficient local DRAM and PM-based storage), even with the vast majority of its data placed on remote, disaggregated PM. With most workloads, PilotDB significantly outperforms LegoBase [ ], a state-of-the-art DRAM-disaggregated cloud-native database, and LegoPM, a solution incorporating PM disaggregation. In addition, we made a best-effort attempt to compare PilotDB with Aurora and PolarDB, two mainstream cloud-native database services on the market that adopt storage disaggregation, by allocating Aurora/PolarDB instances with sufficient local DRAM (and careful hardware alignment in other resource dimensions). Results show that PilotDB achieves significantly better or comparable performance.

In addition to the above performance results, our multi-tenant tests show that PilotDB has strong service scalability, with a 4-node PM pool serving 32 concurrent DB instances at only a 10.8% performance loss against running each instance exclusively. Moreover, PilotDB brings instant failure recovery, up to 15.27× faster than the baselines, regardless of the crash site. Finally, our cost analysis further confirms the cost-effectiveness of PilotDB. Compared with its closest competitor in cost-effectiveness, the PilotDB configuration is 38.3% lower in hardware ownership cost, uses only 9.1% of the DRAM across CN and PMN and 12.5% of the PMN’s CPU core resources, while delivering 91.5% higher throughput per dollar.

To our knowledge, PilotDB is the only database design that leverages all major features of PM for disaggregation: capacity, persistence, and RDMA-based low-latency remote accesses. Our research contributions are as follows:

• We advocate a flexible 3-level cloud-native database architecture with aggressively disaggregated resources. It makes CNs
