
Towards Millions of Database Transmission Services in the Cloud
[Industrial Track]
Hua Fan*, Dachao Fu*, Xu Wang, Jiachi Zhang, Chaoji Zuo, Zhengyi Wu, Miao Zhang,
Kang Yuan, Xizi Ni, Guocheng Huo, Wenchao Zhou, Feifei Li, Jingren Zhou
Alibaba Group
Hangzhou, China
{guanming.fh,qianzhen.fdc,wx105683,zhangjiachi.zjc,zuochaoji.zcj,wuzhengyi.wzy,yanmen.zm}@alibaba-inc.com
{yuankang.yk,xizi.nxz,guocheng.hgc,zwc231487,lifeifei,jingren.zhou}@alibaba-inc.com
Abstract
Alibaba relies on its robust database infrastructure to facilitate
real-time data access and ensure business continuity despite regional
disruptions. To address these operational imperatives, Alibaba
developed the Data Transmission Service (DTS), which has become
critical for internal applications and public cloud services alike.
This paper presents a comprehensive study of the architectural
innovations, resource scheduling mechanisms, and performance
optimization strategies that have been implemented within DTS to
tackle the significant challenges of cross-network, heterogeneous
data transmission in a cost-effective manner. We explore the novel
Any-to-Any (A2A) architecture, which simplifies the complexity
of data paths between diverse databases and mitigates network
connectivity issues, thereby significantly reducing development
overhead. Additionally, we examine a dynamic network bandwidth
scheduling algorithm that effectively maintains Service-Level
Objectives (SLOs), complemented by a serverless mechanism that
ensures efficient resource utilization. Furthermore, DTS utilizes
advanced strategies such as transaction dependency tracking, hot
data consolidation, and batching to enhance synchronization
performance and efficiency. DTS has distilled the lessons learned from
years of serving our customer base and currently supports nearly
1 million public cloud instances annually. Our evaluation results
show that DTS can effectively and efficiently handle real-time data
transmission in both experimental and production environments.
1 Introduction
Alibaba operates a vast digital commerce service, anchored by its
resilient database services, which store essential business data. This
requires two key functions. The first is real-time access to database
information, critical for applications like advertising and search,
prompting the need for services that can parse real-time database
logs to satisfy the many business units demanding instant data
from primary databases [22, 28, 41]. The second is business continuity
against regional disruptions, such as power outages or natural
disasters, which demands real-time database synchronization
to secondary regions for swift operation transfer [32]. These
necessities drove Alibaba to create its own Data Transmission Service
(DTS) [20] in 2011, focusing on synchronization between databases
(e.g., MySQL to MySQL).
*Both authors contributed equally to this research.
As Alibaba Cloud Computing expanded, it began offering a variety
of database services to the public cloud, triggering a need for
migrating more than 24 different types of databases from local data
centers or other cloud providers. This diversity led to a surplus of
potential data transmission pathways, heavily complicating the
process and increasing the development workload. Network connectivity
issues further exacerbated this complexity, potentially requiring
specialized programs to access private intranets, culminating in a
significant challenge: developing numerous cross-network,
heterogeneous data transmission services cost-effectively.
Managing a high volume of DTS instances poses the challenge
of resource scheduling. This complexity arises from the need to
balance and allocate network and computational resources, such as
bandwidth, CPU, and memory, effectively among a multitude
of services. Insufficient resource allocation can lead to violations
of Service Level Objectives (SLOs), adversely affecting customer
business operations. As the demand for real-time access to data
grows, ensuring efficient resource scheduling becomes critical for
maintaining service quality and reliability in DTS operations.
The third challenge that emerges is related to synchronization
performance issues. This concerns the need for near-zero
delay in real-time data synchronization, which is highly sought
after by our customers. However, synchronization latency can be
significantly affected by the performance of the target database,
particularly under high-frequency updates. Factors contributing
to this delay include lower concurrency in database replication
compared to the source [28], performance discrepancies in updates
between heterogeneous databases [18], and inefficiencies in writing
to the target database.
To conquer these challenges while meeting customer and business
needs, DTS was architected with several key design considerations.
In this paper, we outline the architecture aimed at reducing
development complexity, resource scheduling mechanisms for
enhancing user experience and efficiency, and optimizations for
performance of update operations. The specifics are as follows.
• DTS employs an Any-to-Any (A2A) architecture, a strategic design
  choice that allows for universal compatibility and flexibility in
  data transmission. This A2A approach enables DTS to interconnect
  any source database with any target database, transforming and
  translating data formats as needed. By adopting this architecture,
  DTS can reduce the number of potential data transmission pathways
  from the M × N links needed to connect M sources directly to
  N targets to an M + N configuration. On each link, DTS
  encapsulates network connectivity issues into predefined scenarios
  within the DTS framework. Users can thus select their scenario