暂无图片
暂无图片
暂无图片
暂无图片
暂无图片
T-SQL A Lightweight Implementation to Enable Built-in Temporal Support in MVCC-based RDBMSs——IEEETKDE.pdf
237
14页
0次
2021-11-29
免费下载
1041-4347 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more
information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TKDE.2021.3081717, IEEE Transactions on Knowledge and Data Engineering
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 1
T-SQL: A Lightweight Implementation to Enable
Built-in Temporal Support in MVCC-based
RDBMSs
Zhanhao Zhao, Wei Lu, Hongyao Zhao, Zongyan He, Haixiang Li, Anqun Pan, Xiaoyong Du
Abstract—The adoption of temporal expressions into SQL:2011 has continuously driven the extensions of temporal support in
relational database systems (a.b.a. RDBMSs). In this paper, we present T-SQL, a lightweight yet efficient built-in temporal
implementation in RDBMSs. T-SQL entirely relies on multi-version concurrency control (MVCC), widely adopted in RDMBSs, to
manage temporal data. For temporal data, current records are maintained in legacy databases, and historical records, i.e., previous
versions of current records (if any), which used to be periodically reclaimed are separately maintained in KV stores. To enable temporal
query processing under SQL:2011, we extend the query engine in legacy RDBMSs to support query processing over either current
records or historical records or both. Further, regarding temporal data are ever-increasing, we propose various optimizations to reduce
the storage overhead of KV stores while keeping efficient query performance. We elaborate a publicly available implementation on how
to integrate T-SQL into both centralized and distributed RDBMSs. We conduct extensive experiments on both YCSB and TPC-series
benchmarks by comparing T-SQL with other temporal database systems. The results show that T-SQL is both lightweight and efficient.
Index Terms—RDBMS, Temporal Database, MVCC, SQL:2011, KV Store
F
1 INTRODUCTION
I
T is of great importance to maintain not only currently valid
data, but also the history of all data changes. Typical examples
include forensic analysis and legal requirements to store data for
a certain number of years, retrospective and trend data analysis
to offer the asset use details as of two years ago, and logical
corruption recovery to rewind tuples/relations/databases as of a
particular point in time [32].
The study on temporal data management has been going on for
decades, but only recently has some progress been made. Temporal
data management by applications brings prohibitively expensive
development and maintenance overhead. Instead, extensions to
support temporal data management in conventional RDBMSs have
been extensively explored. Nevertheless, the temporal support of-
fered by commercially available software tools and systems is still
quite limited. Until recently, the adoption of temporal expressions
into SQL:2011 makes the major DBMSs start to provide built-
in temporal support. The temporal extensions typically include:
(1) extension of non-temporal relations to temporal relations, (2)
unified management of both current data and historical data in a
single database, (3) query rewrite functionality by expressing the
semantics of temporal queries in equivalent conventional (non-
temporal) SQL statements.
In SQL:2011, the temporal features mainly include temporal
relation definitions, temporal queries, and others (e.g., temporal
constraints). As compared to the non-temporal counterpart, a
Zhanhao Zhao, Wei Lu, Hongyao Zhao, Zongyan He, and Xiaoyong
Du are with the Key Laboratory of Data Engineering and Knowledge
Engineering, Ministry of Education, China, and School of Information,
Renmin University of China, China.
E-mail: {zhanhaozhao,lu-wei,hongyaozhao,zongyanhe,duyong}@ruc.edu.cn
Haixiang Li and Anqun Pan are with Tencent Inc., China.
E-mail: {blueseali,aaronpan}@tencent.com
TABLE 1
Gaming player account balance
ID Player Bal Valid Time Transaction Time
r
1.1
James 0 [2018-05-20,2018-10-21) [2018-05-20,2018-10-21)
r
1.2
James 50 [2018-10-21,2018-11-01) [2018-10-21,2018-11-01)
r
1.3
James 1000 [2018-11-01,2018-11-12) [2018-11-01,2018-11-12)
r
1.4
James 2000 [2018-11-12,2019-05-11) [2018-11-12,)
r
2.1
David 150 [2018-10-20,2018-10-20) [2018-10-20,2018-10-20)
r
2.2
David 200 [2018-10-20,2019-04-19) [2018-10-20,)
r
3.1
Jack 200 [2018-11-08,2019-05-07) [2018-11-08,)
temporal relation associates either valid time, or transaction time,
or both. Either valid time or transaction time is a closed-open
period (i.e., time interval) [s, t), with s as start time and t as end
time. Valid time is a time period during which a fact was/is/will be
true in reality, and transaction time is a time period during which
a fact is/was recorded (i.e., current/historical) in the database.
Example 1. Table 1 shows an example of the account balance for
gaming players. Each player has one current record and perhaps
one or several historical records. For example, r
1.4
, r
2.2
, r
3.1
are
current records of players James, David, and Jack, respectively,
while the remaining are the historical records of either James or
David. Assume each recharge increases the validity of account
balance by 6 months, e.g., the valid time of current record r
1.4
is
[2018-11-12,2019-05-11). Besides, the transaction time of each
record r indicates the timestamps on which r is created and
deleted/updated if any (otherwise the timestamp is set to ). For
instance, it can be observed that r
1.4
is created on ‘2018-11-12’
and has not yet been deleted/updated.
Thus far, we have witnessed a big burst of temporal support
Authorized licensed use limited to: Tencent. Downloaded on July 05,2021 at 16:23:13 UTC from IEEE Xplore. Restrictions apply.
1041-4347 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more
information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TKDE.2021.3081717, IEEE Transactions on Knowledge and Data Engineering
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 2
in legacy database systems, including Oracle [4], Teradata [8],
MariaDB [3], SQL Server [6]. However, they suffer from either
limited expressiveness or poor performance, or both. First, the
temporal data model is inadequate. It is often reported that
DBAs are fired or forced to run away due to an accidental
deletion of valuable business data. This kind of unexpectedly
committed transactions, namely logical data corruption, is difficult
to avoid in real applications. Yet, the temporal data model defined
in SQL:2011 does not explicitly support logical data corruption
recovery. Technically, to address this issue, it is necessary to
extend the temporal data model, based on which we first figure
out the erroneous transaction, and then apply a sequence of
reverse operations over the records involved in the erroneous
transaction. Second, the performance of legacy databases suffers
from the synchronization overhead by introducing the built-in
temporal support. Existing temporal implementations store his-
tory and current records separately so as to skip over historical
records for current data query processing, which is deemed to
the dominant temporal queries. However, this separation degrades
the throughput of conventional transactional workloads because
any update/delete of a record r will cause a synchronization that
transfers r from the current relation to the historical relation. In
this way, any update/delete of a record in the current relation will
require exclusive access to the historical relation, leading to a
significant drop in the whole system’s throughput. Third, temporal
data are maintained in an append-only mode, causing an ever-
increasing size. The overhead of maintaining a large volume of
temporal data degrades the query performance, and hence it is
of great necessity to develop a scalable storage engine to enable
efficient query processing.
To address the above issues, in this paper, we propose T-SQL,
a lightweight yet efficient built-in temporal implementation. We
make the following contributions.
We present a new temporal data model. As compared to the non-
temporal counterpart, besides the valid/transaction time period
defined in SQL:2011, a temporal relation under the new model
has two transaction IDs. One ID corresponds to the transaction that
creates the record, and the other ID corresponds to the transaction
that deletes/updates the record. By introducing the transaction IDs,
it is able to identify all records that are created/updated/deleted
in the same transaction, thus achieving the recovery of logical
data corruption. More importantly, temporal join queries taking
the transaction time as the join key can be enhanced by taking the
transaction ID as the join key instead.
We propose a built-in temporal implementation with various
optimizations, and encapsulate it into PostgreSQL [5] and Green-
plum [2]. We have released our implementation publicly available.
First, our implementation completely relies on the MVCC
mechanism and is lightweight. Like other temporal implemen-
tations in legacy RDBMSs, we manage current and historical
records separately as well. In legacy RDBMSs, any update/delete
of a current record results in synchronous migration of newly
generated historical records to the historical relation. Instead,
we propose an asynchronous data migration strategy, i.e., any
update/delete of a current record does not cause an immediate data
migration. In our design, all newly generated historical records
are migrated to the historical relation only when the database
system starts to reclaim the storage occupied by records that
are deleted or obsoleted, which is also known as VACUUM in
PostgreSQL and PURGE in MySQL. This late data migration
transfers historical data in batch and is almost non-invasive to the
originally legacy RDBMSs. Further, we utilize the KV store with
various optimizations to organize historical relations, in which
only data changes of historical records of the same entity are
maintained, thus reducing the size of storage space.
Second, in response to the challenge that the transaction time
of each record is expensive to set in legacy RDBMSs, we build an
efficient transaction status manager. It maintains the status of each
transaction, including the transaction commit time, in a transaction
log. A special design of the manager makes the retrieval of the
commit time of a given transaction ID at most one I/O cost. During
the data migration, we update their transaction time for each newly
generated historical record based on its transaction ID. For the
current and historical records that have not yet been transferred to
the historical relation, we are still able to obtain the transaction
time based on their transaction IDs via the manager.
Third, to support temporal query processing, we extend the
parser, query executor, and storage engine of legacy RDBMSs.
Our temporal implementation supports all temporal features de-
fined in SQL:2011. For valid-time qualifiers in the temporal
query, we transform the temporal operations into equivalent non-
temporal operations; while for transaction-time qualifiers, we
provide a native operator to retrieve current and historical data
with various optimizations.
We conduct extensive experiments on both real and synthetic
benchmarks by comparing T-SQL with Oracle, SQL Server, and
MariaDB. The results show that T-SQL almost has minimal per-
formance loss (only 7% on average) by introducing the temporal
features, and performs the best for most of the temporal queries.
The rest of the paper is organized below. Section 2 discusses
related work. Section 3 formalizes our new temporal data model.
Section 4 outlines the system architecture. Section 5 elaborates
temporal query processing and storage management. Section 6
presents the implementation. Section 7 reports the experimental
results, and Section 8 concludes the paper.
2 RELATED WORK
The study on temporal data management has been going on for
decades, mainly in the fields of data model development, query
processing, and implementations.
Early work until 1990s mainly focused on the consensus
glossary of concepts for data modeling [16], [17], [18], [19],
[20]. At this stage, the contributions mainly include temporal
relation definitions, temporal constraints, and temporal queries. As
compared to the non-temporal counterpart, a temporal relation as-
sociates time, which is multi-dimensional, and can be either valid
time or transaction time, or other types of time. The semantics of
integrity constraints in the temporal data model is also enriched
[12], [39]. Entity integrity does not enforce the uniqueness of the
primary key. Instead, it requires that no intersection exists between
valid times of any two records with the same primary key; while
for reference integrity, there must exist one matching record in
the parent relation whose valid time contains the valid time of the
child record. As compared to the regular query syntax, temporal
queries are formulated by expressing filtering conditions as period
predicates [10], [25].
After attempts with many years to build the implementation
on top of legacy RDBMSs, such as Oracle [4], DB2 [1], and
Ingres [40], it is well recognized that the cost, brought by
the development and maintenance of application programs, is
prohibitively expensive. For this reason, since the late 1990s,
Authorized licensed use limited to: Tencent. Downloaded on July 05,2021 at 16:23:13 UTC from IEEE Xplore. Restrictions apply.
of 14
免费下载
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文档的来源(墨天轮),文档链接,文档作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论

关注
最新上传
暂无内容,敬请期待...
下载排行榜
Top250 周榜 月榜