
World Wide Web
accessed or not for a given time interval. We elaborate how to integrate our method into
MySQL, an open-sourced RDBMS, and propose a declarative way to fetch the incremen-
tal data. Several optimization techniques are proposed to boost the extraction performance.
Extensive experiments are conducted over the standardized Sysbench benchmark to show
that our proposed method is robust and efficient.
Keywords RDBMS · MVCC · Incremental data extraction · Snapshot
1 Introduction
Account reconciliation is of great importance to corporate service charging and banking
deposit systems. For example, in the Tencent’s service charging system, a registered QQ
[14]orWeChat[22] user can recharge his or her account so that he or she can utilize the paid
business offerings that the system provides, like purchasing game equipments, watching
digital fee TV, etc. From the perspective of the service charging systems, any access to the
paid business offerings must guarantee that the account reconciliation is correct, i.e., any
change of the account balances must be in consistency with the expense statements. To do
this, the system is required to do regular account reconciliation and reports the daily, weekly,
or monthly balance.
In some cases, however, due to the unpredictable system failures, results of doing the
account reconciliation could be incorrect, i.e., changes of the account balances is not in
consistency with the expense statements. Account imbalance can bring serious harms to
financial systems, which could result in a loss of money or even a loss of customers. In this
case, it is of great importance to dig out the harmful behaviors by tracing back to the trans-
actions in the expense statements that possibly destroy the account balances. To this end, in
this paper, we investigate the problem, namely time-interval data extraction, which is infor-
mally defined to extract the incremental data that are valid between a given time interval. By
separating the time into a sequence of intervals and doing the account reconciliation with
the help of the time-interval data extraction, once an account reconciliation in a time interval
is checked to be incorrect, then transactions that occur in this time period are considered as
suspicious transactions. For example, if we find the suspicious transactions occurring dur-
ing 10:00 am and 11:00 am in Sep. 17, 2017, then we simply extract the transactions during
10:00 am and 11:00 am in Sep. 17, 2017 and then do account reconciliation to figure out
the harmful transactions.
Execution of the time-interval data extraction tasks in RDBMS is not trivial. The reason
is two-fold. First, implementation of time-interval data extraction should be transparent to
users. One may think that by building an auxiliary relation recording all updates of account
balance and the update time, it is able to do time-interval extraction by checking the update
time based on the auxiliary relation. However, it not only incurs additional maintenance
cost and modifies the database schema, but also degrades the system throughput since any
update operation on the original relations will result in a cascading update of the auxiliary
relation. Second, execution of the time-interval data extraction tasks should be easy to use.
Consider that SQL has become a standard query language in existing commercial and open
source RDBMS. Using SQL to execute the time-interval data extraction tasks is preferred.
We investigate the problem of doing interval data extraction in MVCC-based RDBMS.
MVCC is widely adopted as a concurrency control mechanism in existing RDBMS, includ-
ing Oracle, MySQL and PostgreSQL [19], etc. In MVCC mechanism, (1) each record is
相关文档
评论