暂无图片
暂无图片
暂无图片
暂无图片
暂无图片
Detecting Logic Bugs of Join Optimizations in DBMS.pdf
391
26页
26次
2023-07-11
免费下载
55
Detecting Logic Bugs of Join Optimizations in DBMS
XIU TANG, Zhejiang University, China
SAI WU
, Zhejiang University, China
DONGXIANG ZHANG, Zhejiang University, China
FEIFEI LI, Alibaba Group, China
GANG CHEN, Zhejiang University, China
Generation-based testing techniques have shown their eectiveness in detecting logic bugs of DBMS, which
are often caused by improper implementation of query optimizers. Nonetheless, existing generation-based
debug tools are limited to single-table queries and there is a substantial research gap regarding multi-table
queries with join operators. In this paper, we propose TQS, a novel testing framework targeted at detecting
logic bugs derived by queries involving multi-table joins. Given a target DBMS, TQS achieves the goal with
two key components: Data-guided Schema and Query Generation (DSG) and Knowledge-guided Query Space
Exploration (KQE). DSG addresses the key challenge of multi-table query debugging: how to generate ground-
truth (query, result) pairs for verication. It adopts the database normalization technique to generate a testing
schema and maintains a bitmap index for result tracking. To improve debug eciency, DSG also articially
inserts some noises into the generated data. To avoid repetitive query space search, KQE forms the problem as
isomorphic graph set discovery and combines the graph embedding and weighted random walk for query
generation. We evaluated TQS on four popular DBMSs: MySQL, MariaDB, TiDB and PolarDB. Experimental
results show that TQS is eective in nding logic bugs of join optimization in database management systems.
It successfully detected 115 bugs within 24 hours, including 31 bugs in MySQL, 30 in MariaDB, 31 in TiDB,
and 23 in PolarDB respectively.
CCS Concepts: Information systems
Data management systems; Structured Query Language;•Security
and privacy Database and storage security.
Additional Key Words and Phrases: Database, logic bug, join optimization.
ACM Reference Format:
Xiu Tang, Sai Wu, Dongxiang Zhang, Feifei Li, and Gang Chen. 2023. Detecting Logic Bugs of Join Optimizations
in DBMS. Proc. ACM Manag. Data 1, 1, Article 55 (May 2023), 26 pages. https://doi.org/10.1145/3588909
1 INTRODUCTION
In the past decades, we have witnessed the evolution of modern DBMS (Database Management
Systems) to support various new architectures such as cloud platforms and HTAP [
15
,
27
], which
require increasingly sophisticated optimizations for query evaluation. Query optimizer is considered
as one of the most complex and important components in DBMS. It parses the input SQL queries and
generates an ecient execution plan with the assistance of built-in cost models. The implementation
errors in a query optimizer can result in bugs, including crashes and logic bugs. Crashes are easier
Sai Wu is the corresponding author.
Authors’ addresses: Xiu Tang, tangxiu@zju.edu.cn, Zhejiang University, China; Sai Wu, wusai@zju.edu.cn, Zhejiang
University, China; Dongxiang Zhang, wusai@zju.edu.cn, Zhejiang University, China; Feifei Li, Alibaba Group, China,
lifeifei@alibaba-inc.com; Gang Chen, Zhejiang University, China, cg@zju.edu.cn.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for prot or commercial advantage and that copies bear this notice and the
full citation on the rst page. Copyrights for components of this work owned by others than the author(s) must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires
prior specic permission and/or a fee. Request permissions from permissions@acm.org.
© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
2836-6573/2023/5-ART55 $15.00
https://doi.org/10.1145/3588909
Proc. ACM Manag. Data, Vol. 1, No. 1, Article 55. Publication date: May 2023.
55:2 Xiu Tang et al.
(a) MySQL’s incorrect hash join execution.
(b) MySQL’s incorrect semi-join execution.
Fig. 1. Logic bug cases of join optimizations in MySQL.
to detect as the system will halt immediately. Whereas, logic bugs are prone to be ignored, because
they simply cause the DBMS to return incorrect result sets that are hard to detect. In this paper, we
focus on detecting these silent bugs.
Pivoted Query Synthesis (PQS) has recently emerged as a promising way to detect logic bugs
in DBMS [
50
]. Its core idea is to select a pivot row from a table and generate queries that fetch
this row as the result. A logic bug is detected if the pivot row is not returned in any synthesized
query. PQS is mainly designed to support selection queries in a single table and 90% of its reported
bugs are for queries involving only one table. There still exists substantial research gap regarding
multi-table queries with dierent join algorithms and join structures, which are more error-prone
than single-table queries.
In Figure 1, we illustrate two logic bugs of MySQL for join queries. These two bugs can be
detected by our proposed tool in this paper. Figure 1(a) demonstrates a logic bug of hash join in
MySQL 8.0.18. In this example, the rst query returns the correct result set because it is executed
with the block nested loop join. However, when the second query is issued with an inner hash join,
an incorrect empty result set is returned. This is because the underlying hash join algorithm asserts
that “0” and
0” are not equal. In Figure 1(b), the logic bug is caused by semi-join processing in
MySQL’s newest version (8.0.28). In the rst query, the nested loop inner join casts the data type
𝑣𝑎𝑟𝑐ℎ𝑎𝑟
to
𝑏𝑖𝑔𝑖𝑛𝑡
and produces a correct result set. But when the second query is executed with
hash semi-join, the data type
𝑣𝑎𝑟𝑐ℎ𝑎𝑟
is converted to
𝑑𝑜𝑢𝑏𝑙𝑒
, resulting in data accuracy loss and
incorrect equivalence comparison.
Adopting query synthesis for logic bug detection in multi-table join queries is much more dicult
than that in single-table selection queries, due to two unique challenges:
Proc. ACM Manag. Data, Vol. 1, No. 1, Article 55. Publication date: May 2023.
Detecting Logic Bugs of Join Optimizations in DBMS 55:3
Result Verication: Previous approaches adopt the dierential testing strategy to verify the
correctness of query results. The idea is to process a query using dierent physical plans. If
plans return inconsistent result sets, a possible logic bug is detected. However, the drawback
of dierential testing is two-fold. First, some logic bugs aect multiple physical plans and
make them all generate the same incorrect result. Second, when inconsistent result sets are
observed, we need manually check which plan generates the correct one, incurring high
overheads. A possible solution for the above problem is to obtain the ground-truth results for
an arbitrary testing query, which is not supported by existing tools.
Search Space: The number of join queries that can be generated from a given database
schema is exponential to the number of tables and columns. Since we are unable to enumerate
all possible queries for verication, there requires an eective query space exploration
mechanism that allows us to detect logic bugs as eciently as possible.
In this paper, we propose Transformed Query Synthesis (TQS) as a remedy. It is a novel, general,
and cost-eective tool to detect logic bugs of join optimizations in DBMS. To address the rst
challenge above, we propose the DSG (Data-guided Schema and query Generation) approach.
Given a dataset denoted as one wide table, DSG splits the dataset into multiple tables based on
detected normal forms. To speed up bug discovery, DSG also inserts some articial noise data into
the generated database. We rst convert the database schema into a graph whose nodes are the
tables/columns and edges are the relationships between the nodes. DSG adopts random walking on
schema graph to select tables for queries, and uses those tables to generate join expressions. For a
specic join query spanning over multiple tables, we can easily identify its ground-truth results
from the wide table. In this way, DSG can eectively generate (query, result) pairs for database
verication.
For the second challenge, we design the KQE (Knowledge-guided Query space Exploration)
approach. We rst extend the schema graph to a plan-iterative graph, which represents the entire
query space. Each join query is then represented as a sub-graph. KQE adopts an embedding-based
graph index to score the generated query graphs by searching whether there are structurally similar
query graphs in already-explored space. The coverage score guides the random walk generator to
explore the unknown query space as much as possible.
To demonstrate the generality and eectiveness of our approach, we evaluated TQS on four
popular DBMSs: MySQL [
41
], MariaDB [
37
], TiDB [
27
] and PolarDB [
15
]. After 24 hours of running,
TQS successfully found 115 bugs, including 31 bugs in MySQL, 30 in MariaDB, 31 in TiDB, and
23 in PolarDB. Through root cause analysis, there are 7 types of bugs in MySQL, 5 in MariaDB,
5 in TiDB, and 3 in PolarDB respectively. All the detected bugs are submitted to the respective
community and we received their positive feedbacks.
2 OVERVIEW
In this section, we formulate the problem denition and present an overview of our proposed
solution.
2.1 Problem Definition
There are two categories for database bugs: crash and logic bugs. Crash bugs are raised either by
the operating system, or by the process of DBMS. They cause DBMS to be forcefully killed or halted,
due to limited resources (e.g., out of memory) or access to an invalid memory address, etc. Crash
bugs are easy to be noticed. In contrast, the logic bugs are much harder to be noticed, because
database still runs normally and query processing returns seemingly correct results (and indeed
they return correct results in most cases, but may fetch incorrect result sets in corner cases). These
Proc. ACM Manag. Data, Vol. 1, No. 1, Article 55. Publication date: May 2023.
of 26
免费下载
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文档的来源(墨天轮),文档链接,文档作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。