Unicorn: Detect Runtime Errors in Time-Series Databases with
Hybrid Input Synthesis
Zhiyong Wu
KLISS, BNRist, School of Software,
Tsinghua University, China
wuzy21@mails.tsinghua.edu.cn
Jie Liang
KLISS, BNRist, School of Software,
Tsinghua University, China
liangjie.mailbox.cn@gmail.com
Mingzhe Wang
KLISS, BNRist, School of Software
Tsinghua University, China
wmzhere@gmail.com
Chijin Zhou
KLISS, BNRist, School of Software,
Tsinghua University, China
ShuimuYulin Co., Ltd, China
tlock.chijin@gmail.com
Yu Jiang
KLISS, BNRist, School of Software,
Tsinghua University, China
jiangyu198964@126.com
ABSTRACT
The ubiquitous use of time-series databases in the safety-critical
Internet of Things domain demands strict security and correctness.
One successful approach in database bug detection is fuzzing, where
hundreds of bugs have been detected automatically in relational
databases. However, it cannot be easily applied to time-series
databases: the bulk of time-series logic is unreachable because of
mismatched query specifications, and serious bugs are undetectable
because of implicitly handled exceptions.
In this paper, we propose Unicorn to secure time-series databases
with automated fuzzing. First, we design hybrid input synthesis
to generate high-quality queries which not only cover time-series
features but also ensure grammar correctness. Then, Unicorn uses
proactive exception detection to discover minuscule-symptom bugs
which hide behind implicit exception handling. With the specialized
design oriented to time-series databases, Unicorn outperforms the
state-of-the-art database fuzzers in terms of coverage and bugs.
Specically, Unicorn outperforms SQLsmith and SQLancer on
widely used time-series databases IoTDB, KairosDB, TimescaleDB,
TDEngine, QuestDB, and GridDB in the number of basic blocks by
21%-199% and 34%-693%, respectively. More importantly, Unicorn
has discovered 42 previously unknown bugs.
CCS CONCEPTS
• Software and its engineering → Software maintenance tools;
• Security and privacy → Database and storage security.
KEYWORDS
Time-series Databases, Runtime Error, Hybrid Input Synthesis
Jie Liang and Yu Jiang are the corresponding authors.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for prot or commercial advantage and that copies bear this notice and the full citation
on the rst page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specic permission and/or a
fee. Request permissions from permissions@acm.org.
ISSTA ’22, July 18–22, 2022, Virtual, South Korea
© 2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9379-9/22/07...$15.00
https://doi.org/10.1145/3533767.3534364
ACM Reference Format:
Zhiyong Wu, Jie Liang, Mingzhe Wang, Chijin Zhou, and Yu Jiang. 2022.
Unicorn: Detect Runtime Errors in Time-Series Databases with Hybrid
Input Synthesis. In Proceedings of the 31st ACM SIGSOFT International
Symposium on Software Testing and Analysis (ISSTA ’22), July 18–22, 2022,
Virtual, South Korea. ACM, New York, NY, USA, 12 pages.
https://doi.org/10.1145/3533767.3534364
1 INTRODUCTION
Along with the rapid growth in Internet of Things (IoT) deployment,
time-series databases are ubiquitously used in all kinds of IoT
devices. Compared to traditional relational databases, time-series
databases employ complex logic to support low latency and the
time-series nature of their data. This complexity challenges their
security, reliability, and correctness. To prevent vulnerabilities,
a common approach is to write unit tests for the target database
manually. However, unit testing is labor-intensive and cannot
detect bugs at the system level.
One promising approach is fuzzing, an automated software
testing technique that generates random data as program inputs.
It was first developed by Miller et al. [21] in the 1990s and has
since been widely adopted in practice for finding bugs in many
critical areas, including operating systems [14, 34, 37], networking
protocols [7, 19, 20, 40], and third-party libraries [1, 15–17, 39].
A fuzzer exercises the target program in a loop: (1) select an input
and generate candidate inputs based on it, (2) execute the candidate
inputs to track coverage and monitor anomalies, (3) save interesting
candidate inputs that reach new coverage, then go to (1). By following
this loop, fuzzers can continuously explore more and more of the
target program's state space.
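The three steps of the loop can be made concrete with a minimal sketch. It is an illustration only, not the implementation of any fuzzer cited here: `fuzz`, `toy_target`, and `toy_mutate` are hypothetical names, coverage is modeled as a set of branch labels returned by the target, and an anomaly is modeled as a raised Python exception.

```python
import random


def fuzz(target, seeds, mutate, iterations=1000, rng=None):
    """Toy coverage-guided loop; `target` returns the set of branches it covered."""
    if rng is None:
        rng = random.Random(0)
    corpus = list(seeds)
    seen = set()      # coverage observed so far
    anomalies = []    # (input, error) pairs that made the target fail
    for _ in range(iterations):
        # (1) select an input and generate a candidate from it
        candidate = mutate(rng.choice(corpus), rng)
        # (2) execute the candidate, tracking coverage and anomalies
        try:
            coverage = target(candidate)
        except Exception as exc:
            anomalies.append((candidate, repr(exc)))
            continue
        # (3) keep the candidate only if it reached new coverage
        if coverage - seen:
            seen |= coverage
            corpus.append(candidate)
    return corpus, seen, anomalies


def toy_target(data):
    """Hypothetical target: nested prefix checks guarding a deliberate bug."""
    covered = {"entry"}
    if data.startswith("a"):
        covered.add("a")
        if data.startswith("ab"):
            covered.add("ab")
            if data.startswith("abc"):
                raise RuntimeError("bug reached")
    return covered


def toy_mutate(data, rng):
    """Replace one character or append one, drawn from a tiny alphabet."""
    alphabet = "abc"
    if data and rng.random() < 0.5:
        i = rng.randrange(len(data))
        return data[:i] + rng.choice(alphabet) + data[i + 1:]
    return data + rng.choice(alphabet)
```

Calling `fuzz(toy_target, [""], toy_mutate)` grows the corpus only with inputs that uncover a new branch, which is why the loop tends to progress toward the deeper prefix checks instead of wandering randomly.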
Because it is easy to adapt, fuzzing can continuously
test whole systems with little manual effort. Prior work has
successfully applied fuzzing to relational databases and discovered
many vulnerabilities. For example, SQLsmith [31] automatically
constructs inputs with an abstract syntax tree (AST) model and sends
them to target systems for execution. It has found more than 100
bugs in PostgreSQL, SQLite, and MonetDB since 2015 [32]. However,
because of the unique attributes of time-series databases, existing
fuzzing strategies are hard to adapt directly to these databases.
There are two major challenges, as follows.
The rst challenge is generating grammatically-correct time-series
queries. Time-series is the basic form to organize data for time-series
databases, but existing fuzzers struggle to generate grammatically
correct time-series queries to test them. Specifically, time-series
data [8, 9] represents a collection of data values observed from
sequential measurements over time. To store and fetch data
efficiently, time-series databases employ strategies different from
those of relational databases to fit time-series storage. However,
lacking the time-series specifications of these databases, existing
fuzzing strategies struggle to generate grammatically correct
time-series queries. Because of the vast difference between time
series in the IoT domain and relations in SQL, traditional relational
database fuzzers (e.g., SQLsmith [31]) can hardly reach time-series
logic. In addition, the queries accepted by time-series databases are
highly structured, and the strict grammar rejects most of the seeds
generated by random mutation in conventional mutation-based fuzzers
(e.g., AFL [15]). As a result, a time-series input generation
mechanism that generates grammatically correct time-series queries
is needed to explore the time-series logic.
The second challenge is capturing exceptions handled implicitly.
Fuzzing uses crashes as the indication of a failed test. However,
time-series databases use implicit exception handling to avoid
crashing the whole system, for the sake of usability and reliability.
In other words, when anomalies do not happen in critical locations of
the server, they are handled implicitly and no crash is triggered.
For example, time-series databases usually create a new thread
for each connecting client as the worker. When an exception is
thrown inside the thread, the implicit handling mechanism
automatically captures it and only informs the worker with a fault
message. Therefore, the server preserves a normal running
state. However, these exceptions may indicate serious bugs, and
existing fuzzing approaches ignore them. As a result, a detection
scheme for implicitly handled exceptions is required, one that
directly obtains exception messages and determines whether each is
an anomaly, so that all possible bugs are captured.
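The worker-thread example can be illustrated with a small sketch. Everything in it is hypothetical and chosen for illustration: `handle_query` is a toy handler with a planted bug, `worker` plays the role of the per-client thread, and `agent_report` stands in for a hook that observes the exception directly. It does not reproduce the mechanism of any particular database.

```python
import threading

captured = []  # exception names recorded by the hypothetical hook


def agent_report(exc):
    """Hypothetical hook: record the exception type for later analysis."""
    captured.append(type(exc).__name__)


def handle_query(query):
    """Toy query handler with a planted bug."""
    if query == "select 1/0":
        return 1 // 0  # raises ZeroDivisionError
    return "ok"


def worker(query, replies, agent=None):
    """Per-client worker: exceptions are swallowed and turned into a fault message."""
    try:
        replies.append(handle_query(query))
    except Exception as exc:
        if agent is not None:
            agent(exc)  # proactive path: the exception itself is observed
        replies.append("error: query failed")  # all the client ever sees


def run_client(query, agent=None):
    """Run one query in its own thread, as a server would for a new client."""
    replies = []
    thread = threading.Thread(target=worker, args=(query, replies, agent))
    thread.start()
    thread.join()
    return replies[0]
```

Without the hook, `run_client("select 1/0")` returns only the generic fault message and the process keeps running, so a crash-based oracle sees nothing. With `agent=agent_report`, the `ZeroDivisionError` is recorded and can be judged as a potential bug.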
In this paper, we propose Unicorn to overcome these challenges
through hybrid input synthesis and proactive exception detection.
To generate grammatically correct time-series queries,
hybrid input synthesis combines syntax-preserved mutation with
time-series guided mutation. Specifically, we design a hybrid input
specification, which combines the rules for generating conventional
SQL and time-series SQL in time-series databases. Based on the
specification, Unicorn first constructs the abstract syntax tree
(AST) of the original seeds and then generates new time-series
queries by changing the time-series nodes of the AST. To detect
exceptions handled implicitly, proactive exception detection captures
exception information directly from the runtime environment and
analyzes whether it indicates an anomaly. Specifically, instead of
passively receiving the program's state, Unicorn inserts an agent
into each process to proactively catch exceptions and send them to
the anomaly detector for analysis and reporting.
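The idea of changing only the time-series nodes of an AST can be sketched as follows. This is a simplified illustration under stated assumptions, not Unicorn's implementation: the `Node` class, the node kinds, and the `TS_RULES` table are hypothetical, and the alternative paths, data types, and encodings are illustrative values rather than a complete specification.

```python
import copy
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                      # e.g. "keyword" or a time-series kind
    text: str = ""                 # token text for leaf nodes
    children: list = field(default_factory=list)


# Hypothetical time-series rules: allowed replacements per node kind.
TS_RULES = {
    "ts_path": ["root.vehicle.d1.speed", "root.vehicle.d1.status"],
    "ts_datatype": ["BOOLEAN", "INT32", "FLOAT"],
    "ts_encoding": ["PLAIN", "RLE"],
}

SEED = Node("stmt", children=[
    Node("keyword", "create timeseries "),
    Node("ts_path", "root.vehicle.d1.speed"),
    Node("keyword", " with datatype="),
    Node("ts_datatype", "BOOLEAN"),
    Node("keyword", ",encoding="),
    Node("ts_encoding", "PLAIN"),
])


def leaves(node):
    """Yield leaf nodes in left-to-right order."""
    if not node.children:
        yield node
    for child in node.children:
        yield from leaves(child)


def render(node):
    """Turn the AST back into query text."""
    return "".join(leaf.text for leaf in leaves(node))


def mutate_time_series(ast, rng):
    """Copy the AST and replace the text of one time-series leaf."""
    mutant = copy.deepcopy(ast)
    targets = [n for n in leaves(mutant) if n.kind in TS_RULES]
    node = rng.choice(targets)
    node.text = rng.choice([t for t in TS_RULES[node.kind] if t != node.text])
    return mutant
```

Because keyword nodes are never touched, every mutant keeps the statement skeleton of the seed, while the path, data type, or encoding varies within the values the rules allow.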
For evaluation, we used Unicorn to perform fuzzing on IoTDB,
KairosDB, QuestDB, TimescaleDB, TDEngine, and GridDB. We also
adapted the industrial fuzzers SQLancer [25] and SQLsmith [31]
for comparison. Unicorn covered 115.75% more basic blocks on
average than the best results of other fuzzers. In addition, Unicorn
detected 42 previously unknown bugs.
In conclusion, our paper makes the following contributions:
• We observe that current fuzzing approaches struggle to test
time-series databases effectively. The two main challenges
are generating grammatically correct queries and capturing
exceptions handled implicitly.
• We propose hybrid input synthesis and proactive exception
detection to address the aforementioned challenges. We also
implement these approaches in Unicorn.
• We evaluate Unicorn on 6 popular time-series databases
against state-of-the-art fuzzers SQLsmith and SQLancer.
The results show that Unicorn outperforms the others and detects
42 previously unknown bugs.
2 TIME-SERIES DATABASES
As an infrastructure for IoT data storage and analysis, time-series
databases play an important role in promoting the development of
the Internet of Things. Generally, a time-series database is a kind
of large-scale software that manipulates and manages IoT data. It
handles operation requests from various clients (including IoT
devices, PCs, etc.) and carries out unified management and control to
ensure the security and integrity of IoT data [18]. In embedded
application scenarios, time-series databases usually have the
following two characteristics: 1) they employ time-series data to
meet scenarios in the IoT domain, and 2) they utilize implicit
exception handling to guarantee usability; namely, they limit the
impact of anomalies by handling them internally to ensure the server
always runs normally.
[Figure 1 diagram: a tree-based schema with the layers Root, Storage
Group, Device, and Sensor. Node labels: vehicle, robot, mobile; d1, d2,
r1, m1; speed, status, temperature, height. Example queries shown with
the tree:
> set storage group to root.vehicle
> create timeseries root.vehicle.d1.speed with datatype=BOOLEAN,encoding=PLAIN]
Figure 1: The time-series query along with the corresponding
storage model of Apache IoTDB. The query imports new
keywords related to time-series. In addition, the object has
hierarchical structures because of the tree-based schema
in IoTDB. IoT data is stored in a tree-based schema,
and the attribute hierarchy structure has three layers. A
grammatically correct object name should construct a path
from the root node to a leaf node.
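The path rule stated in the Figure 1 caption can be checked with a short sketch. The `SCHEMA` dictionary and `is_valid_path` are hypothetical: only `root.vehicle.d1.speed` is confirmed by the example query, and the placement of the remaining devices and sensors in the tree is assumed for illustration.

```python
# Hypothetical schema: nested dicts for inner nodes, None for sensor leaves.
# Only root.vehicle.d1.speed is taken from the example query; the rest of
# the layout is an assumption made for this sketch.
SCHEMA = {
    "root": {
        "vehicle": {"d1": {"speed": None, "status": None, "temperature": None}},
        "robot": {"r1": {"status": None}},
        "mobile": {"m1": {"height": None}},
    }
}


def is_valid_path(path, schema=SCHEMA):
    """Return True if the dotted name walks from the root to a sensor leaf."""
    node = schema
    for part in path.split("."):
        # Stop if we already passed a leaf or the next label is unknown.
        if not isinstance(node, dict) or part not in node:
            return False
        node = node[part]
    return node is None  # the walk must end exactly on a leaf
```

Under this schema, `root.vehicle.d1.speed` is accepted, while `root.vehicle.d1` (stops at a device) and `root.vehicle.speed` (skips the device layer) are rejected. A generator that samples names only from such root-to-leaf walks avoids producing object names that the parser would reject.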
2.1 Employing Time-Series Data
Compared to other applications, the major characteristic of IoT
applications is the use of time-series data. Time series data [8, 9]