
Unicorn: Detect Runtime Errors in Time-Series Databases with
Hybrid Input Synthesis
Zhiyong Wu
KLISS, BNRist, School of Software,
Tsinghua University, China
wuzy21@mails.tsinghua.edu.cn
Jie Liang∗
KLISS, BNRist, School of Software,
Tsinghua University, China
liangjie.mailbox.cn@gmail.com
Mingzhe Wang
KLISS, BNRist, School of Software,
Tsinghua University, China
wmzhere@gmail.com
Chijin Zhou
KLISS, BNRist, School of Software,
Tsinghua University, China
ShuimuYulin Co., Ltd, China
tlock.chijin@gmail.com
Yu Jiang∗
KLISS, BNRist, School of Software,
Tsinghua University, China
jiangyu198964@126.com
ABSTRACT
The ubiquitous use of time-series databases in the safety-critical
Internet of Things domain demands strict security and correctness.
One successful approach in database bug detection is fuzzing, where
hundreds of bugs have been detected automatically in relational
databases. However, it cannot be easily applied to time-series
databases: the bulk of time-series logic is unreachable because of
mismatched query specifications, and serious bugs are undetectable
because of implicitly handled exceptions.
In this paper, we propose Unicorn to secure time-series databases
with automated fuzzing. First, we design hybrid input synthesis
to generate high-quality queries which not only cover time-series
features but also ensure grammar correctness. Then, Unicorn uses
proactive exception detection to discover minuscule-symptom bugs
which hide behind implicit exception handling. With the specialized
design oriented to time-series databases, Unicorn outperforms the
state-of-the-art database fuzzers in terms of coverage and bugs.
Specifically, Unicorn outperforms SQLsmith and SQLancer on
widely used time-series databases IoTDB, KairosDB, TimescaleDB,
TDEngine, QuestDB, and GridDB in the number of basic blocks by
21%-199% and 34%-693%, respectively. More importantly, Unicorn
has discovered 42 previously unknown bugs.
CCS CONCEPTS
• Software and its engineering → Software maintenance tools;
• Security and privacy → Database and storage security.
KEYWORDS
Time-series Databases, Runtime Error, Hybrid Input Synthesis
∗Jie Liang and Yu Jiang are the corresponding authors.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from permissions@acm.org.
ISSTA ’22, July 18–22, 2022, Virtual, South Korea
© 2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9379-9/22/07...$15.00
https://doi.org/10.1145/3533767.3534364
ACM Reference Format:
Zhiyong Wu, Jie Liang, Mingzhe Wang, Chijin Zhou, and Yu Jiang. 2022.
Unicorn: Detect Runtime Errors in Time-Series Databases with Hybrid
Input Synthesis. In Proceedings of the 31st ACM SIGSOFT International
Symposium on Software Testing and Analysis (ISSTA ’22), July 18–22, 2022,
Virtual, South Korea. ACM, New York, NY, USA, 12 pages.
https://doi.org/10.1145/3533767.3534364
1 INTRODUCTION
Along with the rapid growth in Internet of Things (IoT) deployment,
time-series databases are ubiquitously used in all kinds of IoT
devices. Compared to traditional relational databases, time-series
databases employ complex logic to handle their low-latency and
time-series nature. This complexity, in turn, challenges their
security, reliability, and correctness. To prevent vulnerabilities,
a common approach is to write unit tests for the target database
manually. However, unit testing is labor-intensive and cannot
detect bugs at the system level.
One promising approach is fuzzing, an automated software
testing technique, which generates random data as program inputs.
It was first developed by Miller et al. [21] in the 1990s and has since
been widely adopted in practice for finding bugs in many critical
areas, including operating systems [14, 34, 37], networking
protocols [7, 19, 20, 40], and third-party libraries [1, 15–17, 39]. A fuzzer
exercises the target program in a loop: (1) select an input and
generate candidate inputs based on it, (2) execute the candidate inputs
to track coverage and monitor anomalies, (3) save interesting
candidate inputs that achieve new coverage, then go to (1). Following
the fuzzing loop, fuzzers could continuously explore more and more
state space of the target program.
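
The three-step loop described above can be illustrated with a minimal sketch in Python. This is a generic mutation-based, coverage-guided loop, not Unicorn's implementation; the names toy_target, mutate, and fuzz are hypothetical, and the toy target simply reports which of its own branches an input reached instead of using real instrumentation.

```python
import random


def toy_target(data: bytes) -> set:
    """Hypothetical stand-in for a program under test.

    Returns the set of branch IDs the input reached and raises
    RuntimeError to simulate a crash on inputs starting with b"SQL".
    """
    covered = {0}
    if len(data) > 0 and data[0] == ord("S"):
        covered.add(1)
        if len(data) > 1 and data[1] == ord("Q"):
            covered.add(2)
            if len(data) > 2 and data[2] == ord("L"):
                raise RuntimeError("simulated crash")
    return covered


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Derive a candidate by overwriting one random byte or appending one."""
    buf = bytearray(data)
    if buf and rng.random() < 0.7:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    else:
        buf.append(rng.randrange(256))
    return bytes(buf)


def fuzz(target, seeds, iterations, rng):
    """Run the select / execute / save loop for a fixed number of iterations."""
    corpus = list(seeds)   # inputs worth mutating further
    seen = set()           # all branch IDs observed so far
    crashes = []           # (input, exception) pairs
    for _ in range(iterations):
        parent = rng.choice(corpus)        # (1) select an input
        candidate = mutate(parent, rng)    # (1) generate a candidate from it
        try:
            coverage = target(candidate)   # (2) execute and track coverage
        except Exception as exc:           # (2) monitor anomalies
            crashes.append((candidate, exc))
            continue
        if not coverage <= seen:           # (3) save inputs with new coverage
            seen |= coverage
            corpus.append(candidate)
    return corpus, seen, crashes


if __name__ == "__main__":
    corpus, seen, crashes = fuzz(toy_target, [b"A"], 20000, random.Random(0))
    print(len(corpus), sorted(seen), len(crashes))
```

Only inputs that reach a branch not seen before are kept, so the corpus gradually drifts toward deeper parts of the target; this feedback is what lets the loop explore more state space over time.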
Because it is easy to adapt, fuzzing can continuously
test whole systems with little manual effort. Prior works have
successfully applied fuzzing to relational databases and discovered
many vulnerabilities. For example, SQLsmith [31] constructs inputs
automatically with an abstract syntax tree (AST) model and sends
them to target systems for execution. It has found more than 100
bugs in PostgreSQL, SQLite, and MonetDB since 2015 [32]. However,
because of the unique attributes of time-series databases, existing
fuzzing strategies are hard to directly adapt to these databases.
There are two major challenges as follows.
The first challenge is generating grammatically correct time-series
queries. Time-series is the basic form to organize data for time-series