
In-Database Time Series Clustering
YUNXIANG SU, Tsinghua University, China
KENNY YE LIANG, Tsinghua University, China
SHAOXU SONG∗, BNRist, Tsinghua University, China
Time series data are often clustered repeatedly across various time ranges to mine frequent subsequence
patterns from different periods, which can further support downstream applications. Existing state-of-the-art
(SOTA) time series clustering methods, such as K-Shape, can proficiently cluster time series data according
to their shapes. However, the in-database time series clustering problem has been neglected, especially in IoT
scenarios with large-volume data and high efficiency demands. Most time series databases employ LSM-Tree
based storage to support intensive writes, which leaves the underlying data points out of order in timestamps.
Therefore, to apply existing out-of-database methods, all data points must be fully loaded into memory and
chronologically sorted. Additionally, out-of-database methods must cluster from scratch each time, making
them inefficient when handling queries across different time ranges. In this work, we propose an in-database
adaptation of the SOTA time series clustering method K-Shape. Moreover, to solve the problem that K-Shape
cannot efficiently handle long time series, we propose Medoid-Shape, as well as its in-database adaptation
for further acceleration. Extensive experiments are conducted to demonstrate the higher efficiency of our
proposals, with comparable effectiveness. Remarkably, all proposals have already been implemented in an
open-source commodity time series database, Apache IoTDB.
CCS Concepts: • Information systems → Database query processing; • Computing methodologies → Machine learning.
Additional Key Words and Phrases: time series clustering, database query processing
ACM Reference Format:
Yunxiang Su, Kenny Ye Liang, and Shaoxu Song. 2025. In-Database Time Series Clustering. Proc. ACM Manag.
Data 3, 1 (SIGMOD), Article 46 (February 2025), 26 pages. https://doi.org/10.1145/3709696
1 Introduction
Time series clustering is of great importance for analysis. For example, time series clustering
can assist pattern mining of daily stock prices in finance [33], serve anomalous subsequence
detection for yearly climate analysis in meteorology [21], facilitate the analysis of the characteristics
associated with sleep apnea [25], and so on. The state-of-the-art (SOTA) time series clustering
method K-Shape [31, 32] can proficiently cluster time series by shapes and achieve significantly
better accuracy than other existing time series clustering methods.
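To make "clustering by shapes" concrete, the following is a minimal sketch of the shape-based distance (SBD) that K-Shape [31, 32] is built on: one minus the maximum coefficient-normalized cross-correlation over all shifts. The function names znorm and sbd and the toy series a and b are illustrative assumptions, not taken from this paper or from Apache IoTDB. The sketch uses np.correlate for readability, whereas K-Shape computes the cross-correlation with FFT for speed.

```python
import numpy as np


def znorm(x):
    """Z-normalize a series (K-Shape assumes z-normalized inputs)."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    # A constant series has no shape; map it to all zeros.
    return (x - x.mean()) / std if std > 0 else np.zeros_like(x)


def sbd(x, y):
    """Shape-based distance: 1 - max normalized cross-correlation.

    Returns a value in [0, 2]; 0 means identical shape up to a shift.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0.0:
        return 1.0  # convention chosen here for all-zero inputs
    # Cross-correlation of x and y at every possible shift.
    cc = np.correlate(x, y, mode="full")
    return 1.0 - cc.max() / denom


# Illustrative data: the same bump at two different offsets.
a = np.array([0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0])
b = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0])

print("SBD:", sbd(a, b))                     # close to 0
print("Euclidean:", np.linalg.norm(a - b))   # clearly nonzero
```

The two toy series have an SBD close to 0 although their Euclidean distance is clearly nonzero, which is the shift invariance that shape-based clustering relies on.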
However, K-Shape struggles in IoT scenarios, where the extensive time series data stored in
databases pose serious challenges for time series clustering. On the one
hand, the arrival of IoT data is often out of order, due to transmission issues or sensor failures [15].
Most commodity time series databases employ the Log-Structured Merge-Tree (LSM-Tree) [30] to
∗Shaoxu Song (https://sxsong.github.io/) is the corresponding author.
Authors’ Contact Information: Yunxiang Su, Tsinghua University, China, suyx21@mails.tsinghua.edu.cn; Kenny Ye Liang,
Tsinghua University, China, liangy24@mails.tsinghua.edu.cn; Shaoxu Song, BNRist, Tsinghua University, China,
sxsong@tsinghua.edu.cn.
This work is licensed under a Creative Commons Attribution International 4.0 License.
© 2025 Copyright held by the owner/author(s).
ACM 2836-6573/2025/2-ART46
https://doi.org/10.1145/3709696
Proc. ACM Manag. Data, Vol. 3, No. 1 (SIGMOD), Article 46. Publication date: February 2025.