detection. Some methods rely on metrics to detect anomalies via probability density estimation, outlier detection, or autoregressive bias (more details in subsection II-A). Other methods rely on logs to detect anomalies through keyword matching or learning-based methods (more details in subsection II-B). However, in the context of microservice systems, relying on a single modality of data is insufficient: it cannot depict the status of the system precisely enough to judge whether anomalies occur [1], [2]. Existing unsupervised multi-modal anomaly detection methods mainly rely on reconstruction-based approaches [3], assuming that normal data is easier to reconstruct than anomalous data. These methods learn the distribution of normal data by fitting unlabeled data and identify anomalies based on reconstruction errors.
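To make the paradigm concrete, the sketch below (our illustration, not the implementation of [3]; the model, threshold, and names are hypothetical) shows how a reconstruction model scores windows, where a larger reconstruction error indicates a more anomalous window:

```python
import torch
import torch.nn as nn

class WindowAutoencoder(nn.Module):
    """Toy autoencoder over fixed-size feature windows (illustrative only)."""
    def __init__(self, dim: int, hidden: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

def anomaly_scores(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Per-window reconstruction error; larger means more anomalous."""
    with torch.no_grad():
        recon = model(x)
    return ((recon - x) ** 2).mean(dim=-1)

model = WindowAutoencoder(dim=8)
windows = torch.randn(4, 8)          # four feature windows
# A threshold fitted on normal data (0.5 here is a hypothetical choice)
# turns the continuous scores into anomaly flags.
flags = anomaly_scores(model, windows) > 0.5
```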
However, these existing reconstruction-based multi-modal anomaly detection methods struggle to distinguish between hard and anomalous samples in the decision space, leading to limited anomaly detection performance. In our work, hard samples refer to normal samples that are difficult for the model to classify correctly and are thus easily misclassified as abnormal [4], [5]. The underlying reasons for this limitation are twofold:
(1) Complex Combination. In microservice systems, the combinations of normal samples are complex: one log pattern can be combined with multiple metric patterns to represent normal system states, and one metric pattern can likewise be combined with multiple log patterns. Since hard samples are part of normal samples, their combinations are equally complex. Existing reconstruction-based methods fail to capture these diverse and complex combinations and may therefore misidentify hard samples as anomalies. (2) Inconsistent Convergence Speed. As shown in Fig. 1, the convergence speeds of simple and hard samples in multi-modal data are inconsistent, so models overfit simple samples and underfit hard ones, which limits their ability to model normal samples and consequently restricts the effectiveness of anomaly detection. For reconstruction-based anomaly detection methods, multi-modal hard samples may exhibit large reconstruction errors in both modalities or in only one of them. In the latter case, fine-grained (modality-level rather than sample-level) training adjustments are required to balance the fitting degrees of the two modalities, achieving unified modeling of normal samples and better anomaly detection performance.
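As a minimal illustration of this distinction (our sketch, not the paper's exact formulation), computing reconstruction error per modality rather than per sample preserves the information needed for modality-level adjustment:

```python
import torch

def reconstruction_errors(x_log, recon_log, x_metric, recon_metric):
    """Return modality-level errors alongside the sample-level sum.
    A sample-level score hides which modality was reconstructed poorly;
    the per-modality errors let training re-weight each modality alone."""
    err_log = ((recon_log - x_log) ** 2).mean(dim=-1)           # (batch,)
    err_metric = ((recon_metric - x_metric) ** 2).mean(dim=-1)  # (batch,)
    return err_log, err_metric, err_log + err_metric
```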
To address the two challenges of hard samples mentioned
above, we propose a novel unsupervised adversarial contrastive
learning-based method for general log-metric multi-modal data
anomaly detection (UAC-AD). Specifically, we propose an
adversarial learning framework that incorporates contrastive learning. We utilize contrastive learning to capture the complex combinations of normal samples by enlarging the distance between normal and anomalous samples, so that the model can better distinguish hard samples from abnormal ones, reduce false positives, and improve anomaly detection performance.
Some studies [6] have confirmed that time-aligned multi-
modal data reflects the operational state of the system, while
unaligned data reflects inconsistent system states. Therefore,
we consider time-aligned combinations of logs and metrics as positive (i.e., normal) samples and treat unaligned combinations as negative (i.e., abnormal) samples. We then enlarge the distance between positive and negative samples, alleviating the first challenge posed by hard samples.
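A common way to realize this idea is an InfoNCE-style objective over a batch of time-aligned windows. The sketch below is a hedged approximation (the paper's actual loss may differ; all names are ours): the diagonal pairs are the aligned positives, and all misaligned pairings serve as negatives.

```python
import torch
import torch.nn.functional as F

def aligned_contrastive_loss(z_log: torch.Tensor,
                             z_metric: torch.Tensor,
                             temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss: (log_i, metric_i) is the time-aligned positive;
    pairing log_i with metric_j (j != i) yields the unaligned negatives."""
    z_log = F.normalize(z_log, dim=-1)        # (batch, d) log embeddings
    z_metric = F.normalize(z_metric, dim=-1)  # (batch, d) metric embeddings
    logits = z_log @ z_metric.t() / temperature   # all pairwise similarities
    targets = torch.arange(z_log.size(0), device=z_log.device)
    return F.cross_entropy(logits, targets)       # pull diagonal, push rest

loss = aligned_contrastive_loss(torch.randn(8, 32), torch.randn(8, 32))
```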
Moreover, in our adversarial training phase, the discriminator judges the reconstruction quality of the generator and increases the weight of reconstruction learning for the parts that are reconstructed poorly (i.e., the hard samples for reconstruction-based anomaly detection methods). We further design a separate discriminator for each modality of the multi-modal data, achieving modality-level training optimization for hard samples.
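A minimal sketch of this mechanism, under our own simplifying assumptions (one discriminator per modality, whose confidence that a reconstruction is fake re-weights that modality's reconstruction loss; this is not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class ModalityDiscriminator(nn.Module):
    """Judges whether a window of one modality is real or a reconstruction."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(),
                                 nn.Linear(32, 1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)  # probability the window is real

def weighted_recon_loss(x, recon, disc):
    """Up-weight windows the discriminator confidently flags as fake,
    i.e. the poorly reconstructed (hard) parts of this modality."""
    per_window = ((recon - x) ** 2).mean(dim=-1)   # (batch,)
    with torch.no_grad():
        p_fake = 1.0 - disc(recon)                 # high => poor reconstruction
    return (p_fake * per_window).mean()

# One discriminator per modality yields modality-level weights.
disc_log, disc_metric = ModalityDiscriminator(16), ModalityDiscriminator(8)
```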
Our contributions are as follows:
• We clarify the problem of inconsistent convergence speed
of hard samples for multi-modal anomaly detection in
microservice systems, and we design a fine-grained ad-
versarial learning framework to adjust the convergence
speed of each modality part of multi-modal hard samples.
Extensive ablation studies verify the effectiveness of the
proposed adversarial learning strategy.
• Within the adversarial learning framework, we introduce
contrastive learning to enhance the model’s understanding
of complex combinations of the multi-modal hard sam-
ples. This widens the gap between hard and anomalous
samples in the decision space, thereby achieving better
anomaly detection performance. Moreover, we release our code and datasets to facilitate replication and future research [7].
II. RELATED WORK
Recently, tremendous efforts have been devoted to anomaly
detection to ensure the reliability of large-scale systems.
Anomaly detection methods are usually based on logs, metrics, or both. In this section, we first review anomaly detection works closely related to ours, including metric-based, log-based, and multi-modal methods. We then introduce the key techniques used in this
paper, including adversarial learning and contrastive learning.
A. Metric-based Methods
Metric data are typical time-series data collected by monitors to track the running state of instances at the application or system level. According to the criterion used for anomaly determination, metric-based anomaly detection paradigms can be categorized into three types. Density estimation-based methods [8]–[11] assumed that normal data conform to a specific probability distribution and identified anomalies according to the probability density or appearance likelihood of the data points. Zong et al. [10] and Yairi et al. [11] introduced Gaussian mixture models into their frameworks, facilitating the estimation of representation densities.
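In the spirit of these works (a hedged sketch, not the cited models themselves), a Gaussian mixture fitted on normal metric windows flags points whose log-likelihood falls below a low quantile of the training scores:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(500, 4))   # stand-in for normal metrics

# Fit the density of normal data with a mixture of Gaussians.
gmm = GaussianMixture(n_components=3, random_state=0).fit(normal)

# Flag points whose log-likelihood is below the 1st percentile of the
# training scores; the 1% cut-off is a hypothetical choice.
threshold = np.quantile(gmm.score_samples(normal), 0.01)
queries = rng.normal(5.0, 1.0, size=(5, 4))    # far from the normal cluster
is_anomaly = gmm.score_samples(queries) < threshold
```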
Clustering-based methods [12]–[14] assumed that outliers are anomalies and identified them according to the distance from a data point to its cluster center. Tax et al. [12] and Ruff et al. [13] constructed