its communities are the nearby, densely connected nodes that
share similar attributes with the query node. Such a prior is
shared by different CS tasks for different query nodes in
real-world graphs. Moreover, the prior knowledge can synthesize
similar or complementary inductive biases across different CS
tasks to compensate for the insufficient knowledge in small
training data, so that a model can be swiftly adapted to a new,
unseen task. In this paper, we focus on learning
a meta model that captures this prior via meta-learning.
There are existing meta-learning algorithms, e.g., simple
feature transfer and model-agnostic meta-learning. However,
trivially adapting them to CS tasks fails to achieve high performance,
since they do not exploit the intrinsic characteristics of
CS tasks. For CS, what a model needs to decide for each
node in a graph is whether or not the node shares community
membership with a given query node. To facilitate such
binary decisions, we propose a novel model, Conditional
Graph Neural Process (CGNP), which generates node embeddings
conditioned on the small training data, where the distance
between a node's embedding and that of the query node explicitly
indicates their community membership. Furthermore, as a
graph instantiation of the Conditional Neural Process (CNP) [14],
CGNP inherits the main idea of CNP, which implicitly learns
a kernel function between a training query node and a query
node to be predicted. In a nutshell, the learned CGNP is
not only a common embedding function but also a common
kernel function, shared across different graphs. The embedding
function transforms the nodes of each graph into a distance-
aware hidden space, while the kernel function memorizes the
small training data of each task as a hidden representation.
Compared with optimization-based meta-learning approaches,
whose parameters are prone to overfitting, these metric-learning and
memorization mechanisms are better suited to classification
tasks with small training data, especially under imbalanced labels.
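To make this idea concrete, the following is a minimal, hypothetical PyTorch sketch of the CNP-style mechanism behind CGNP, not the actual architecture proposed later in Section VI: a shared GNN-style encoder embeds all nodes, the labeled support set of a task is aggregated into a single context representation, every embedding is conditioned on that context, and the membership score of a node is the negative distance between its embedding and the query node's embedding. All class and variable names (SimpleCGNP, hidden_dim, etc.) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCGNP(nn.Module):
    """Toy CNP-style scorer: embed nodes, aggregate the labeled support set of a
    task into a context vector, condition on it, and score membership by the
    distance to the query node's embedding."""
    def __init__(self, feat_dim, hidden_dim=64):
        super().__init__()
        self.lin1 = nn.Linear(feat_dim, hidden_dim)
        self.lin2 = nn.Linear(hidden_dim, hidden_dim)
        self.cond = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, adj_norm, x, query_idx, support_idx, support_labels):
        # shared GNN-style propagation with a row-normalized adjacency (n x n)
        h = F.relu(self.lin1(adj_norm @ x))
        h = self.lin2(adj_norm @ h)
        # commutative aggregation of the labeled support nodes into one task context
        signs = 2.0 * support_labels.float().unsqueeze(-1) - 1.0  # members +1, non-members -1
        context = (h[support_idx] * signs).mean(dim=0, keepdim=True)
        # condition every node embedding on the task context
        z = self.cond(torch.cat([h, context.expand(h.size(0), -1)], dim=-1))
        # membership score = negative distance to the query node's embedding
        return -torch.norm(z - z[query_idx], dim=-1)
```

Because the support set is folded into a single order-invariant context vector, adapting to a new task only requires one forward pass over its few labeled nodes, with no gradient-based fine-tuning.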
The contributions of this paper are summarized as follows:
① We formulate the problem of learning a meta model to
answer CS queries, where the meta model absorbs prior
knowledge from multiple CS tasks and adapts to a specific
task with only a few training samples. We generalize three CS task
scenarios that represent comprehensive query cases. To the
best of our knowledge, our study is the first attempt at a meta
model/algorithm for CS. ② We explore three Graph Neural
Network based solutions, i.e., feature transfer, model-agnostic
meta-learning and Graph Prototypical Network, which are trivial
adaptations of existing transfer/meta-learning algorithms to
CS. We identify their individual limitations regarding prediction
effectiveness, generalization capability and efficiency. ③
We propose a novel framework, Conditional Graph Neural
Process (CGNP), on the basis of the conceptual CNP, and learn the
meta model from an efficient, metric-based learning perspective.
We design and explore model variants with different model
complexities and different options for the core components. To
the best of our knowledge, we make the first effort to explore
how to solve the CS problem by meta-learning. ④ We conduct
extensive experiments on 6 real-world datasets with ground-
truth communities for performance evaluation. The comparison with
3 CS algorithms, 4 naive approaches, and 3 supervised learning
baselines validates that our CGNP outperforms the others with small
training and prediction cost.
Roadmap: The rest of the paper is organized as follows.
Section II reviews the related work. In Section III, we give
the problem statement, followed by three naive solutions
introduced in Section IV. We introduce the core idea of our
approach, CGNP, in Section V. We elaborate on its architecture
design and present the learning algorithms of CGNP in
Section VI. We present our comprehensive experimental studies
in Section VII and conclude the paper in Section VIII.
II. RELATED WORK
Community Search. A comprehensive survey of CS problems
and approaches can be found in [1], [2]. In a nutshell,
the CS problem can be divided into two categories:
non-attributed community search, which only concerns
structural cohesiveness over simple graphs, and
attributed community search (ACS), which concerns both
structural cohesiveness and content overlap or similarity
over attributed graphs. To capture structural
cohesiveness, various community metrics have been proposed,
including k-core [3]–[5], k-truss [6], [7], k-clique [8], [9]
and k-edge connected component [10], [11]. However, these metrics
are too inflexible to adapt to complex real-world graphs and
applications.
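As a small illustration (not part of any cited algorithm), these structural metrics have simple combinatorial definitions; for instance, a k-core is the maximal subgraph in which every node has degree at least k, computable by iterative peeling, and NetworkX exposes it directly:

```python
# Minimal example of one structural cohesiveness metric, the k-core.
import networkx as nx

G = nx.karate_club_graph()       # toy graph with known community structure
core3 = nx.k_core(G, k=3)        # maximal subgraph where every node has degree >= 3
print(sorted(core3.nodes()))     # densely connected candidate region
```

A query-centric CS algorithm would then typically return the connected component of such a subgraph that contains the query node.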
Beyond exploiting only the structural information,
ACS leverages both structural constraints and attributes such
as keywords [15], [16], locations [17], and temporal information [18]. As
two representative approaches for ACS, ATC [16] finds a k-truss
community with the maximum pre-defined attribute score, and
ACQ [15] finds k-core communities whose nodes share the
maximum number of attributes with the query attributes. Both ATC and
ACQ adopt a two-stage process: they first find candidate
communities based on the structural constraints, and then
verify the candidates based on the computed attribute score
or the presence of the attribute set. However, the quality of the
communities found by these two approaches is unpromising,
since the two independent stages fail to capture the correlations
between structures and attributes in a joint fashion.
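The shared generate-then-verify pattern can be sketched as follows; the helpers structural_candidates and attribute_score are hypothetical simplifications rather than the actual ATC/ACQ algorithms, but they expose why the two stages are decoupled:

```python
# Schematic sketch of the two-stage ACS pattern (illustrative only).
import networkx as nx

def structural_candidates(G, q, k):
    """Stage 1: candidate communities satisfying the structural constraint,
    approximated here by connected k-core components containing q."""
    core = nx.k_core(G, k=k)
    if q not in core:
        return []
    return [core.subgraph(c) for c in nx.connected_components(core) if q in c]

def attribute_score(H, query_attrs):
    """Stage 2: a simple attribute score, e.g., average overlap with the query attributes."""
    overlaps = [len(set(H.nodes[v].get("attrs", [])) & set(query_attrs)) for v in H]
    return sum(overlaps) / max(len(overlaps), 1)

def two_stage_acs(G, q, query_attrs, k=3):
    cands = structural_candidates(G, q, k)
    return max(cands, key=lambda H: attribute_score(H, query_attrs), default=None)
```

Because attributes are only evaluated on candidates that already satisfy the structural constraint, structure and attributes never jointly shape which candidates are generated, which is exactly the limitation noted above.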
With the development of ML/DL, GNNs have recently been
adopted for CS [12]. By recasting community membership
determination as a classification task, a model can learn from
its prediction error feedback on the training samples and
can adapt to a specific graph in an end-to-end way.
Gao et al. proposed ICS-GNN [12] for interactive CS, which
allows users to provide ground-truth labels for online incremental
learning. The model is query-specific and fails to
generalize to new query nodes. [13] proposes a graph neural
network based model that is trained on a collection of query
nodes with their ground-truth and makes predictions for
unseen query nodes in a single graph.
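For illustration, this classification view amounts to a standard supervised training step; the code below is a generic, hypothetical example (neither ICS-GNN nor the model of [13]), where model is assumed to be any GNN that scores all nodes for a given query node:

```python
# Hedged sketch of community membership recast as binary node classification.
import torch.nn.functional as F

def train_step(model, adj_norm, x, query_idx, labeled_idx, labels, optimizer):
    """One gradient step on the labeled nodes of a single CS query (hypothetical API)."""
    optimizer.zero_grad()
    logits = model(adj_norm, x, query_idx)   # (n,) membership logits for all nodes
    loss = F.binary_cross_entropy_with_logits(logits[labeled_idx], labels.float())
    loss.backward()
    optimizer.step()
    return loss.item()
```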
ML/DL for Graph Analytics. Apart from CS, ML/DL
techniques are widely used in various graph analytical tasks,
including classical combinatorial optimization problems [19],