SIGMOD2023_Hereditary Cohesive Subgraphs Enumeration on Bipartite Graphs The Power of Pivot-based Approaches_腾讯云.pdf

迹部景吾

26页

0次

2025-04-21

免费下载

138

Hereditary Cohesive Subgraphs Enumeration on Bipartite

Graphs: The Power of Pivot-based Approaches

QIANGQIANG DAI, RONG-HUA LI, XIAOWEI YE, and MEIHAO LIAO, Beijing Institute of

Technology, China

WEIPENG ZHANG, Tencent Technology (Shenzhen) Company Limited, China

GUOREN WANG, Beijing Institute of Technology, China

Finding cohesive subgraphs from a bipartite graph is a fundamental operator in bipartite graph analysis.

In this paper, we focus on the problem of mining cohesive subgraphs from a bipartite graph that satisfy a

hereditary property. Here a cohesive subgraph meets the hereditary property if all of its subgraphs satisfy the

same property as itself. We show that several important cohesive subgraph models, such as maximal biclique

and maximal

𝑘

-biplex, satisfy the hereditary property. The problem of enumerating all maximal hereditary

subgraphs was known to be NP-hard. To solve this problem, we rst propose a novel and general pivot-based

enumeration framework to eciently enumerate all maximal hereditary subgraphs in a bipartite graph. Then,

based on our general framework, we develop a new pivot-based algorithm with several pruning techniques to

enumerate all maximal bicliques. We prove that the worst-case time complexity of our pivot-based maximal

biclique enumeration algorithm is

𝑂 (𝑚 ×

𝑛/2

)

(or

𝑂 (𝑚 ×

414

𝑛

)

) which is near optimal since there exist

up to

𝑂 (

𝑛/2

)

maximal bicliques in a bipartite graph with

𝑛

vertices and

𝑚

edges. Moreover, we also show

that our algorithm can achieve polynomial-delay time complexity with a slight modication. Third, on the

basis of our general framework, we also devise a novel pivot-based algorithm with several non-trivial pruning

techniques to enumerate maximal

𝑘

-biplexes in a bipartite graph. Finally, we conduct extensive experiments

using 11 real-world bipartite graphs to evaluate the proposed algorithms. The results show that our pivot-based

solutions can achieve one order of magnitude (three orders of magnitude) faster than the state-of-the-art

maximal biclique enumeration algorithms (maximal 𝑘-biplex enumeration algorithms).

CCS Concepts: • Theory of computation → Backtracking.

Additional Key Words and Phrases: hereditary cohesive subgraph, biclique,

𝑘

-biplex, enumeration framework

ACM Reference Format:

Qiangqiang Dai, Rong-Hua Li, Xiaowei Ye, Meihao Liao, Weipeng Zhang, and Guoren Wang. 2023. Hereditary

Cohesive Subgraphs Enumeration on Bipartite Graphs: The Power of Pivot-based Approaches. Proc. ACM

Manag. Data 1, 2, Article 138 (June 2023), 26 pages. https://doi.org/10.1145/3589283

1 INTRODUCTION

Bipartite graphs are ubiquitous in real-world applications. In a bipartite graph, the vertices can be

divided into two disjoint sets and each edge connects a vertex in one set to a vertex in the other set.

Some representative examples of real-world bipartite graphs include user-item networks [

author-publication networks [

], and biological networks [

]. Real-world bipartite graphs often

Authors’ addresses: Qiangqiang Dai, qiangd66@gmail.com; Rong-Hua Li, lironghuabit@126.com; Xiaowei Ye, yexiaowei@

bit.edu.cn; Meihao Liao, mhliao@bit.edu.cn, Beijing Institute of Technology, Beijing, China; Weipeng Zhang, Tencent

Technology (Shenzhen) Company Limited, Shenzhen, China, jackwpzhang@tencent.com; Guoren Wang, Beijing Institute of

Technology, Beijing, China, wanggrbit@126.com.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee

provided that copies are not made or distributed for prot or commercial advantage and that copies bear this notice and the

full citation on the rst page. Copyrights for components of this work owned by others than the author(s) must be honored.

Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires

prior specic permission and/or a fee. Request permissions from permissions@acm.org.

2836-6573/2023/6-ART138 $15.00

https://doi.org/10.1145/3589283

Proc. ACM Manag. Data, Vol. 1, No. 2, Article 138. Publication date: June 2023.

138:2 Qiangqiang Dai et al.

contain cohesive subgraph structures which correspond to communities or densely-connected

groups. Mining cohesive subgraphs from a bipartite graph is a fundamental operator in bipartite

graph analysis which has been widely used in many applications, such as community detection

[20, 22], online recommendation [17, 36], and biological network analysis [6, 42].

There exist many cohesive subgraph models in bipartite graphs. Notable examples include

maximal biclique [

], maximal

𝑘

-biplex [

(𝛼, 𝛽)

-core [

𝑘

-bitruss

[45, 52], and quasi biclique [19, 30, 46]. Among them, the maximal biclique model, perhaps, is the

most fundamental model, as all the other cohesive subgraph models can be considered as a relaxed

biclique model.

Instead of focusing on a particular cohesive subgraph model, in this paper, we study a family of

cohesive subgraph models in bipartite graphs that meet the hereditary property, namely hereditary

cohesive subgraphs. Here a subgraph

𝐺

′

is called a hereditary subgraph if (1)

𝐺

′

satises a property

and (2) every induced subgraph of

𝐺

′

also meets the property

. For convenience, we refer to

a subgraph that meets a hereditary property

as a

-subgraph. A

-subgraph

𝐺

′

is a maximal

-subgraph if there is no other

-subgraph containing

𝐺

′

. Given a bipartite graph

𝐺

, our goal is

to enumerate all maximal

-subgraphs from

𝐺

. To our knowledge, such a maximal

-subgraph

enumeration problem has not been investigated before. We show that both the maximal biclique and

maximal

𝑘

-biplex are maximal

-subgraphs, thus both the classic maximal biclique enumeration

and maximal 𝑘-biplex enumeration problems are special instances of our problem.

Motivations. Practical solutions for maximal

-subgraph enumeration can be applied to many

applications, and two of them are summarized as follows.

Community detection in bipartite graphs.

Detecting communities in a bipartite graph is an impor-

tant graph analysis task [

]. We can use the hereditary cohesive subgraph to model

communities in a bipartite graph. The communities detected by the hereditary cohesive subgraph

model often exhibit strong robustness, since the removal of any subset of vertices from the commu-

nity does not destroy the structural property. In eect, the classic maximal biclique and maximal

𝑘

-biplex models have been widely used for community detection applications [

] due to

such a nice hereditary property and cohesive property. Thus, the solution for maximal

-subgraph

enumeration can provide a general framework for community detection in bipartite graphs, which

captures a family of dierent community models.

Fraud detection in user-item networks.

Consider an online user-item rating network (e.g., Amazon’s

user-product rating network), where the users can give ratings to the items. The item owners may

wish to improve their items’ ratings by hiring some fake users to frequently give high ratings to

their items. Clearly, the set of fake users and the set of their rated items often form a densely-

connected subgraph. Once again, we can use the hereditary cohesive subgraph model, such as

maximal biclique or maximal

𝑘

-biplex to detect such fake users [

]. As a result, the approaches to

maximal

-subgraph enumeration can also be used for identifying possible rating frauds in online

user-item networks.

Although the signicance of the maximal

-subgraph enumeration problem, a practical solution

for this problem is still lacking due to the intrinsic challenges of this problem. First, as indicated in

[

], the problem of enumerating all maximal

-subgraphs from a bipartite graph is NP-hard. Thus,

there does not exist a polynomial algorithm to solve this problem unless

NP = P

. Second, since the

hereditary property

is arbitrary (the enumeration algorithm for our problem should work for

any hereditary property

) and internal structure of the

is unclear, it is quite non-trivial to use

such a property

to design an enumeration algorithm. Moreover, existing solutions for maximal

biclique enumeration and maximal

𝑘

-biplex enumeration also cannot be generalized to handle

our problem. This is because dierent properties

give rise to dierent enumeration problems,

Proc. ACM Manag. Data, Vol. 1, No. 2, Article 138. Publication date: June 2023.

of 26

免费下载

关注

评论