中国物理B ›› 2025, Vol. 34 ›› Issue (10): 108903-108903.doi: 10.1088/1674-1056/addaa2

• • 上一篇    下一篇

Extracting fuzzy clusters from massive attributed graphs using Markov lumpability optimization

Kai-Yue Jiang(蒋凯悦)1,†, Li-Heng Xu(徐力恒)1,†, Shi-Pei Lin(林诗佩)1,†, Li-Yang Zhou(周李阳)1,†, Hui-Jia Li(李慧嘉)1,‡, and Ge Gao(高歌)2,§   

  1. 1 School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300074, China;
    2 Management School, Beijing Sport University, Beijing 100084, China
  • 收稿日期:2025-03-07 修回日期:2025-05-11 接受日期:2025-05-20 发布日期:2025-09-29
  • 通讯作者: Hui-Jia Li, Ge Gao E-mail:hjli@nankai.edu.cn;gaoge@bsu.edu.cn
  • 基金资助:
    This work is supported by the National Natural Science Foundation of China (Grant No. 72571150) and Beijing Natural Science Foundation (Grant No. 9182015).

Extracting fuzzy clusters from massive attributed graphs using Markov lumpability optimization

Kai-Yue Jiang(蒋凯悦)1,†, Li-Heng Xu(徐力恒)1,†, Shi-Pei Lin(林诗佩)1,†, Li-Yang Zhou(周李阳)1,†, Hui-Jia Li(李慧嘉)1,‡, and Ge Gao(高歌)2,§   

  1. 1 School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300074, China;
    2 Management School, Beijing Sport University, Beijing 100084, China
  • Received:2025-03-07 Revised:2025-05-11 Accepted:2025-05-20 Published:2025-09-29
  • Contact: Hui-Jia Li, Ge Gao E-mail:hjli@nankai.edu.cn;gaoge@bsu.edu.cn
  • Supported by:
    This work is supported by the National Natural Science Foundation of China (Grant No. 72571150) and Beijing Natural Science Foundation (Grant No. 9182015).

摘要: Attributed graph clustering plays a vital role in uncovering hidden network structures, but it presents significant challenges. In recent years, various models have been proposed to identify meaningful clusters by integrating both structural and attribute-based information. However, these models often emphasize node proximities without adequately balancing the efficiency of clustering based on both structural and attribute data. Furthermore, they tend to neglect the critical fuzzy information inherent in attributed graph clusters. To address these issues, we introduce a new framework, Markov lumpability optimization, for efficient clustering of large-scale attributed graphs. Specifically, we define a lumped Markov chain on an attribute-augmented graph and introduce a new metric, Markov lumpability, to quantify the differences between the original and lumped Markov transition probability matrices. To minimize this measure, we propose a conjugate gradient projection-based approach that ensures the partitioning closely aligns with the intrinsic structure of fuzzy clusters through conditional optimization. Extensive experiments on both synthetic and real-world datasets demonstrate the superior performance of the proposed framework compared to existing clustering algorithms. This framework has many potential applications, including dynamic community analysis of social networks, user profiling in recommendation systems, functional module identification in biological molecular networks, and financial risk control, offering a new paradigm for mining complex patterns in high-dimensional attributed graph data.

关键词: attributed clustering, Markov chain, lumped random walk, fuzzy clusters, optimization

Abstract: Attributed graph clustering plays a vital role in uncovering hidden network structures, but it presents significant challenges. In recent years, various models have been proposed to identify meaningful clusters by integrating both structural and attribute-based information. However, these models often emphasize node proximities without adequately balancing the efficiency of clustering based on both structural and attribute data. Furthermore, they tend to neglect the critical fuzzy information inherent in attributed graph clusters. To address these issues, we introduce a new framework, Markov lumpability optimization, for efficient clustering of large-scale attributed graphs. Specifically, we define a lumped Markov chain on an attribute-augmented graph and introduce a new metric, Markov lumpability, to quantify the differences between the original and lumped Markov transition probability matrices. To minimize this measure, we propose a conjugate gradient projection-based approach that ensures the partitioning closely aligns with the intrinsic structure of fuzzy clusters through conditional optimization. Extensive experiments on both synthetic and real-world datasets demonstrate the superior performance of the proposed framework compared to existing clustering algorithms. This framework has many potential applications, including dynamic community analysis of social networks, user profiling in recommendation systems, functional module identification in biological molecular networks, and financial risk control, offering a new paradigm for mining complex patterns in high-dimensional attributed graph data.

Key words: attributed clustering, Markov chain, lumped random walk, fuzzy clusters, optimization

中图分类号:  (Systems obeying scaling laws)

  • 89.75.Da
02.50.Ey (Stochastic processes) 07.05.Tp (Computer modeling and simulation) 05.40.-a (Fluctuation phenomena, random processes, noise, and Brownian motion)