Chinese Physics B, 2022, Vol. 31, Issue (6): 068902. doi: 10.1088/1674-1056/ac4483

Special topic: Interdisciplinary physics: Complex network dynamics and emerging technologies


A novel similarity measure for mining missing links in long-path networks

Yijun Ran(冉义军)1, Tianyu Liu(刘天宇)1, Tao Jia(贾韬)1,†, and Xiao-Ke Xu(许小可)2,‡   

  1 College of Computer and Information Science, Southwest University, Chongqing 400715, China;
  2 College of Information and Communication Engineering, Dalian Minzu University, Dalian 116600, China
  • Received: 2021-08-04 Revised: 2021-12-01 Accepted: 2021-12-21 Online: 2022-05-17 Published: 2022-06-07
  • Contact: Tao Jia, Xiao-Ke Xu E-mail: tjia@swu.edu.cn; xuxiaoke@foxmail.com
  • Supported by:
    Project supported by the National Natural Science Foundation of China (Grant Nos. 61773091 and 62173065), the Industry-University-Research Innovation Fund for Chinese Universities (Grant No. 2021ALA03016), the Fund for University Innovation Research Group of Chongqing (Grant No. CXQT21005), the National Social Science Foundation of China (Grant No. 20CTQ029), and the Fundamental Research Funds for the Central Universities (Grant No. SWU119062).

Abstract: Network information mining is the study of network topology, and it can answer a large number of application-based questions about the structural evolution and function of a real system. Such questions may concern how the real system evolves or how individuals interact with each other in social networks. Although the evolution of a real system may appear regular, capturing patterns across the whole process of evolution is not trivial. Link prediction is one of the most important technologies in network information mining, and it can help us understand the evolution mechanisms of real-life networks. Link prediction aims to uncover missing links or to quantify the likelihood that nonexistent links will emerge, based on the known network structure. Currently, most existing link prediction methods focus on short-path networks, which usually have a myriad of closed triangular structures. However, these algorithms perform poorly on highly sparse or long-path networks. Here, we propose a new index based on the principles of structural equivalence and shortest path length (SESPL) to estimate the likelihood of link existence in long-path networks. Through a test on 548 real networks, we find that SESPL is more effective and efficient than other similarity-based predictors in long-path networks. We also examine the performance of the SESPL predictor and of embedding-based approaches via machine learning techniques. The results show that SESPL achieves a gain of 44.09% over GraphWave and 7.93% over Node2vec. Finally, according to the matrix of maximal information coefficients (MIC) between all the similarity-based predictors, SESPL is a new independent feature in the space of traditional similarity features.

Key words: structural equivalence, shortest path length, long-path networks, missing links
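
The abstract's remark that triangle-oriented predictors struggle on long-path networks can be illustrated with a small sketch. It does not implement SESPL, whose definition is not given on this page; it only applies the standard common-neighbour score to two toy graphs chosen here as assumptions (a 5-node complete graph standing in for a short-path network, and a 10-node ring standing in for a long-path network). All function and variable names are hypothetical.

```python
def build_adjacency(edges):
    """Build undirected adjacency sets from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def common_neighbors(adj, u, v):
    """Common-neighbour score: number of nodes adjacent to both u and v."""
    return len(adj.get(u, set()) & adj.get(v, set()))


def hide_edge(edges, hidden):
    """Return the edge list with one link removed (the 'missing' link)."""
    return [e for e in edges if set(e) != set(hidden)]


# Toy short-path network (assumption): complete graph on 5 nodes, many triangles.
clique_edges = [(i, j) for i in range(5) for j in range(i + 1, 5)]
# Toy long-path network (assumption): ring of 10 nodes, no triangles.
ring_edges = [(i, (i + 1) % 10) for i in range(10)]

hidden = (0, 1)  # the link we pretend is missing in both graphs
clique_adj = build_adjacency(hide_edge(clique_edges, hidden))
ring_adj = build_adjacency(hide_edge(ring_edges, hidden))

# Score of the truly missing link in each graph.
clique_score = common_neighbors(clique_adj, 0, 1)
ring_score = common_neighbors(ring_adj, 0, 1)
# Score of a pair that was never linked in the ring (nodes two hops apart).
ring_false_score = common_neighbors(ring_adj, 1, 3)

print(clique_score)      # 3: the missing link is well supported by triangles
print(ring_score)        # 0: the missing link has no common neighbours
print(ring_false_score)  # 1: a never-existing link outranks the missing one
```

In the ring, the hidden link scores lower than a pair that was never connected, which is the kind of failure on long-path structures that the abstract describes.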

Classification codes (PACS):

  • 89.75.Hc (Networks and genealogical trees)
  • 89.65.-s (Social and economic systems)
  • 89.20.Ff (Computer science and technology)