Accurate prediction of essential proteins using ensemble machine learning

doi:10.1088/1674-1056/ad8db2

Abstract: Essential proteins are crucial for biological processes and can be identified through both experimental and computational methods. While experimental approaches are highly accurate, they often demand extensive time and resources. To address these challenges, we present a computational ensemble learning framework designed to identify essential proteins more efficiently. Our method begins by using node2vec to transform proteins in the protein-protein interaction (PPI) network into continuous, low-dimensional vectors. We also extract a range of features from protein sequences, including graph-theory-based, information-based, compositional, and physicochemical attributes. Additionally, we leverage deep learning techniques to analyze high-dimensional position-specific scoring matrices (PSSMs) and capture evolutionary information. We then combine these features for classification using various machine learning algorithms. To enhance performance, we integrate the outputs of these algorithms through ensemble methods such as voting, weighted averaging, and stacking. This approach effectively addresses data imbalance and improves both robustness and accuracy. Our ensemble learning framework achieves an AUC of 0.960 and an accuracy of 0.9252, outperforming other computational methods. These results demonstrate the effectiveness of our approach in accurately identifying essential proteins and highlight its superior feature extraction capabilities.
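As a minimal illustration of the ensemble step named in the abstract (not the authors' implementation), the following Python sketch combines the predicted probabilities of three hypothetical base classifiers by majority voting and by weighted averaging, then scores the result with accuracy and AUC. The probability values, the weights, and all function names are assumptions made for illustration. Stacking, which would replace the fixed weights with a meta-learner trained on held-out base predictions, is not shown.

```python
from typing import List, Sequence


def majority_vote(prob_lists: Sequence[Sequence[float]],
                  threshold: float = 0.5) -> List[int]:
    """Hard voting: each model votes 1 if its probability >= threshold;
    a sample is labelled 1 only when a strict majority of models vote 1."""
    n_models = len(prob_lists)
    n_samples = len(prob_lists[0])
    labels = []
    for i in range(n_samples):
        votes = sum(1 for probs in prob_lists if probs[i] >= threshold)
        labels.append(1 if 2 * votes > n_models else 0)
    return labels


def weighted_average(prob_lists: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[float]:
    """Soft combination: per-sample weighted mean of model probabilities."""
    total = sum(weights)
    n_samples = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive is scored
    above a randomly chosen negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = essential protein, 0 = non-essential.
y_true = [1, 0, 1, 0, 0, 1]
model_a = [0.9, 0.2, 0.4, 0.1, 0.6, 0.8]  # assumed classifier outputs
model_b = [0.7, 0.4, 0.6, 0.3, 0.2, 0.4]
model_c = [0.8, 0.1, 0.7, 0.6, 0.3, 0.9]
models = [model_a, model_b, model_c]

voted = majority_vote(models)
averaged = weighted_average(models, weights=[2, 1, 1])  # assumed weights

print("voting accuracy:", accuracy(y_true, voted))
print("model_a AUC:", roc_auc(y_true, model_a))
print("weighted-average AUC:", roc_auc(y_true, averaged))
```

On this toy data each base model misranks or misclassifies at least one protein, while the combined scores separate the two classes. That shows how combining models can help, but it says nothing about the reported results. In practice the weights would be tuned on validation data rather than fixed by hand.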

Keywords: protein-protein interaction (PPI), essential proteins, deep learning, ensemble learning

Received: 15 September 2024
Revised: 21 October 2024
Accepted manuscript online: 01 November 2024

PACS: 89.75.-k (Complex systems)

Fund: This work was financially supported by the National Key R&D Program of China (Grant No. 2022YFF1202600), the National Natural Science Foundation of China (Grant No. 82301158), the Science and Technology Innovation Action Plan of Shanghai Science and Technology Committee (Grant No. 22015820100), Two-hundred Talent Support (Grant No. 20152224), the Translational Medicine Innovation Project of Shanghai Jiao Tong University School of Medicine (Grant No. TM201915), the Clinical Research Project of Multi-Disciplinary Team, Shanghai Ninth People’s Hospital, Shanghai Jiao Tong University School of Medicine (Grant No. 201914), and the China Postdoctoral Science Foundation (Grant No. 2023M742332).

Corresponding Authors: Yuanyuan Liu, Jinwu Wang
E-mail: yuanyuan_liu@shu.edu.cn; wangjw@shsmu.edu.cn

Cite this article:

Dezhi Lu(鲁德志), Hao Wu(吴淏), Yutong Hou(侯俞彤), Yuncheng Wu(吴云成), Yuanyuan Liu(刘媛媛), and Jinwu Wang(王金武). Accurate prediction of essential proteins using ensemble machine learning. 2025 Chin. Phys. B 34 018901

