Chinese Physics B, 2023, Vol. 32, Issue 10: 108702. doi: 10.1088/1674-1056/acf03e

Combination of density-clustering and supervised classification for event identification in single-molecule force spectroscopy data

Yongyi Yuan(袁泳怡)1,2, Jialun Liang(梁嘉伦)1,2,†, Chuang Tan(谭创)1,2, Xueying Yang(杨雪滢)1,2, Dongni Yang(杨东尼)1,2, and Jie Ma(马杰)1,2,‡   

  1 School of Physics, Sun Yat-sen University, Guangzhou 510275, China;
  2 State Key Laboratory of Optoelectronic Materials and Technologies, Sun Yat-sen University, Guangzhou 510006, China
  • Received: 2023-07-01 Revised: 2023-07-28 Accepted: 2023-08-15 Online: 2023-09-21 Published: 2023-10-08
  • Corresponding authors: Jialun Liang, Jie Ma E-mail: liangjlun3@mail.sysu.edu.cn; majie6@mail.sysu.edu.cn
  • Supported by:
    Project supported by the National Natural Science Foundation of China (Grant No. 12074445) and the Open Fund of the State Key Laboratory of Optoelectronic Materials and Technologies of Sun Yat-sen University (Grant No. OEMT-2022-ZTS-05).

Abstract: Single-molecule force spectroscopy (SMFS) measurements of biomolecular dynamics typically require identifying large numbers of events and states in large data sets, for example extracting rupture forces from force-extension curves (FECs) in pulling experiments and identifying states from extension-time trajectories (ETTs) in force-clamp experiments. The former is often done manually and is therefore time-consuming and laborious, while the latter is always impeded by the presence of baseline drift. In this study, we attempt to identify the events and states from SMFS experiments accurately and automatically with a machine learning approach that combines clustering and classification for event identification of SMFS (ACCESS). As demonstrated by the analysis of a series of data sets, ACCESS can extract the rupture forces from FECs containing multiple unfolding steps and assign them to the corresponding conformational transitions. Moreover, ACCESS successfully identifies the unfolded and folded states even when the ETTs display severe nonmonotonic baseline drift. In addition, ACCESS is straightforward to use, as it requires only three easy-to-interpret parameters. We therefore anticipate that ACCESS will be a useful, easy-to-implement, and high-performance tool for event and state identification across a range of single-molecule experiments.

Keywords: single-molecule force spectroscopy, data analysis, density-based clustering, supervised classification

Classification codes (PACS):

  • 87.80.Nj (Single-molecule techniques)
  • 82.37.-j (Single molecule kinetics)
  • 87.15.H- (Dynamics of biomolecules)
  • 07.05.Kf (Data analysis: algorithms and implementation; data management)
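
The abstract names the two ingredients of ACCESS, density-based clustering and supervised classification, but does not describe how they are combined. The sketch below is therefore only a generic illustration of that pairing, not the ACCESS algorithm itself. Everything in it is an assumption made for illustration: the synthetic "rupture event" data, the function names `dbscan`, `fit_centroids`, and `classify`, the parameters `eps` and `min_pts` (which are not claimed to be the three ACCESS parameters), and the choice of a nearest-centroid classifier. A hand-written DBSCAN groups unlabeled events, and the resulting cluster labels then train a classifier that assigns new events to a group.

```python
import math
import random


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point; -1 marks noise."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # Brute-force neighborhood query (includes the point itself).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
    return labels


def fit_centroids(points, labels):
    """Supervised step: learn one centroid per cluster label (noise ignored)."""
    sums, counts = {}, {}
    for p, lab in zip(points, labels):
        if lab == -1:
            continue
        acc = sums.setdefault(lab, [0.0] * len(p))
        for k, value in enumerate(p):
            acc[k] += value
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: tuple(v / counts[lab] for v in acc) for lab, acc in sums.items()}


def classify(point, centroids):
    """Assign a point to the label of its nearest centroid."""
    return min(centroids, key=lambda lab: math.dist(point, centroids[lab]))


# Synthetic events (hypothetical): (extension change in nm, rupture force in pN).
rng = random.Random(0)
events = [(rng.gauss(10, 0.5), rng.gauss(15, 0.5)) for _ in range(40)]
events += [(rng.gauss(20, 0.5), rng.gauss(30, 0.5)) for _ in range(40)]

labels = dbscan(events, eps=1.0, min_pts=5)
centroids = fit_centroids(events, labels)
print("clusters found:", len(centroids))
print("new event assigned to cluster:", classify((19.5, 29.0), centroids))
```

In this sketch the clustering step needs no labeled data, and the classification step reuses the cluster labels as training targets, so points that the density criterion marks as noise can still be assigned to a group afterwards. With real data the two features would usually need rescaling before a single `eps` is meaningful.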