Chinese Physics B ›› 2020, Vol. 29 ›› Issue (8): 080201. doi: 10.1088/1674-1056/ab8da6

Special topic: Machine learning in statistical physics

• SPECIAL TOPIC: Ultracold atom and its application in precision measurement •

Inverse Ising techniques to infer underlying mechanisms from data

Hong-Li Zeng(曾红丽)1,2, Erik Aurell3,4

  1 School of Science, New Energy Technology Engineering Laboratory of Jiangsu Province, Nanjing University of Posts and Telecommunications, Nanjing 210023, China;
  2 Nordita, Royal Institute of Technology, and Stockholm University, SE-10691 Stockholm, Sweden;
  3 KTH-Royal Institute of Technology, AlbaNova University Center, SE-10691 Stockholm, Sweden;
  4 Faculty of Physics, Astronomy and Applied Computer Science, Jagiellonian University, 30-348 Kraków, Poland
  • Received: 2020-03-09  Revised: 2020-03-09  Online: 2020-08-05  Published: 2020-08-05
  • Contact: Hong-Li Zeng, Erik Aurell  E-mail: hlzeng@njupt.edu.cn; eaurell@kth.se
  • Supported by:

    Project supported partially by the National Natural Science Foundation of China (Grant No. 11705097), the Natural Science Foundation of Jiangsu Province of China (Grant No. BK20170895), the Jiangsu Government Scholarship for Overseas Studies of 2018 and Scientific Research Foundation of Nanjing University of Posts and Telecommunications, China (Grant No. NY217013), and the Foundation for Polish Science through TEAM-NET Project (Grant No. POIR.04.04.00-00-17C1/18-00).

Abstract:

As a problem in data science, the inverse Ising (or Potts) problem is to infer the parameters of the Gibbs-Boltzmann distribution of an Ising (or Potts) model from samples drawn from that distribution. The algorithmic and computational interest stems from the fact that this inference task cannot be carried out efficiently by the maximum likelihood criterion, since the normalizing constant of the distribution (the partition function) cannot be calculated exactly and efficiently. The practical interest, on the other hand, flows from several outstanding applications, of which the best known has been predicting spatial contacts in protein structures from tables of homologous protein sequences. Most applications to date have been to data produced by a dynamical process which, as far as is known, cannot be expected to satisfy detailed balance. There is therefore no a priori reason to expect the distribution to be of the Gibbs-Boltzmann type, and no a priori reason to expect that inverse Ising (or Potts) techniques should yield useful information. In this review we discuss two types of problems where progress can nevertheless be made. We find that, depending on model parameters, there are phases where the distribution is in fact close to a Gibbs-Boltzmann distribution, the non-equilibrium nature of the underlying dynamics notwithstanding. We also discuss the relation between inferred Ising model parameters and parameters of the underlying dynamics.
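
To make the computational obstacle mentioned in the abstract concrete, the following is a minimal sketch of an exact log-likelihood for a small Ising model. It is an illustration only, not a method taken from the review: the symbols `h` (fields) and `J` (couplings), the sign convention for the energy, and all function names are assumptions introduced here. The point is that the partition function is obtained by summing over all 2**n spin configurations, which is feasible only for very small n.

```python
import itertools
import numpy as np

def all_states(n):
    """Return all 2**n spin configurations with entries in {-1, +1}."""
    return np.array(list(itertools.product([-1, 1], repeat=n)))

def neg_energy(states, h, J):
    """Assumed convention: -E(s) = sum_i h_i s_i + sum_{i<j} J_ij s_i s_j."""
    h = np.asarray(h, dtype=float)
    upper = np.triu(np.asarray(J, dtype=float), 1)  # count each pair once
    return states @ h + np.einsum("ki,ij,kj->k", states, upper, states)

def log_partition(h, J):
    """Exact log Z by brute-force enumeration; cost grows as 2**n."""
    e = neg_energy(all_states(len(h)), h, J)
    m = e.max()  # subtract the maximum for numerical stability
    return m + np.log(np.exp(e - m).sum())

def mean_log_likelihood(samples, h, J):
    """Average log-probability of the samples under the model (h, J)."""
    samples = np.asarray(samples)
    return neg_energy(samples, h, J).mean() - log_partition(h, J)
```

Because `all_states(n)` has 2**n rows, every evaluation of the likelihood (and hence every step of a maximum likelihood search) scales exponentially with the number of spins in this sketch, which is why approximate inference schemes are of interest.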

Key words: inverse Ising problem, kinetic Ising model, statistical genetics, fitness reconstruction

PACS:

  • 02.50.Tt (Inference methods)
  • 05.40.-a (Fluctuation phenomena, random processes, noise, and Brownian motion)
  • 05.45.Tp (Time series analysis)
  • 05.90.+m (Other topics in statistical physics, thermodynamics, and nonlinear dynamical systems)