Chin. Phys. B, 2020, Vol. 29(11): 116103    DOI: 10.1088/1674-1056/abc0e3
Special Issue: SPECIAL TOPIC — Machine learning in condensed matter physics
TOPICAL REVIEW—Machine learning in condensed matter physics

Machine learning in materials design: Algorithm and application

Zhilong Song(宋志龙), Xiwen Chen(陈曦雯), Fanbin Meng(孟繁斌), Guanjian Cheng(程观剑), Chen Wang(王陈), Zhongti Sun(孙中体), and Wan-Jian Yin(尹万健)
College of Energy, Soochow Institute for Energy and Materials InnovationS (SIEMIS), and Jiangsu Provincial Key Laboratory for Advanced Carbon Materials and Wearable Energy Technologies, Soochow University, Suzhou 215006, China

Traditional materials discovery proceeds by ‘trial and error’, which makes materials design inefficient, costly, and unsustainable. Meanwhile, numerous experimental and computational trials have accumulated enormous quantities of complex, multidimensional data, in which critical ‘structure–properties’ rules may be buried but remain largely unexplored. Machine learning (ML), a burgeoning approach in materials science, may uncover these hidden structure–properties relationships from materials big data and has therefore recently garnered much attention. In this review, we briefly summarize recent research progress in this field, following the ML paradigm: (i) data acquisition → (ii) feature engineering → (iii) algorithm → (iv) ML model → (v) model evaluation → (vi) application. In the application section, we summarize recent work following the ‘materials science tetrahedron’: (i) structure and composition → (ii) property → (iii) synthesis → (iv) characterization, in order to reveal quantitative structure–property relationships and provide inverse design strategies. In addition, current challenges, including data quality and quantity as well as model interpretability and generalizability, are discussed. This review is intended to provide a preliminary overview of ML, from basic algorithms to applications.
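
As a rough illustration of the six-step paradigm listed in the abstract, the following Python sketch walks through each step on synthetic data. Everything in it is an assumption made for illustration and is not taken from the review: the toy composition records, the `fraction_a` descriptor, the linear ground truth, and the choice of ordinary least squares as the algorithm.

```python
import random

def acquire_data(n, seed=0):
    """(i) Data acquisition: synthetic records ((n_a, n_b), property)."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        n_a, n_b = rng.randint(1, 9), rng.randint(1, 9)
        frac = n_a / (n_a + n_b)
        # Hypothetical ground truth: the property is linear in the A fraction.
        records.append(((n_a, n_b), 2.0 * frac + 1.0 + rng.gauss(0.0, 0.01)))
    return records

def fraction_a(composition):
    """(ii) Feature engineering: one descriptor, the fraction of species A."""
    n_a, n_b = composition
    return n_a / (n_a + n_b)

def fit_least_squares(xs, ys):
    """(iii)-(iv) Algorithm and model: closed-form 1D ordinary least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def mean_absolute_error(model, xs, ys):
    """(v) Model evaluation: mean absolute error on held-out data."""
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)

records = acquire_data(200)
train, test = records[:150], records[150:]
model = fit_least_squares([fraction_a(c) for c, _ in train],
                          [y for _, y in train])
test_mae = mean_absolute_error(model,
                               [fraction_a(c) for c, _ in test],
                               [y for _, y in test])

# (vi) Application: rank unseen candidate compositions by predicted property.
candidates = [(1, 9), (5, 5), (9, 1)]
best = max(candidates, key=lambda c: model(fraction_a(c)))
```

In practice each step would be replaced by a real counterpart, for example a database query for step (i) and a descriptor library for step (ii); the sketch only shows how the steps hand data to one another.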

Keywords:  machine learning      materials design      structure-property relationship      active learning  
Received:  06 July 2020      Revised:  24 August 2020      Accepted manuscript online:  14 October 2020
Fund: Project supported by the National Natural Science Foundation of China (Grant Nos. 11674237 and 51602211), the National Key Research and Development Program of China (Grant No. 2016YFB0700700), the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), China, and the China Post-doctoral Foundation (Grant No. 7131705619).
Cite this article: 

Zhilong Song(宋志龙), Xiwen Chen(陈曦雯), Fanbin Meng(孟繁斌), Guanjian Cheng(程观剑), Chen Wang(王陈), Zhongti Sun(孙中体), and Wan-Jian Yin(尹万健) Machine learning in materials design: Algorithm and application 2020 Chin. Phys. B 29 116103

Fig. 1.  

The main workflow of traditional supervised learning and active learning.
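
To make the contrast in Fig. 1 concrete: in pool-based active learning, the data collected so far decide which candidate is labeled next, instead of all labels being gathered up front. The sketch below is a minimal illustration under stated assumptions. The `oracle` function is a hypothetical stand-in for an expensive experiment or calculation, and the distance to the nearest labeled point is used as a simple proxy for model uncertainty; neither is taken from the review.

```python
def oracle(x):
    """Hypothetical stand-in for an expensive experiment or calculation."""
    return (x - 0.3) ** 2

def nearest_labeled_distance(x, labeled):
    """Uncertainty proxy: distance from x to the closest labeled point."""
    return min(abs(x - x_labeled) for x_labeled in labeled)

def active_learning(pool, n_initial=2, n_queries=5):
    """Label a few initial points, then repeatedly query the least-known one."""
    labeled = {x: oracle(x) for x in pool[:n_initial]}
    unlabeled = [x for x in pool if x not in labeled]
    for _ in range(n_queries):
        # Query the candidate that the current data say the least about.
        query = max(unlabeled,
                    key=lambda x: nearest_labeled_distance(x, labeled))
        labeled[query] = oracle(query)
        unlabeled.remove(query)
    return labeled

pool = [i / 20 for i in range(21)]   # candidate "materials" on a 1D grid
labeled = active_learning(pool)
```

A real workflow would usually replace the distance proxy with the predictive uncertainty of a surrogate model and retrain that model after every query.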

Name    Function    URL
Pymatgen    Robust, open-source python library for materials analysis
AFLOW π    A minimalist framework for high-throughput first principles calculations
FireWorks    An open-source code for defining, managing, and executing calculation workflows
AiiDA    A workflow to automate complex numerical procedures of calculation
Pymatflow    A workflow simplifier for research on materials science by means of ab initio simulation
ASE    Setting up, steering, and analyzing atomistic simulations
Atomate    Built on top of state-of-the-art open-source libraries: pymatgen, custodian, and FireWorks
Custodian    A simple, robust, and flexible just-in-time (JIT) job management framework
MPInterfaces    A python tool that enables high throughput analysis of interfaces using VASP, VASPsol, and MP tools
Imeall    A database framework for the calculation of the atomistic properties of grain boundaries
Pylada    A modular python framework to control physics simulations
Pyiron    An integrated development environment (IDE) for computational materials science
Table 1.  

High-throughput (HT) tools.

Name Data type URL Free
Materials Project Multiple
ICSD Inorganic & Experimental ×
AFLOWLIB Inorganic & Computational
COD Multiple & Experimental
QM9 Organic molecules
OMDB-GAP1 Organic crystals
OQMD Multiple & Computational
NOMAD Multiple
Materials Cloud Multiple (3D, 2D)
NREL Materials Computational & Renewable
Clean Energy Project Solar cell ×
TEDesignLab Thermoelectric
HTEM Inorganic
Supercon Superconducting
MaterialsWeb 2D
CSD Multiple ×
CMR Multiple (3D, 2D)
Citrination Multiple
MatNavi Multiple
MatWeb Engineering
GDB Small organic molecules
ZINC Compounds
ChEMBL Bioactive molecules
ChemSpider Multiple
Materials Commons Computational
AiiDA Alloy Phase Diagram
ASM Inorganic ×
LPF Multiple ×
PCD Multiple ×
Nano-HUB Nanomaterials
EELS Data Base Spectra
XAFS database Spectra
Table 2.  

Material databases.

Name    Description    URL
QML    A python toolkit for representation learning of properties of molecules and solids
AMP    A modular approach to machine learning in atomistic simulations
Magpie    Materials-agnostic platform for informatics and exploration
RDKit    A collection of cheminformatics and machine-learning software written in C++ and Python
ChemML    A ML program suite for the analysis, mining, and modeling of chemical and materials data
DScribe    Library of descriptors for machine learning in materials science
Matminer    A Python library for data mining the properties of materials
SchNet    A deep learning architecture for quantum chemistry
DeepChem    Deep-learning models for drug discovery and quantum chemistry
MEGNet    An implementation of DeepMind’s graph networks for universal machine learning in materials science
CGCNN    An implementation of crystal graph convolutional neural networks for arbitrary crystal structures
Table 3.  

Feature tools.

Fig. 2.  

A simple fully connected neural network structure with two hidden layers.
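
For readers who prefer code to diagrams, the forward pass of a fully connected network with two hidden layers, as depicted in Fig. 2, can be sketched in a few lines of plain Python. The layer sizes (3 → 4 → 4 → 1), the ReLU activation, and the random initial weights are assumptions chosen for illustration; the figure itself does not specify them.

```python
import random

def dense(inputs, weights, biases):
    """One fully connected layer: every output sees every input."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in values]

def init_layer(n_in, n_out, rng):
    """Random weights in [-1, 1] and zero biases (illustrative choice)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(x, layers):
    """Apply ReLU after each hidden layer; keep the output layer linear."""
    hidden = x
    for weights, biases in layers[:-1]:
        hidden = relu(dense(hidden, weights, biases))
    weights, biases = layers[-1]
    return dense(hidden, weights, biases)

rng = random.Random(0)
layers = [init_layer(3, 4, rng),   # input -> hidden layer 1
          init_layer(4, 4, rng),   # hidden layer 1 -> hidden layer 2
          init_layer(4, 1, rng)]   # hidden layer 2 -> output
output = forward([0.2, -0.1, 0.7], layers)
```

Training such a network would additionally require a loss function and gradient-based weight updates, which are omitted here.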

Fig. 3.  

A 2D CNN structure with two max-pooling layers and one convolution layer.

Fig. 4.  

Sketch of the RNN methodology.

Fig. 5.  

The workflow of the SR program.

Fig. 6.  

The structure of the AI Feynman. Reprinted with permission from Ref. [137].

Fig. 7.  

The mechanism of the BO method. Reprinted with permission from Ref. [150]. Copyright (2020) Springer Nature.

Fig. 8.  

Four main parts of MCTS.

Fig. 9.  

Typical learning curve. The y axis shows the value of the loss function, and the x axis shows the number of examples.
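
A learning curve of this kind can be produced by refitting a model on progressively larger training sets and recording the loss each time. The sketch below is a deterministic toy version under stated assumptions: the target function `x * x`, the 1-nearest-neighbor regressor, the evenly spaced training grids, and the mean squared error on a fixed test grid are all illustrative choices, not the setup behind Fig. 9.

```python
def target(x):
    """Hypothetical ground-truth property as a function of one descriptor."""
    return x * x

def nearest_neighbor_predict(x, train):
    """Predict with the label of the closest training example."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def learning_curve(sizes, test_x):
    """Return (number of examples, test mean squared error) pairs."""
    curve = []
    for n in sizes:
        # Evenly spaced training examples on [0, 1].
        train = [(i / (n - 1), target(i / (n - 1))) for i in range(n)]
        loss = sum((nearest_neighbor_predict(x, train) - target(x)) ** 2
                   for x in test_x) / len(test_x)
        curve.append((n, loss))
    return curve

test_x = [j / 100 for j in range(101)]
curve = learning_curve([2, 4, 8, 16], test_x)
```

Plotting the second element of each pair against the first gives the loss on the y axis and the number of examples on the x axis, matching the layout described in the caption.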

Fig. 10.  

(a) Workflow for classifying XRD data with a data augmentation method. (b) Structure of the CNN model used. Reprinted with permission from Ref. [2].

Property Conventional Deep learning Tree-based Descriptor Active Unsupervised
Thermodynamic stability [86,188–193] [26] [188–190,194–196] [195] [86]
Band gap [31,193,197–204] [20,203,205] [31,196,201,204] [33] [206]
Superconductivity [207–210] [211,212] [19,213,214] [215] [207,210,213]
Thermal conductivity [79,216–221] [79,218,222,223] [79,224–227] [216,222,228–230] [216]
Curie temperature [6,231–236] [235–237] [6] [233,236]
Bulk and shear moduli [238–242] [97,102,243] [25,240,244–246] [246]
Debye temperature and heat capacity [239,242,247,248] [25,248]
Density of states [87,249,250] [251] [249,252]
Dielectric breakdown strength [23,253–255] [255] [253]
Grain boundary structure and properties [256–259] [260] [257,258] [261,262] [263]
Lattice parameter [264–266] [266]
Lithium ion batteries SOC and conduction [22,267–274] [275–277] [22,273,278]
Melting temperature [221,279–282] [279] [279]
Table 4.  

ML applications for selected materials properties.

Fig. 11.  

The main workflow of screening stable halide perovskites via ML in combination with DFT calculations. Reprinted with permission from Ref. [192]. Copyright (2019) John Wiley and Sons.

Fig. 12.  

(a) The search progress for the halide perovskites with ideal decomposition energy and band gap. (b) The performance of BO. Reprinted with permission from Ref. [150]. Copyright (2020) Springer Nature.

Fig. 13.  

Workflow of screening semiconductors from the MXene database and predicting band gaps. Reprinted with permission from Ref. [200]. Copyright (2018) American Chemical Society.

Fig. 14.  

(a) The representation matrix of doped graphene supercell systems. (b) One of the CNN structures to predict band gaps. Reprinted with permission from Ref. [20].

Fig. 15.  

(a) Prediction on testing set of low-Tc, iron-based, and cuprate superconductors. (b)–(c) Prediction from model trained on data only containing low-Tc materials. (d)–(e) Prediction from model trained on data only containing cuprate materials. Reprinted with permission from Ref. [19].

Fig. 16.  

(a) The prediction result for ITR from LSBoost. (b) The relation between ITR prediction and the thickness and temperature. Reprinted with permission from Ref. [226]. Copyright (2018) American Chemical Society.

Fig. 17.  

The workflow of MCTS for optimizing the surface roughness. Reprinted with permission from Ref. [229].

Fig. 18.  

(a) The design loop for searching high-temperature ferroelectric perovskites. Reprinted from Ref. [6]. (b) The relation between the Tc and chemical composition for the ternary system Al–Co–Fe. Reprinted with permission from Ref. [236]. Copyright (2019) American Physical Society.

Fig. 19.  

The workflow used by Raccuglia et al.[78] Copyright (2016) Springer Nature.

