† Corresponding author. E-mail:
Project supported by the National Natural Science Foundation of China (Grant No. 11075184), the Knowledge Innovation Program of the Chinese Academy of Sciences (CAS) (Grant No. Y03RC21124), and the CAS President's International Fellowship Initiative Foundation (Grant No. 2015VMA007).
Laser-induced breakdown spectroscopy (LIBS) is a versatile tool for both qualitative and quantitative analysis. In this paper, LIBS combined with principal component analysis (PCA) and a support vector machine (SVM) is applied to rock analysis. Fourteen emission lines of Fe, Mg, Ca, Al, Si, and Ti are selected as analysis lines. Good accuracy (91.38% for real rock) is achieved by using the SVM to classify spectral peak-area data preprocessed by PCA. Combining PCA and SVM not only reduces noise and dimensionality, which improves the efficiency of the program, but also solves the problem of linear inseparability. The results validate the ability of LIBS to classify rock.
Laser-induced breakdown spectroscopy (LIBS) is a straightforward atomic emission spectroscopic technique that provides rapid, multi-element detection with simple sample preparation. It was first applied to the detection of hazardous gases and vapors in air at Los Alamos in the 1980s. Many researchers have focused on the theory of laser-induced plasma (LIP),[1,2] electron density,[3,4] and laser–matter interaction, which could improve the performance of LIBS.[5–8] Owing to its versatility,[9] LIBS has been widely used in many fields, from the laboratory to practical applications such as industrial control,[10–13] environmental protection,[14–17] agriculture,[18–20] medical science,[21,22] and archaeology.[23–25] In addition, LIBS is trending toward miniaturization, multipurpose use,[26] and the use of femtosecond lasers to induce plasmas with special characteristics.[27] Recently, many LIBS investigators have carried out qualitative analysis with chemometrics and pattern recognition and obtained satisfactory results. In 2007, Sirven et al.[28] studied the feasibility of rock identification at the surface of Mars by remote LIBS. They employed three chemometric methods, namely principal component analysis (PCA), soft independent modeling of class analogy (SIMCA), and partial least squares-discriminant analysis (PLS-DA), and obtained a best classification accuracy of 97.6%. In 2009, Yueh et al.[29] applied LIBS to the classification of biological samples using the peak intensities of 31 spectral lines, and also obtained good results with three methods: PLS-DA, hierarchical cluster analysis (HCA), and artificial neural network (ANN).
In 2012, Cisewski et al.[30] built a support vector machine (SVM) model to evaluate the composition of suspect powders, particularly with respect to a possible content of Bacillus anthracis, and obtained promising results. De Lucia and Gottfried analyzed explosive residues on organic substrates using LIBS data.[31] In China, many researchers, e.g., Yu et al.,[32] Tian et al.,[33] Wang et al.,[34] and Chen et al.,[35] have contributed to qualitative LIBS analysis through various methods such as PCA, PLS-DA, self-organizing mapping (SOM), and SVM. However, to our knowledge, SVM combined with PCA has not yet been applied to LIBS data. In fact, PCA is employed as a way of pretreating data and extracting features by many methods such as PLS-DA and SIMCA. This paper aims to classify rocks from LIBS data analyzed by SVM combined with PCA.
The LIBS process involves many complex but independent areas such as laser–matter interaction, laser ablation of material, optical and thermodynamic properties of hot and ionized gas, and plasma propagation in a background gas.[5] When the plasma is in local thermal equilibrium (LTE) and there is no self-absorption, the emission intensity is given by
PCA is often employed because it reduces both noise and dimensionality. The main idea is that the raw data
SVM has many applications in statistics, in particular for classification. The main idea of SVM is to find the hyperplane that best separates the data by maximizing the margin between the closest points of each class.[43] Considering that the raw data may not be linearly separable, the radial basis function (RBF) kernel is chosen as:
The classic experimental setup is adopted to acquire the spectral data as shown in Fig.
The Nd:YAG laser (Quantel model ULTRA50) is operated at the fundamental wavelength of 1064 nm. The laser pulse energy is fixed at 50 mJ, the pulse width is 7 ns, and the repetition frequency is 0.5 Hz. The laser beam is first reflected by a 45° mirror and then focused on the sample surface by an optical lens (lens 1) with a 100-mm focal length. When the laser power density exceeds the ablation threshold of the material (typically 10⁷ W/cm²), a plasma is formed, emitting intense light that contains a continuous background and characteristic spectral lines from energy-level transitions. The focused spot diameter is (0.75018±0.06931) mm, measured with an Abbe comparator (Shanghai model 6W810115) at 5 different spots. The laser power density is thus calculated to be 1.26 × 10⁹ W/cm². The emitted light is collected by a spectrometer (Avantes model AVS-DESKTOP-USB2) through an optical lens (lens 2) and a fiber. The spectrometer covers the range from 180 nm to 610 nm with a spectral resolution of about 0.05 nm. Based on our group's experience, the spectral integration time is set to 1.03 ms, and the delay between the laser pulse and the collection of the emission is set to 1.05 μs.
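As a rough cross-check of the order of magnitude, the power density can be estimated from the pulse energy, pulse width, and spot diameter quoted above. The sketch below assumes a uniform circular spot and a rectangular temporal pulse (peak power = energy / pulse width). These are simplifying assumptions and not necessarily the convention behind the value given in the text, so the estimate (about 1.6 × 10⁹ W/cm²) should be expected to agree with it only in order of magnitude. The function name is hypothetical.

```python
import math


def power_density_w_per_cm2(pulse_energy_j, pulse_width_s, spot_diameter_mm):
    """Estimate the peak power density of a focused laser pulse.

    Assumes a uniform circular spot and a rectangular pulse in time.
    """
    peak_power_w = pulse_energy_j / pulse_width_s   # W
    radius_cm = spot_diameter_mm / 10.0 / 2.0       # mm -> cm, diameter -> radius
    area_cm2 = math.pi * radius_cm ** 2             # cm^2
    return peak_power_w / area_cm2


# Values quoted in the text: 50 mJ, 7 ns, 0.75018 mm spot diameter.
density = power_density_w_per_cm2(50e-3, 7e-9, 0.75018)
print(f"{density:.2e} W/cm^2")
```

Under either convention the estimate lies about two orders of magnitude above the typical ablation threshold, which is consistent with plasma formation on the sample surface.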
The experiment is performed with nine types of rock debris (Amphibolite, Gneiss, Limestone, Muddy Limestone, Shale, Quartz Sandstone, Basalt, Andesite and Granite). Figure
The data extracted from the emission spectrum are the peak areas of the 14 spectral lines, calculated directly from the spectra, rather than the peak intensities. The spectrometer therefore does not need high spectral resolution. As an example, the intensity of the Ca line at 585.74 nm corresponds to 4956.60 a.u. (arbitrary units), as shown by the measurement in Fig.
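The text does not state how the peak areas are integrated. One plausible approach, given here only as a minimal sketch, is trapezoidal integration over a window around the line after subtracting a linear baseline drawn between the window edges. The function name, the window limits, and the sample spectrum below are all hypothetical and are not taken from the measured data.

```python
def peak_area(wavelengths_nm, intensities, lo_nm, hi_nm):
    """Trapezoidal area above a linear baseline within [lo_nm, hi_nm]."""
    pts = [(w, i) for w, i in zip(wavelengths_nm, intensities) if lo_nm <= w <= hi_nm]
    if len(pts) < 2:
        raise ValueError("integration window must contain at least two points")
    (w0, i0), (w1, i1) = pts[0], pts[-1]

    def baseline(w):
        # Straight line through the first and last points of the window.
        return i0 + (i1 - i0) * (w - w0) / (w1 - w0)

    area = 0.0
    for (wa, ia), (wb, ib) in zip(pts, pts[1:]):
        # Trapezoid between neighbouring samples, baseline removed.
        area += 0.5 * ((ia - baseline(wa)) + (ib - baseline(wb))) * (wb - wa)
    return area


# Synthetic triangular line on a flat background of 100 a.u. (illustrative only).
wl = [585.4, 585.5, 585.6, 585.7, 585.8, 585.9, 586.0]
sig = [100.0, 100.0, 350.0, 600.0, 350.0, 100.0, 100.0]
area = peak_area(wl, sig, 585.5, 585.9)
print(f"peak area: {area:.2f} a.u. nm")
```

Because the area sums over the whole line profile, it is less sensitive to how the peak maximum falls on the detector pixels than a single peak intensity.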
After the spectral peak areas are calculated, the data are normalized to reduce fluctuations caused by laser energy variation, sample-surface irregularity, and external interference. For each sample, the normalization is
PCA yields valuable information about the rock debris. First, it is necessary to determine the number of PCs. The first three PCs explain over 90% of the variance of the original data, whereas the fourth PC explains only 4.83% (see Table
As figure
Figure
To overcome the limitation of PCA, an SVM is employed to classify the nine types of rock debris based on the PCA result. The RBF kernel function is chosen because of its ability to deal with nonlinear data. The five samples of each kind of rock debris are randomly divided into three training samples and two test samples. The first three PCs are used as the input data of the SVM. To obtain a stable and reliable model, k-fold cross validation is used to optimize the model parameters. The two parameters C and σ² vary from −1024 to +1024 in steps of 0.01 and from −100 to +100 in steps of 0.01, respectively. Through this process, the optimized penalty coefficient C = 1024 and the RBF parameter σ² = 2.46 are obtained. The classification result is shown in Fig.
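The workflow described above (three PCs as input, an RBF-kernel SVM, and cross-validated parameter selection) can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the program used in this work: the data are synthetic stand-ins for the normalized peak areas, the log-spaced grid and the choice of 3 folds are assumptions, and scikit-learn parameterizes the RBF kernel by `gamma`, whose relation to σ² depends on the kernel convention adopted.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the normalized peak areas (NOT the measured data):
# 9 classes x 5 samples, 14 features (one per analysis line).
rng = np.random.default_rng(0)
n_classes, n_per_class, n_lines = 9, 5, 14
centers = rng.normal(0.0, 5.0, size=(n_classes, n_lines))
X = np.vstack([c + rng.normal(0.0, 0.3, size=(n_per_class, n_lines)) for c in centers])
y = np.repeat(np.arange(n_classes), n_per_class)

# Random split into three training and two test samples per class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=2 * n_classes, stratify=y, random_state=0
)

# Standardize, keep the first three PCs, then classify with an RBF-kernel SVM.
model = make_pipeline(StandardScaler(), PCA(n_components=3), SVC(kernel="rbf"))

# Assumed log-spaced search grid for the penalty coefficient and kernel width.
param_grid = {
    "svc__C": [2.0 ** k for k in range(-10, 11, 2)],
    "svc__gamma": [2.0 ** k for k in range(-10, 3, 2)],
}
search = GridSearchCV(model, param_grid, cv=3)  # 3-fold cross validation
search.fit(X_train, y_train)

pca_step = search.best_estimator_.named_steps["pca"]
explained = pca_step.explained_variance_ratio_.sum()
accuracy = search.score(X_test, y_test)
print("best parameters:", search.best_params_)
print(f"variance explained by 3 PCs: {explained:.3f}")
print(f"test accuracy: {accuracy:.3f}")
```

Placing the scaler and PCA inside the pipeline means they are refitted on each training fold, so the cross-validation score is not influenced by the held-out samples. If the selected C lies at the upper edge of the grid, that may indicate that the grid is too narrow or that the model is fitting noise.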
It can be seen from Fig.
The whole program takes a few minutes (430.36 s), most of which is spent on cross validation (429.68 s, 99.84% of the total time). However, we should also recognize that the model built from the training samples is over-trained. This can be inferred from the penalty coefficient C of 1024, which is the highest value in the range set by the program. The main reason is that the 14 selected emission lines cannot represent the largest differences among the different kinds of rocks. The same situation occurs in the identification of plastics.[32] SVM is a supervised learning method, so the training model is the key factor determining the program's performance. If the training samples are very noisy, the SVM must adopt a large penalty coefficient to classify them. Hence the noise reduction of the input data achieved by PCA makes the training model more practical. The selection of the analytical emission lines plays a fundamental role in the training.
LIBS is demonstrated to be feasible for the analysis of rock during geological exploration owing to its versatility. Using the peak area instead of the peak intensity is a suitable way to extract useful data from LIBS spectra. For rock debris, the classification accuracy is 100%, while for real rock the accuracy decreases to 91.38%. LIBS yields rich and valuable spectral information. Combining PCA and SVM not only reduces noise and dimensionality, which helps to improve the efficiency of the program, but also solves the problem of linear inseparability. This method can be adapted to classify samples that have multivariate characteristics and little noise. The crucial question is how to select the most appropriate emission lines, i.e., those that represent the largest differences among the different kinds of rocks, in order to build a more practical model. Here we employ 14 analysis lines selected from prior knowledge of the elements contained in the rock material.