Residual occurrence and energy property of proteins in HNP model*
Jiang Zhou-Tinga), Dou Wen-Huia), Shen Yub), Sun Ting-Tingc), Xu Penga)
Department of Applied Physics, China Jiliang University, Hangzhou 310018, China
Department of Applied Physics, Zhejiang University of Science and Technology, Hangzhou 310023, China
College of Information and Electronic Engineering, Zhejiang Gongshang University, Hangzhou 310018, China

Corresponding author. E-mail: z.jiang@cjlu.edu.cn

*Project supported by the National Natural Science Foundation of China (Grant Nos. 21204078, 11304282, and 11202201) and the Natural Science Foundation of Zhejiang Province, China (Grant No. LY12B04003).

Abstract

Four categories of globular proteins, including all- α, all- β, α + β, and α/ β types, are simplified as the off-lattice HNP model involving the secondary-structural information of each protein. The propensity of three types of residues, i.e., H, N, and P to form a secondary structure is investigated based on 146 protein samples. We find that P residues are easy to form α-helices, whereas H residues have a higher tendency to construct β-sheets. The statistical analysis also indicates that the occurrence of P residues is invariably higher than that of H residues, which is independent of protein category. Changes in bond- and non-bonded potential energies of all protein samples under a wide temperature range are presented by coarse-grained molecular dynamics (MD) simulation. The simulation results clearly show a linear relationship between the bond-stretching/bending potential energy and the reduced temperature. The bond-torsional and non-bonded potential energies show distinct transitions with temperature. The bond-torsional energy increases to the maximum and then decreases with the increase of temperature, which is opposite to the change in non-bonded potential energy. The transition temperature of non-bonded potential energy is independent of the protein category, while that of bond-torsional energy is closely related to the protein secondary structure, i.e., α-helix or β-sheet. The quantitatively bonded- and semi-quantitatively non-bonded potential energy of 24 α + β and 23 α/ β protein samples are successfully predicted according to the statistical results obtained from MD simulations.

Keyword: 68.35.bm; HNP model; molecular dynamics simulation; residue hydrophobicity
1. Introduction

One of the most crucial and still unclear problems in biophysics is how a protein folds into its native three-dimensional structure from a linear amino acid sequence. The biological function of a protein strongly depends on its unique structure determined by complicated interactions among residues. Different potentials among residues are given by varying scales of residue hydrophobicity, the presence of partial charges or hydrogen-bonding groups or hydropgen-bonding groups.[1] It is well established, the primary structural information, i.e., the sequence of amino acid is sufficient to determine the tertiary structure according to the global minimum on free-energy landscape of a protein.[2] However, in the presence of additional protein chains or in the crowded cellular environment, the misfolding and aggregation of non-native protein structures are observed.[3, 4] Then, determining the three-dimensional structure of a protein molecule from its amino acid sequence remains difficult because the energy that stabilizes the folded conformation is poorly understood and the number of possible structures is extensive.[5, 6]

Computational simulation is a powerful method to discover the detailed mechanism of structural information at an atomistic level. Nowadays, fruitful researches have been carried out in the field of structural formation of macromolecular systems by computer simulation to overcome experimental difficulties.[712] Several studies have been carried out in the crystallization of polymer chains and the processes of DNA condensation by molecular dynamics (MD) simulations.[13, 14] Despite the rapid progress of computer power, numerous basic questions, such as the nature of the interactions involved in protein folding, the thermodynamic states involved as a protein folds, and the structural properties of each thermodynamics state remain unanswered.[15] At present, the statistical analysis of amino acid sequences is regarded as another important method to understand the structure of proteins, which attempts to provide a deep insight into fundamental biology and a new idea for combating protein misfolding diseases.[1620]

Proteins, with numerous freedom degrees, show considerable structural diversity. Therefore, the simulation on realistic all-atom representations of a protein limits the running time because of computational expense. To resolve this problem, the simplified representations and energy functions of real proteins have been presented.[1, 5, 1924] One of the widely used simplified lattice models is the HP model, which was first proposed by Dill.[21] In the HP model, two beads named H (hydrophobic residues) and P (polar residues) exist. The hydrophobic energies are attractive, i.e., H– H contacts mainly contribute to the energetics in the protein folding process and all the other groups are treated as zero. Recently, protein engineering experiments suggest that instead of two, three kinds of residues can be effectively substituted for the twenty amino acids.[25] According to this pattern, Miyazawa– Jernigan (MJ) modified the interaction matrix.[26] Therefore, in the modified HNP off-lattice model, the two kinds of residues in the standard HP lattice model are replaced by three kinds of residues, namely H (hydrophobic residues), N (neutral residues), and P (polar residues).[19, 20, 24, 25] The nonlocal interaction term is residue type dependent. The interaction between H– H residues has a deep primary minimum that gives rise to defining folded structures. The other residue pair interactions are always repulsive with different strength. The protein is composed of residues with Cα at the center. The backbone of the protein constituted by simplified residues in a continuous space instead of a cubic lattice, have been studied in the literatures.[24, 27, 28] While lacking the chemical detail of all-atom representations, these simplified models are amenable to characterize both kinetics and thermodynamics during the protein folding process.

Globular proteins are categorized into four structural classes, namely all-α , all-β , α + β , and α /β proteins, based on the topology and content of its secondary structure, i.e., α -helices and β -sheets.[29] The present article reports the simulation results carried out by the MD method on a single protein molecular, which is simplified as a modified HNP model involving its secondary structure and performed in the continuous space. A total of 146 protein samples divided into four types are adopted. Besides the occurrence of residues, especially in the secondary regions, the properties of the potential energy of each protein under its equilibrium states are discussed in detail. These results help us to abstract “ essential physical properties” and predict the energy expression according to its sequence of residues.

2. Samples and methods
2.1. Database

A total of 146 globular proteins classified by the SCOP database, [30] including 50 all-α , 49 all-β , 24 α + β , and 23 α /β proteins, were selected as the sample source for our present work. The amino acid sequence and secondary structure of each protein are obtained from the Protein Date Bank (PDB). Three types of amino acids, namely hydrophobic residues (H) consisting of Phe, Met, Leu, Ile, Trp, Cys, Tyr, and Val, neutral residues (N) consisting of Gly, His, Thr, and Ala, and polar residues (P) consisting of Asn, Lys, Asp, Ser, Glu, Gln, Arg, and Pro, are adopted by the MJ matrix and other literatures.[19, 20, 24, 25, 31] Then, the protein is represented by a chain containing 3 types of beads. Each bead in a protein molecule is represented by the position of an α -carbon atom (Cα ). The PDB codes, chain length, and the components (i.e., the number of residues categorized by hydrophobic scale and secondary structure) for all protein samples used in this present article are given in Table  1.

Table 1. PDB codes and structural information of globular proteins classified as all-α , all-β , α + β , and α /β types.
2.2. Simulation model and method

To minimize the running time during simulation processes on numerous protein samples and show the general properties of globular proteins instead of a specific one, the modified off-lattice minimalist HNP model, in which a single protein chain is described as a sequence of three types of beads H, N, and P, is selected in this article. The beads interact via the non-bonded potentials, and bonded potentials including bond-stretching, bond-bending and bond-torsional potentials. The coarse-grained force field used here is the same as that in the literature.[24, 32] The potential energy functions adopted in our work can be thought of as a simple version used in the study of the structure and dynamics of real proteins. The four potential functions of a protein are as follows.

(I) The non-bonded interaction involves beads that are at least four residues apart in the amino acid sequence. For two beads, the non-bonded interaction is given by the truncated 12– 6 Lennard– Jones potential

where ri j is the distance between beads i and j, ε is the parameter that defines the energy scale, ε sets the strength of the nonlocal interaction and Λ sets the nonlocal interaction that is attractive or repulsive. Both of these two parameters depend upon the nature of the two beads involved (see Table  2); σ is the length parameter representing the equilibrium bond length in Eq.  (2).

Table 2. Parameters ε and Λ in the non-bonded interaction term of the potential function depending on the nature of the two residues involved.

(II) The bond-stretching potential energy models the covalent bond connecting adjacent residues as a spring with an equilibrium value of σ rather than as a rigid rod. It takes the form

where ri is the bond length between beads i − 1 and i. The bond-stretching potential energy constant is expressed as kb = 100ε h/σ 2.

(III) The bond-bending potential energy models the harmonic potential to constrain the bond angle at equilibrium angle θ 0 between three successive beads

where the equilibrium bond angle θ 0 = 1.8326  rad and θ i is the bond angle between three beads i − 2, i − 1, and i. The bond-bending potential energy constant is kθ = 13.33ε h· rad− 2.

(IV) The bond-torsional potential energy describes the rotation of the three bonds that connect four successive residues in the amino acid sequence

where ϕ α = 1.0  rad, and ϕ β = π   rad, ϕ i is the dihedral angle of two planes consisting of three successive bonds. The parameters A and B vary with different secondary regions of the protein. For α -helices: A = 6.0ε h, B = 5.6ε h. For β -sheets: A = 5.6ε h, B = 6.0ε h. For other cases: A = B = 0.

The same as the previous work, the simulation results are presented in terms of reduced units.[33] Here, the reduced temperature T* = KBT/ε , the reduced energy E* = E/ε , and the reduced time . A single protein chain exposed to a vacuum is simulated by the MD method. MD simulations are executed from the random configuration of the protein. The Nose– Hoover method is applied to keep the constant temperature of the system.[34, 35]

3. Results and discussion
3.1. Residual occurrence in the secondary regions

Firstly, the ability of the three types of residues to form secondary structures, such as α -helix or β -sheet, is investigated based on 146 protein samples. The statistic results on the relationship between the residue types H, N, and P, and secondary structures such as α -helix, β -sheet or others, are listed in a 3 × 3 matrix (see Table  3(a)). A total of 23992 residues are found in 146 protein samples. The number of H, N, and P residues is 7915, 6026, and 10051, respectively. Then, the frequency of occurrence is easily obtained, i.e., H (33%), N (25%), and P (42%). In the HNP model, 20 amino acids are replaced by three kinds of residues. Hydrophobic residues (H) and hydrophilic residues (P) both consist of 8 types of amino acids, whereas neutral residues (N) have the half number of residue types compared with H or P. Based on the theory of statistics, the frequency of the occurrence of H and P should be equal, which is double that of N residues. It shows that the occurrence of N residues, which consists of only 4 kinds of amino acids, is common among the proteins. In 146 protein samples, 8586 and 5780 residues participate in forming α -helices or β -sheets, respectively. Both numbers are smaller than 9629 that the number of residues excluded from the secondary structure is neither α -helix nor β -sheet. More detailed data are shown in Table  3(a) and Fig.  1. For example, α -helical structures consist of 8586 residues, including 2964 H, 1955 N, and 3667 P residues, respectively. Among 5780 residues, which as the fundamental elements of β -sheets, H type presented a maximum value of 2665. These results conclude that P residues are found more frequently in an α -helix, meanwhile, H ones present the higher tendency to construct β -sheet secondary-structural elements. Previously, several groups have derived a helix propensity scale based on studies of the protein stability by experimental or simulation methods.[36, 37] In contrast, the propensity for β -sheet formation of 20 naturally occurring amino acids is also measured in the literature.[38] According to the helix propensities of amino acids achieved by Pace and Scholtz, the average value of helix propensities of H residues is 0.464  kcal/mol, which is higher than that of P type residues, i.e., 0.443  kcal/mol.[36] Thus, P residues are favorable to the occurrence of α -helix secondary structures.

Table 3. Residual occurrence in the secondary regions based on all protein samples (a), and four types of globular proteins (b), respectively.

Fig.  1. Histogram of Residual occurrence. Twenty residues are simplified as three types of beads, i.e., H, N, and P according to the MJ matrix in Ref.  [26]. The occurrences of H, N, and P are obtained from the statistical results in 146 protein samples listed in Table  1. The sequence and secondary structures of each protein can be downloaded from http://www.pdb.org.

The data were also analyzed according to four structural classes of globular proteins, i.e., all-α , all-β , α + β , and α /β proteins, and the results are shown in Table  3(b). For 50 all-α proteins, α -helices dominate at its secondary-structural level with 4726 residues, which is 55.6 times larger than ones that construct β -sheets. For all-β proteins, 4196 residues participate in the β -sheet folds. Although nearly 4.33 times larger than the number of residues forming an α -helix, it is even less than the number of residues, which construct other secondary structures except α -helices and β -sheets. In four such protein categories, except for all-β proteins, the residues forming α -helices always exceed the ones forming β -sheets. In α + β and α /β proteins, the difference of residue number between constructing α -helices and β -sheets is within 2.2 times, which is smaller than the condition of all-α or all-β proteins. The all-α and all-β proteins consist almost exclusively of helical and β -sheet secondary structures, respectively, whereas the α + β and α /β classes contain varying degrees of both secondary-structural elements. Helics and sheets are spatially segregated in α + β proteins. In the case of α /β proteins, α -helices and β -sheets alternate typically, allowing the β -sheets to organize in a parallel fashion. In these four matrices arranged by four protein categories, the number of P residues is always larger than that of H residues, although both of them consist of eight different amino acids. The number of N type residues is the minimum, as it just includes four different amino acids. However, the proportion of occurrence remains higher than 4/20. These results verify that the residual composition of proteins is independent of protein category. It totally conforms to the discussion in a previous paragraph.

3.2. Potential energy property of four protein categories

The temperature dependence of the potential energy of each protein sample is investigated. Figures  2(a)– 2(d) show the simulation results of PDB code 1FCS, 2MCM, 9RNT, and 3CHY proteins classified as all-α , all-β , α + β , and α /β , respectively. The chain length, residual composition, and secondary-structural information are listed in Table 1. These figures show that: (i) the values of bond-stretching/bending potential energies are always larger than 0 under any simulation temperatures. Both energies are linearly increasing with the temperature in the four protein samples belonging to different protein categories. The values of bond-stretching and bond-bending potential energy approximate each other, especially when T* = 0.3. When T* > 0.3, the bond-stretching energy is slightly higher than the value of bond-bending energy, regardless of protein categories. On the contrary, when T* < 0.3 the bond-stretching energy is slightly lower than the one of bond-bending energy. It indicates that the slope of bond-stretching potential energy as a function of the reduced temperature is larger than that of bond-bending energy. (ii) Bond-torsional potential energy is the lowest one among the four components of potential energy in any simulation case. According to Eq.  ( 4), bond-torsional potential energy is invariably lower than 0. Bond-torsional energies increase initially and then decrease with increasing temperature. The difference of bond-torsional potential energies shown in the four figures is the transition temperature shifting between T* = 0.2 and T* = 0.3. (iii) The trend of non-bonded interaction with the change of temperature is opposite to that of bond-torsional energy, i.e., non-bonded interaction evidently decreases and then increases to a stable value when the reduced temperature remains escalating. In the four figures, the non-bonded potential energies are always lower than 0. According to Eq.  (1) and parameter values in Table  2, we can easily conclude that only the interaction between H– H residues present a prospectively negative value. In the other cases the residue pair interactions are always repulsive with different strengths. Therefore, H– H interactions perform the main functions in achieving the equilibrium state of each protein.

Fig.  2. The total and components of reduced potential energy versus reduced temperature T* of four types of globular proteins. (a) 1FCS classified as all-α , (b) 2MCM classified as all-β , (c) 9RNT classified as α + β , and (d) 3CHY classified as α /β protein, respectively.

From the perspective of protein categories, these figures also show that: (i) four categories of protein indicate the same tendencies of bond- and non-bonded potential energy with variable temperature. (ii) The transition temperature of a non-bonded interaction is the same at T* = 0.2. (iii) The difference among all-α , all-β , α + β , and α /β proteins is the transition temperature of bond-torsional potential energy. For all-α protein 1FCS and α /β protein 3CHY, the transition temperatures of bond-torsional energy are the same at T* = 0.3. In addition, in the case of all-β protein 2MCM and α + β protein 9RNT, the transition temperatures of bond-torsional energy are both at T* = 0.2. (iv) The change in total potential energy as a function of temperature is related to four parts of bond- and non-bonded potentials. Taking 1FCS as an example, figure  2(a) shows that the non-bonded potential energy of 1FCS is the main contributor during energy transition when 0.1 < T* < 0.2, because the total potential energy presents the same tendency as that of the non-bonded energy. In the temperature scope 0.2 < T* < 0.6, the total potential energy increases evidently with increasing temperature. The increasing tendencies are dominated by bond-stretching, bond-bending, and non-bonded interaction. When 0.6 < T* < 1.0, the total potential energy slightly increases with the temperature, and the major parts in the function of total potential energy are attributed to bond-stretching and bond-bending energies. According to the discussion mentioned above, we can conclude that the change in total potential energy with the system temperature corresponds to the competition between bond- and non-bonded interactions.

3.3. Comparison between simulation results and predicted values

To obtain the general rules on the energy transition of proteins, MD simulations on 146 protein samples have been performed under 0.1 < T* < 1.0. According to Eqs.  ( 2) and (3), the bond-stretching and bond-bending potential energies are both linearly proportional to the protein chain length, and independent of secondary structure or protein category. Based on the view of thermodynamics, an amplitude of the variation of bond length/angle is quantified under the equilibrium state, i.e., or is independent of the protein at a certain temperature. Therefore, the total bond-stretching/bending potential energy of the protein chain is the summation of all bonds. In the present study, the bond-stretching/bending energies of each bond at a certain temperature are calculated by averaging simulation results on 50 all-α and 49 all-β proteins (protein No. from 1 to 99 listed in Table  1), and the statistical results are shown in Fig.  3(a). The figure shows a strong linear relationship between the bond-stretching/bending potential energy per bond and the reduced temperature. Based on the theorem of equipartition of energy, the average energy associated with each independent degree of freedom is linearly proportional to the system temperature, which can explain the straight lines in Fig.  3(a). On the other hand, the slope of bond-stretching energy per bond as the function of temperature is 0.53, which is larger than 0.45, which is the slope of bond-bending energy. At T* = 0.3, the values of bond-stretching and bond-bending energy per bond are exactly the same. This result is in good agreement with the previous discussion in Section 3.2, in which the bond-stretching and bond-bending potential energies of 1FCS, 2MCM, 9RNT, and 3CHY proteins are all the same at T* = 0.3 in Fig.  2.

Fig.  3. Statistical and predicted values of bond-stretching and bond-bending potential energy. (a) Bond-stretching (• ) and bond-bending (▲ ) potential energy of each bond versus reduced temperature T* based on simulation results from 99 all-α and all-β proteins, (b)  simulated and predicted values of bond-stretching potential energy of 47 other proteins at T* = 0.5, (c) simulated and predicted values on bond-bending potential energy of 47 other proteins at T* = 0.5.

The bond-stretching and bond-bending potential energies of each bond as a function of temperature are obtained by the statistical results on 99 protein samples, which are independent of residue type and protein category. Therefore, the values of bond-stretching/bending potential energy of any protein can be predicted with a certain chain length. Figures  3(b) and 3(c) show the bond-stretching and bond-bending potential energy of 24 α + β and 23 α /β protein samples (protein No. from 100 to 146 listed in Table  1), respectively, by comparing simulated and predicted results at T* = 0.5. The extremely close values of bond-stretching/bending potential energy obtained by simulation or the prediction method are shown in these two figures. The correlation coefficient R is higher than 0.999, which indicates the nearly perfect agreement is achieved.

Similar to the analysis of bond-stretching/bending potential energy, the statistical result of bond-torsional potential energy per bond as a function of temperature is shown in Fig.  4(a). As the expression of bond-torsional potential energy is not the square of the independent degree of freedom, then the curve of bond-torsional energy per bond shows a nonlinear relationship with the temperature. Equation  (4) also shows that the bond-torsional potential energy involves the addition to the residues in its secondary regions. The parameters in Eq.  (4) are different according to the different secondary structures which the residues participate in. Here, the statistical results in Fig.  4(a) are obtained from 38 all-α and 9 all-β protein samples that their secondary-structural regions exclusively contain α -helices and β -sheets, respectively. In Fig.  4(a), the change of two curves corresponding to the α -helix and β -sheet with variable temperature is similar, i.e., the value of bond-torsional potential energy of each bond gradually increases to the maximum and then decreases with the increasing temperature. The differences between the bond-torsional energy per bond of α -helix and β -sheet include two aspects. First, bond-torsional energy per bond of the α -helix is lower than that of the β -sheet at any temperature, indicating that the α -helical structure is more stable than the β -sheet. Second, in case of the α -helix, the transition temperature corresponding to the maximum value of bond-torsional potential energy occurs at T* = 0.3, which is higher than that of the β -sheet at T* = 0.2. According to the statistical results shown in Fig.  4(a), the value of bond-torsional potential energy of any protein that even contains varying degrees of both secondary-structural elements can be predicted. For example, when T* = 0.5, the statistical values of bond-torsional energy per bond of the α -helix and β -sheet are − 3.939 and − 3.372, respectively. Then, the bond-torsional potential energy of each protein chain at T* = 0.5 is calculated as − 3.939Nα − helix − 0.372Nβ − sheet. The values of Nα − helix and Nβ − sheet are the number of residues as the elements of the α -helix and β -sheet, respectively. Figure  4(b) shows the bond-torsional potential energy of 24 α + β and 23 α /β protein samples, which include both helices and sheets at its secondary structure level at T* = 0.5. The values of bond-torsional potential energy by simulated and predicted methods are closely approximated to each other, and the correlation coefficient is R = 0.995. This result also verified the prediction is reasonable and effective in the study of bond-potential energy of proteins. The correlation coefficient R increases with the increasing temperature, which indicates the relatively small error between simulated and predicated bond-potential energy at high temperature.

Fig.  4. Statistical and predicted values of bond-torsional potential energy. (a) Bond-torsional potential energy of each bond with the residues constructing α -helices (▲ ) or β -sheets (▼ ) versus reduced temperature T* based on simulation results from 99 all-α and all-β proteins, (b) simulated and predicted values of bond-torsional potential energy of other 47 proteins at T* = 0.5.

The non-bonded potential energy involves two body interactions that are at least four residues apart in the protein sequence. In the HNP model, 20 amino acids are classified as three types by its hydrophobic property. According to the discussion mentioned in Section 3.2, the interactions between H– H residues contribute significantly in stabilizing the configuration of a protein. Then, the non-bonded potential energy of each protein sample under variable temperature is simply divided by possible H– H pairwise number NH(NH − 1)/2, and the results are shown in Fig.  5(a). Here, NH is the number of H type residues of the protein. The curve of non-bonded potential energy per H– H pair as a function of temperature shows the same tendency in Fig.  2. The non-bonded potential energy of each H– H pair decreases when T* increases from 0.1 to 0.2. In the medium temperature range 0.2 < T* < 0.6, the values increase rapidly to − 0.01 with the increasing temperature. When T* > 0.6, the non-bonded potential energy of each H– H pair reaches a plateau with almost constant value. Figure  5(a) also shows that the non-bonded potential energy per H– H pair of all-α proteins is lower than that of all-β proteins, especially under the low temperature, which illustrated that all-α proteins present higher structural stability than all-β proteins. The non-bonded potential energy of 24 α + β and 23 α /β protein chains with the number of H components listed in Table  1 are also evaluated at T* = 0.5. The statistical results of non-bonded potential energy per H– H pair of all-α and all-β proteins under T* = 0.5 are − 0.0312 and − 0.0262, respectively. The total non-bonded potential energy of each chain at T* = 0.5 is calculated as

The comparative results are shown in Fig.  5(b). The value of correlation coefficient, R = 0.757, is consequently obtained. Comparing Figs.  3(b), 3(c), and 4(b), the numerical gap of the non-bonded potential energies between simulated and predicted results in Fig.  5(b) is relatively large. The correlation coefficient R as a function of temperature is also shown in Fig.  5(b). A lower value of non-bonded potential energy corresponds to a higher value of R obtained. The reason for the comparatively low value of correlation coefficient R mainly lies in disregarding other residue– residue interactions, such as H– N, H– P, N– P, N– N, and P– P. Besides H– H pairwise interactions, interactions between other residue types also partly contribute to the non-bonded potential energy, especially with the large value of potential energy under high temperature. Therefore, the higher non-bonded potential energy leads to the relatively large error.

Fig.  5. Statistical and predicted values of non-bonded potential energy. (a) Non-bonded potential energy of each H– H pair versus reduced temperature T* based on simulation results from 50 all-α proteins (■ ) and 49 all-β proteins (• ), (b) simulated and predicted values of non-bonded potential energy of 47 other proteins at T* = 0.5 and the correlation coefficient R versus reduced temperature T* in the inserted figure.

4. Conclusions

The biological function of a protein is related to its unique structure determined by complex interactions among protein residues. In the present article, we studied the potential energy of 146 protein samples, including bond-stretching, bond-bending, bond-torsional, and non-bonded potential energy, by a coarse-grained HNP model. MD simulations are carried out under a broad temperature range. The predicted values of the four parts of potential energy are also compared with simulated data obtained directly by the MD method. These results presented in our work are as follows.

(I) The ratio of residues participating in forming helices or sheets as secondary-structural elements to total residues of a protein are strongly relevant to protein categories, i.e., all-α , all-β , α + β , and α /β classes. On the contrary, the frequency of the occurrence of P type residues is higher than that of H type according to the compositional analysis of 146 protein samples. It is completely unrelated to the protein categories classified by the SCOP database.

(II) From the perspective of the propensity that three types of residues H, N, and P to construct secondary structures, we have concluded that P residues easily form α -helices, whereas H ones present a higher tendency to construct β -sheets. This result indicates that α -helices are mostly located on the surface of the protein, while β -sheets aggregate inside of it. Such a kind of configuration makes a globular protein consisting of a hydrophobic core and a hydrophilic exterior. This spatial distribution of secondary structure by amino acid hydrophobicity is useful in discriminating native from non-native protein structures.

(III) The bond-stretching/bending potential energy increases linearly as a function of temperature. We attempted to explain this trend by the theorem of equipartition of energy according to the average value of bond-stretching/bending potential energy associated with each independent degree of freedom r or θ . The slope of the bond-stretching potential energy as the function of temperature is larger than that of the linear curve of bond-bending potential energy. The values of bond-stretching and bending energy are the same at T* = 0.3, regardless of protein samples.

(IV) The bond-torsional potential energy gradually increases to the maximum value and then decreases with increasing temperature. The transition temperature corresponding to the maximum value of bond-torsional energy is related to the secondary structure of the protein. The bond-torsional potential energy per bond of α -helices is lower than that of β -sheets under a wide temperature range, which indicates that the α -helix is more stable than the β -sheet in the secondary-structural level. For the α -helix, the transition temperature is T* = 0.3, which is higher than that of the β -sheet at T* = 0.2. Both numbers of residues constructing α -helices or β -sheets would be taken into account in evaluating the total bond-torsional energy of each protein.

(V) In contrast to the tendency of the bond-torsional potential energy as the function of temperature, the non-bonded potential energy decreases until T* = 0.2. Then the values increase rapidly to − 0.01 in the medium temperature range 0.2 < T* < 0.6 and slowly increase when T* > 0.6. The curve tendency of non-bonded potential energy per H– H pair is the same for all- α and all-β proteins. However, the average value of all-α samples is lower than that of all-β proteins under a certain temperature, which indicates the higher stability of the α -helical structure. The change in the total potential energy of each protein is attributed to the competition between bond- and non-bonded interactions.

The HNP model presented in this article simplifies a single protein chain to the sequence of three types of beads. In considering the transition of potential energy, the HNP model effectively yields the essentials and successfully predicts the bonded potentials. Semi-quantitative prediction of non-bonded potential energy is also achieved. These results provide us with a natural realization of the energy system of proteins.

Acknowledgment

The computations for this research were performed using the Shuguang HPC supercomputer of China Jiliang University.

Reference
1 Bratko D and Blanch H W 2001 J. Chem. Phys. 114 561 DOI:10.1063/1.1330212 [Cited within:2]
2 Anfinsen C B 1973 Science 181 223 DOI:10.1126/science.181.4096.223 [Cited within:1]
3 Harrison P M, Chan H S, Prusiner S B and Cohen F E 1999 J. Mol. Biol. 286 593 DOI:10.1006/jmbi.1998.2497 [Cited within:1]
4 Asherie N, Pand e J, Lomakin A, Ogun O, Hanson S R A, Smith J B and Benedek G B 1998 Biophys. Chem. 75 213 DOI:10.1016/S0301-4622(98)00208-7 [Cited within:1]
5 Jiang T Z, Cui Q H, Shi G H and Ma S D 2003 J. Chem. Phys. 119 4592 DOI:10.1063/1.1592796 [Cited within:2]
6 Ou-Yang Z C, Su Z B and Wang C L 1997 Phys. Rev. Lett. 78 4055 DOI:10.1103/PhysRevLett.78.4055 [Cited within:1]
7 Li T, Jiang Z T, Yan D D and Nies E 2010 Polymer 51 5612 DOI:10.1016/j.polymer.2010.09.019 [Cited within:1]
8 Chen J X, Mao J W, Thakur S, Xu J R and Liu F Y 2011 J. Chem. Phys. 135 094504 DOI:10.1063/1.3629850 [Cited within:1]
9 Chen J X, Zhu J X, Ma Y Q and Cao J S 2014 Europhys. Lett. 106 18003 DOI:10.1209/0295-5075/106/18003 [Cited within:1]
10 Sakae Y and Okamoto Y 2010 Mol. Simulat. 36 302 DOI:10.1080/08927020903373638 [Cited within:1]
11 Bond P J and Sansom M S P 2006 J. Am. Chem. Soc. 128 2697 DOI:10.1021/ja0569104 [Cited within:1]
12 Li C and Strachan A 2014 J. Polym. Sci. B 53 103 DOI:10.1002/polb.23489 [Cited within:1]
13 Chai A H, Ran S Y, Zhang D, Jiang Y W, Yang G C and Zhang L X 2013 Chin. Phys. B 22 098701 DOI:10.1088/1674-1056/22/9/098701 [Cited within:1]
14 Yang J S, Huang D H, Cao Q L, Li Q, Wang L Z and Wang F H 2013 Chin. Phys. B 22 098101 DOI:10.1088/1674-1056/22/9/098101 [Cited within:1]
15 Ferreon A C M and Deniz A A 2011 BBA-Proteins Proteom. 1814 1021 DOI:10.1016/j.bbapap.2011.01.011 [Cited within:1]
16 O’Connell T M, Wang L, Tropsha A and Hermans J 1999 Proteins 36 407 DOI:10.1002/(ISSN)1097-0134 [Cited within:1]
17 Vettorel T, Grosberg A Y and Kremer K 2009 Phys. Biol. 6 025013 DOI:10.1088/1478-3975/6/2/025013 [Cited within:1]
18 Faccioli P, Sega M, Pederiva F and Orland H 2006 Phys. Rev. Lett. 97 108101 DOI:10.1103/PhysRevLett.97.108101 [Cited within:1]
19 Jiang Z T, Zhang L X, Sun T T and Wu T Q 2009 Chin. Phys. B 18 4580 DOI:10.1088/1674-1056/18/10/080 [Cited within:3]
20 Jiang Z T, Zhang L X, Chen J, Xia A and Zhao D 2002 Polymer 43 6037 DOI:10.1016/S0032-3861(02)00501-3 [Cited within:3]
21 Dill K A 1985 Biochemistry 24 1501 DOI:10.1021/bi00327a032 [Cited within:1]
22 Yue K, Fiebig K M, Thomas P D, Chan H S, Shackhnovich E I and Dill K A 1995 Proc. Natl. Acad. Sci. USA 92 325 DOI:10.1073/pnas.92.1.325 [Cited within:1]
23 Wust T and Land au D P 2012 J. Chem. Phys. 137 064903 DOI:10.1063/1.4742969 [Cited within:1]
24 Sangha A K and Keyes T 2010 J. Phys. Chem. B 114 16908 DOI:10.1021/jp107257b [Cited within:5]
25 Regan L and DeGrado W F 1988 Science 241 976 DOI:10.1126/science.3043666 [Cited within:3]
26 Miyazawa S and Jernigan R L 1996 J. Mol. Biol. 256 623 DOI:10.1006/jmbi.1996.0114 [Cited within:1]
27 Gutin A M, Abkevich V I and Shakhnovich E I 1996 Phys. Rev. Lett. 77 5433 DOI:10.1103/PhysRevLett.77.5433 [Cited within:1]
28 Taneri S 2008 Int. J. Mod. Phys. C 19 749 DOI:10.1142/S0129183108012443 [Cited within:1]
29 Levitt M and Chothia C 1976 Nature 261 552 DOI:10.1038/261552a0 [Cited within:1]
30 Murzin A G, Brenner S E, Hubbard T and Chothia C 1995 J. Mol. Biol. 247 536 [Cited within:1]
31 Miyazawa S and Jernigan R L 1985 Macromolecules 18 534 DOI:10.1021/ma00145a039 [Cited within:1]
32 Friedel M, Sheeler D J and Shea J E 2003 J. Chem. Phys. 118 8106 DOI:10.1063/1.1564048 [Cited within:1]
33 Jiang Z T, Xu P and Sun T T 2012 Chinese J. Polym. Sci. 30 45 DOI:10.1007/s10118-012-1101-y [Cited within:1]
34 Nose S 1984 J. Chem. Phys. 81 511 DOI:10.1063/1.447334 [Cited within:1]
35 Hoover W G 1985 Phys. Rev. A 31 1695 DOI:10.1103/PhysRevA.31.1695 [Cited within:1]
36 Pace C N and Scholtz J M 1998 Biophys. J. 75 422 DOI:10.1016/S0006-3495(98)77529-0 [Cited within:2]
37 Best R B, de Sancho D and Mittal J 2012 Biophys. J. 102 1462 DOI:10.1016/j.bpj.2012.02.024 [Cited within:1]
38 Minor D L and Kim P S 1993 Nature 367 660 [Cited within:1]