Chinese Physics B, 2007, Vol. 16, Issue 2: 392-404. doi: 10.1088/1009-1963/16/2/019

• ATOMIC AND MOLECULAR PHYSICS •

Protein structural codes and nucleation sites for protein folding

Jiang Fan (江凡) and Li Nan (李南)

  1. Beijing National Laboratory for Condensed Matter Physics, Institute of Physics, Chinese Academy of Sciences, Beijing 100080, China
  • Received: 2006-06-05 Revised: 2006-08-31 Online: 2007-02-20 Published: 2007-02-20

Abstract: One of the long-standing controversial arguments in protein folding is Levinthal's paradox. We have recently proposed a new nucleation hypothesis and shown that the nucleation residues are the most conserved sequences in a protein. To avoid the complicated effect of tertiary interactions, we limit our search for structural codes to the nucleation residues. Starting with the hypotheses of secondary structure nucleation and conservation of residues important for folding, we have analysed 762 folds classified as unique by SCOP. Segments of 17 residues around the top 20% conserved amino acids are analysed, resulting in approximately 100 clusters each for the main secondary structure classes of helix, sheet and coil. Helical clusters have the longest correlation range, coils the shortest (four residues). Strong specific sequence-structure correlation is observed for coil but not for helix and sheet, suggesting a mapping relationship between the sequence and the structure for coil. We propose that the central sequences in these clusters form 'structural codes', a useful basis set for identifying nucleation sites, protein fragments stable in isolation, and secondary structural patterns in proteins (particularly turns and loops).
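The segment-selection step described in the abstract (17-residue segments around the top 20% conserved amino acids) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how conservation is scored, how segments near the chain termini are handled, or how the segments are clustered. The per-residue `scores` input, the function names, and the choice to skip windows that run past either end of the chain are all assumptions made for illustration.

```python
from typing import List, Tuple

WINDOW = 17          # segment length stated in the abstract
HALF = WINDOW // 2   # 8 residues on each side of the central residue
TOP_FRACTION = 0.20  # "top 20% conserved" from the abstract


def top_conserved_positions(scores: List[float],
                            fraction: float = TOP_FRACTION) -> List[int]:
    """Return the sorted indices of the highest-scoring `fraction` of positions.

    `scores` is a hypothetical per-residue conservation score
    (higher means more conserved); its origin is not specified here.
    """
    n_keep = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


def extract_segments(sequence: str,
                     scores: List[float]) -> List[Tuple[int, str]]:
    """Return (centre_index, 17-residue segment) pairs for conserved positions.

    Assumption: windows that would extend past either terminus are skipped
    rather than padded.
    """
    if len(sequence) != len(scores):
        raise ValueError("sequence and scores must have the same length")
    segments = []
    for centre in top_conserved_positions(scores):
        start, end = centre - HALF, centre + HALF + 1
        if start < 0 or end > len(sequence):
            continue  # incomplete window near a terminus
        segments.append((centre, sequence[start:end]))
    return segments
```

Each returned segment keeps its central index so that it could later be paired with a secondary structure label (helix, sheet or coil) for the central residue before any clustering is attempted.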

Key words: nucleation, hydrogen bond, secondary structure, structural code, protein folding, sequence-structure relationship

Classification codes:

  • 87.14.E- (Proteins)
  • 87.15.B- (Structure of biomolecules)
  • 87.15.Cc (Folding: thermodynamics, statistical mechanics, models, and pathways)
  • 87.15.K- (Molecular interactions; membrane-protein interactions)
  • 87.15.N- (Properties of solutions of macromolecules)