Abstract Chaos game representation (CGR) is an iterative mapping technique that processes a sequence of units, such as the nucleotides in a DNA sequence or the amino acids in a protein, to determine the coordinates of their positions in a continuous space. This distribution of positions has two features: it is unique, and the source sequence can be recovered from the coordinates, so that the distance between positions may serve as a measure of similarity between the corresponding sequences. A CGR-walk model for DNA sequences is proposed based on the CGR coordinates. The CGR coordinates are converted into a time series, and a long-memory ARFIMA(p, d, q) model, where ARFIMA stands for autoregressive fractionally integrated moving average, is introduced into DNA sequence analysis. This model is applied to simulating real CGR-walk sequence data of ten genomic sequences. Remarkable long-range correlations are uncovered in the data, and the data are reasonably well fitted by the ARFIMA(p, d, q) model.
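The iterative mapping and the recoverability of the source sequence described in the abstract can be illustrated with a minimal sketch. It assumes the commonly used unit-square layout for DNA, with corners A = (0, 0), C = (0, 1), G = (1, 1), T = (1, 0) and a starting point at the centre (1/2, 1/2); the abstract does not state which corner assignment the authors use, and the function names `cgr_coordinates` and `recover_sequence` are hypothetical. Exact rational arithmetic is used so that decoding is not limited by floating-point precision.

```python
from fractions import Fraction

# Assumed corner assignment (a common convention, not taken from the paper).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}
HALF = Fraction(1, 2)


def cgr_coordinates(seq):
    """Return the CGR point reached after each nucleotide of seq."""
    x, y = HALF, HALF  # start at the centre of the unit square
    points = []
    for base in seq:
        cx, cy = CORNERS[base]
        # Move halfway from the current point towards the base's corner.
        x = (x + cx) / 2
        y = (y + cy) / 2
        points.append((x, y))
    return points


def recover_sequence(point, length):
    """Recover the last `length` nucleotides from a single CGR point."""
    x, y = point
    bases = []
    for _ in range(length):
        # The quadrant containing the point identifies the most recent base.
        cx = 1 if x > HALF else 0
        cy = 1 if y > HALF else 0
        base = next(b for b, c in CORNERS.items() if c == (cx, cy))
        bases.append(base)
        # Undo the halving step to obtain the previous point.
        x, y = 2 * x - cx, 2 * y - cy
    return "".join(reversed(bases))


points = cgr_coordinates("ACGT")
decoded = recover_sequence(points[-1], 4)
```

Here `points[-1]` is the single coordinate pair that encodes the whole string "ACGT", and `decoded` reproduces that string, which is the uniqueness and recoverability property the abstract refers to. With floating-point coordinates the same decoding would only be reliable for a few dozen nucleotides.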
Received: 24 April 2008
Revised: 27 August 2008
PACS: 87.15.Cc (Folding: thermodynamics, statistical mechanics, models, and pathways)
Fund: Project supported by the National Natural Science Foundation of China (Grant No. 60575038) and the Natural Science Foundation of Jiangnan University, China (Grant No. 20070365).
Cite this article:
Gao Jie (高洁) and Xu Zhen-Yuan (徐振源), Chaos game representation (CGR)-walk model for DNA sequences, 2009 Chin. Phys. B 18 370