TY - JOUR

T1 - Generation of hierarchically correlated multivariate symbolic sequences: With an application to the assessment of bootstrap confidence in phylogenetic analysis.

AU - Tumminello, Michele

AU - Mantegna, Rosario Nunzio

AU - Lillo, Fabrizio

PY - 2008

Y1 - 2008

N2 - We introduce a method to generate multivariate series of symbols from a finite alphabet with a given hierarchical structure of similarities based on the Hamming distance. The target hierarchical structure of similarities is arbitrary, for instance the one obtained by some hierarchical clustering method applied to an empirical matrix of similarities. The method that we present here is based on a generating mechanism that does not make use of mutation rate, which is widely used in phylogenetic analysis. Here we use the proposed simulation method to investigate the relationship between the bootstrap value associated with a node of a phylogeny and the probability of finding that node in the true phylogeny. The results of this analysis are compared with those obtained in the literature according to an evolutionary model with a per-symbol constant mutation rate. We observe that the relationship between the bootstrap value of a node and the probability of the corresponding clade being correct is sensitive to both the length of data series and the length of the branch connecting the node to its closest ancestor in the phylogenetic tree, whereas such a relationship is only slightly affected by the topology of the true phylogeny and by the absolute value of similarity.

KW - Combinatorics; graph theory

KW - Complex systems

KW - Multivariate analysis

UR - http://hdl.handle.net/10447/45660

M3 - Article

VL - 2008

SP - 333

EP - 340

JO - European Physical Journal B

JF - European Physical Journal B

SN - 1434-6028

ER -