Increase in Similarity for Randomly Generated Data

For each leaf we chose 60 values representing 60 time points (or different experiments). These values were either -1 (representing a decrease in gene expression), 0 (no change) or 1 (an increase). For a given set of randomly generated data, we computed the similarity matrix and then hierarchically clustered the data points. Denote by Tr the resulting tree, by S(Tr) the sum of the similarities between adjacent leaves in the (current) linear ordering of Tr, and by D(Tr) the same sum after performing our optimal ordering algorithm. We denote the relative increase in similarity by I = (D(Tr) - S(Tr))/S(Tr). The next figure shows how I changes as a function of the number of leaves (n). As can be seen, even for a large number of leaves (1500), I is on average quite large, indicating that optimal ordering has a substantial impact on the similarity of neighboring leaves in the linear ordering.
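To make the definitions of S(Tr), D(Tr) and I concrete, the experiment can be sketched in a few lines of Python for a small number of leaves. This is an illustrative sketch only, not the implementation used for the figure. It rests on several assumptions that are not stated in the text: the similarity measure is taken to be the number of time points at which two profiles agree, the clustering is plain average linkage, and the optimal ordering is found by brute force over all 2^(n-1) orderings obtained by flipping internal nodes, which is feasible only for small n. All function names are hypothetical.

```python
import random

def random_profiles(n_leaves, n_points=60, seed=0):
    # One profile per leaf; each value is -1 (decrease), 0 (no change) or 1 (increase).
    rng = random.Random(seed)
    return [[rng.choice((-1, 0, 1)) for _ in range(n_points)]
            for _ in range(n_leaves)]

def similarity(u, v):
    # Assumption: similarity = number of time points where the two profiles agree.
    return sum(1 for a, b in zip(u, v) if a == b)

def similarity_matrix(profiles):
    return [[similarity(u, v) for v in profiles] for u in profiles]

def cluster(sim):
    # Assumption: average-linkage agglomerative clustering.
    # A tree is either a leaf index (int) or a pair (left_subtree, right_subtree).
    clusters = [(i, [i]) for i in range(len(sim))]  # (tree, member leaves)
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                mi, mj = clusters[i][1], clusters[j][1]
                avg = sum(sim[a][b] for a in mi for b in mj) / (len(mi) * len(mj))
                if best is None or avg > best[0]:
                    best = (avg, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]

def orderings(tree):
    # Yield every leaf ordering consistent with the tree (each internal node
    # may be flipped). The first ordering yielded is the tree's current order.
    if isinstance(tree, int):
        yield [tree]
        return
    left, right = tree
    for l in orderings(left):
        for r in orderings(right):
            yield l + r
            yield r + l

def adjacent_similarity(order, sim):
    # Sum of similarities between neighboring leaves in a linear ordering.
    return sum(sim[a][b] for a, b in zip(order, order[1:]))

def relative_increase(n_leaves, seed=0):
    sim = similarity_matrix(random_profiles(n_leaves, seed=seed))
    orders = list(orderings(cluster(sim)))
    s = adjacent_similarity(orders[0], sim)               # S(Tr)
    d = max(adjacent_similarity(o, sim) for o in orders)  # D(Tr), brute force
    return s, d, (d - s) / s                              # I

if __name__ == "__main__":
    for n in (4, 8, 10):
        s, d, inc = relative_increase(n, seed=1)
        print(f"n={n}: S={s}, D={d}, I={inc:.3f}")
```

Because the current ordering is itself one of the orderings consistent with the tree, D(Tr) is never smaller than S(Tr), so I is non-negative under this similarity measure. The brute-force search grows exponentially with n and is meant only to illustrate the quantities being compared.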