Complexes clustering

On this page we present results obtained using the 79 experiments from Michael Eisen's clustering paper. This dataset is a combination of several independent time-series experiments. We used the MIPS complexes database and looked at the 70 top-level complexes reported in that database. Each of these complexes is a collection of genes known to participate in the formation of some complex. We used the 979 genes that appear in Eisen's dataset and are categorized by the MIPS complexes. As indicated in the Eisen paper, when genes are compared across a number of non-identical conditions, noise that is present in a single observation does not contribute significantly to the resulting similarity. Thus, we expect genes that participate in the construction of the same complex to have similar expression patterns in this large dataset.
The following figures compare the results of hierarchical clustering with (right) and without (left) optimal leaf ordering. The smaller figures are enlargements of the same cluster in the two figures. The numbers to the right of the small figures represent the complex to which each gene belongs. Click here for a table that translates the numbers of the seven largest complexes to the complex names in the MIPS database (the rest of the numbers were given according to the order in which the complexes appear in the MIPS database: 10 to the first, 20 to the second, etc.).
As can be seen, with optimal ordering, genes that belong to the same complex (640) are grouped much more tightly together. This helps the user determine not only the cluster but also which genes are at the 'center' of the cluster: with optimal ordering, the genes that appear in the center of a cluster are a better representation of that cluster.
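The comparison described above can be tried on a small scale with the following minimal sketch. It is not the code used to produce the figures on this page. It assumes SciPy's `linkage` with its `optimal_ordering` option as a stand-in implementation of optimal leaf ordering, average linkage, and a distance of one minus the Pearson correlation. The data are synthetic, and the helper names (`adjacent_cost`, `group_span`, `compare_orderings`) are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import pdist, squareform


def adjacent_cost(order, dist_matrix):
    """Sum of distances between genes that are neighbours in the leaf order."""
    order = np.asarray(order)
    return float(dist_matrix[order[:-1], order[1:]].sum())


def group_span(order, members):
    """Number of leaf positions from the first to the last member of a group."""
    positions = np.flatnonzero(np.isin(order, members))
    return int(positions.max() - positions.min() + 1)


def compare_orderings(expr, method="average"):
    """Build the same tree twice and return both leaf orders and their costs."""
    dists = pdist(expr, metric="correlation")  # 1 - Pearson correlation
    square = squareform(dists)
    plain = leaves_list(linkage(dists, method=method))
    optimal = leaves_list(linkage(dists, method=method, optimal_ordering=True))
    return {
        "plain_order": plain.tolist(),
        "optimal_order": optimal.tolist(),
        "plain_cost": adjacent_cost(plain, square),
        "optimal_cost": adjacent_cost(optimal, square),
    }


# Synthetic stand-in data (NOT the Eisen/MIPS data): three hypothetical
# "complexes" of 10 genes each, measured over 79 conditions.
rng = np.random.default_rng(0)
complex_ids = np.repeat([0, 1, 2], 10)
patterns = rng.normal(size=(3, 79))
expr = patterns[complex_ids] + rng.normal(scale=1.0, size=(30, 79))

result = compare_orderings(expr)
print("adjacent-distance cost, plain order:  ", round(result["plain_cost"], 3))
print("adjacent-distance cost, optimal order:", round(result["optimal_cost"], 3))
for cid in range(3):
    members = np.flatnonzero(complex_ids == cid)
    print("complex", cid,
          "span (plain):", group_span(result["plain_order"], members),
          "span (optimal):", group_span(result["optimal_order"], members))
```

Both orderings describe the same tree; only the left/right orientation of each internal node differs. The optimal order can therefore never have a larger adjacent-distance cost than the plain one, whereas the span of any single group is not guaranteed to shrink.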
When a user picks clusters from a hierarchical clustering, at least some of the genes in each cluster are not highly correlated with the cluster itself (since the number of clusters is limited and every gene is assigned to at least one cluster). With optimal ordering, such genes are usually placed on the 'borders' of the clusters (since they are not highly correlated with the genes in the center of the cluster). Thus, the notion of 'center' gets a new meaning: genes that are placed in the center of a cluster by the optimal ordering algorithm are highly correlated with the other genes in the cluster, and thus with the cluster itself. These are the genes the user should focus on.
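One way to make the notion of a 'central' gene concrete is to score each gene by its mean correlation with the other genes in its cluster. The sketch below is only an illustration of that idea, not part of the ordering algorithm itself. The function name `within_cluster_centrality`, the toy profiles, and the choice of Pearson correlation are assumptions.

```python
import numpy as np


def within_cluster_centrality(expr, labels):
    """Mean Pearson correlation of each gene with the other genes in its cluster.

    Genes in a cluster of size one get NaN, since there is nothing to compare to.
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    corr = np.corrcoef(expr)  # gene-by-gene correlation matrix
    scores = np.full(len(expr), np.nan)
    for i in range(len(expr)):
        others = labels == labels[i]
        others[i] = False  # do not count the gene's correlation with itself
        if others.any():
            scores[i] = corr[i, others].mean()
    return scores


# Toy profiles (hypothetical): the last one deviates from the shared trend.
toy_expr = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.0, 2.0, 3.0, 4.0, 6.0],
    [2.0, 3.0, 4.0, 5.0, 6.5],
    [1.0, 3.0, 2.0, 5.0, 4.0],
]
toy_labels = [1, 1, 1, 1]  # all four genes assigned to one cluster

scores = within_cluster_centrality(toy_expr, toy_labels)
for gene, score in enumerate(scores):
    print(f"gene {gene}: mean within-cluster correlation = {score:.3f}")
```

Genes with low scores are the ones one would expect to find near the borders of a cluster in an optimally ordered tree, while genes with high scores are candidates for its center.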


Figures: hierarchical clustering (left); optimal ordering (right).

The two files needed to view the full results in TreeView (including the gene names and group assignments) are: Click here for the .CDT file. Click here for the .GTR file.