Comparison with Previous Work
The problem of discovering co-regulated sets of genes and networks using large
high-throughput data sources is an important one, and several papers
addressing this issue have recently been published. In particular, two Nature
Genetics papers by Pilpel et al. [1] and Ihmels et al. [2]
discuss the discovery of transcriptional modules by using gene expression
data to refine a set of genes that were selected based on some other
criteria (DNA motifs or functional categories).
This usually results in subsets of genes that are activated by the cell under
the same conditions. While this is an important first step, one needs to
employ a more sophisticated method, and use other data sources, in order to
refine these large modules and discover the actual regulatory networks that
are employed.
In this paper we focus on identifying gene modules. These are subsets of
genes that are both very similarly expressed and regulated by the same factors.
This allows us to discover the actual regulatory networks that are activated
under different conditions. We take several novel approaches in
this paper that allow us to extend previous work on this problem [1,2].
First, our algorithm exhaustively (though efficiently) searches the entire
combinatorial space of subsets of factors. This allows us to find much
smaller modules than those identified in the past, and thus to identify the
actual pathway used by the cell under the different conditions. Second,
unlike previous methods, which used one criterion to select the initial set of
genes and then used expression to refine that set,
our algorithm truly combines the two data sources
(binding and expression data) by revisiting the binding data after refining
the initial set. This results in modules that better reflect the input data,
and avoids arbitrary decisions. Finally, our method allows us to focus not
only on the genes themselves but also on the relationships between the factors
binding to these genes and the bound set of genes.
This is useful for assigning functional annotations to the factors,
and for determining the modes of combinatorial regulation.
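The two-pass idea described above (select genes by binding, refine by
expression, then revisit the binding data) can be illustrated with a minimal
sketch. This is not the implementation used in the paper: the function name
find_modules, the p-value thresholds, the use of Pearson correlation against a
mean expression profile, the cap on subset size, and the toy data are all
assumptions made for illustration only.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if undefined)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def centroid(genes, expression):
    """Mean expression profile of a set of genes."""
    return [mean(col) for col in zip(*(expression[g] for g in genes))]


def find_modules(binding_p, expression, strict=0.001, relaxed=0.01,
                 min_corr=0.8, max_factors=2, min_genes=2):
    """Hypothetical sketch: enumerate factor subsets and build one module each.

    binding_p:  gene -> {factor: binding p-value}
    expression: gene -> list of expression values
    All thresholds are illustrative assumptions, not values from the paper.
    """
    factors = sorted({f for pvals in binding_p.values() for f in pvals})

    def bound(gene, subset, cutoff):
        return all(binding_p[gene].get(f, 1.0) <= cutoff for f in subset)

    modules = {}
    for k in range(1, max_factors + 1):
        for subset in combinations(factors, k):
            # Pass 1: genes bound by every factor at the strict threshold.
            seed = [g for g in binding_p if bound(g, subset, strict)]
            if len(seed) < min_genes:
                continue
            # Refine with expression: keep genes close to the seed's mean profile.
            profile = centroid(seed, expression)
            core = [g for g in seed
                    if pearson(expression[g], profile) >= min_corr]
            if len(core) < min_genes:
                continue
            # Pass 2: revisit the binding data with a relaxed threshold and
            # admit genes whose expression matches the refined core.
            profile = centroid(core, expression)
            modules[subset] = sorted(
                g for g in binding_p
                if bound(g, subset, relaxed)
                and pearson(expression[g], profile) >= min_corr)
    return modules


# Toy data (invented for illustration).
binding_p = {
    "g1": {"A": 0.0001, "B": 0.0005},
    "g2": {"A": 0.0002, "B": 0.0003},
    "g3": {"A": 0.005, "B": 0.004},    # passes only the relaxed threshold
    "g4": {"A": 0.0001, "B": 0.0001},  # strongly bound, but anti-correlated
    "g5": {"A": 0.5, "B": 0.5},        # co-expressed, but not bound
}
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [1.1, 2.1, 2.9, 4.2],
    "g3": [0.9, 2.0, 3.1, 3.9],
    "g4": [4.0, 3.0, 2.0, 1.0],
    "g5": [1.0, 2.0, 3.0, 4.0],
}
modules = find_modules(binding_p, expression)
```

In this toy example, gene g3 enters the module only in the second pass, g4 is
dropped despite strong binding because its expression differs, and g5 is
excluded despite matching expression because it is not bound. The brute-force
enumeration bounded by max_factors is a simplification; the text above states
only that the actual search is exhaustive and efficient.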
Below we present four figures that demonstrate the relationships between the
modules discovered by our method and some of the modules from the above
papers. Three of these figures refer to the modules from the recent paper
by Ihmels et al. [2].
As can be seen in these figures, our modules refine the modules of the
Ihmels paper, which results in an improved understanding of the networks
used under the relevant conditions. The last figure compares one of the
modules from the Pilpel et al. paper [1] with some of the modules presented in
our paper. Again, our modules refine the modules of the Pilpel paper,
providing a distinction between genes that are regulated differently
and participate in different cell cycle phases.
Ihmels: Amino acid starvation network
Ihmels: Hap-Abf module
Ihmels: Cell cycle module
Pilpel: Cell cycle module
[1] Yitzhak Pilpel, Priya Sudarsanam, George M. Church.
Identifying regulatory networks by combinatorial analysis of promoter elements.
Nature Genetics 29, 153-159 (2001).
[2] Jan Ihmels, Gilgi Friedlander, Sven Bergmann, Ofer Sarig, Yaniv Ziv, Naama Barkai.
Revealing modular organization in the yeast transcriptional network.
Nature Genetics 31(4), 370-377 (2002).