Comparison with Previous Work

The problem of discovering co-regulated sets of genes and regulatory networks from large high-throughput data sources is an important one, and several papers addressing it have recently been published. In particular, two Nature Genetics papers, by Pilpel et al. [1] and Ihmels et al. [2], discuss the discovery of transcriptional modules by using gene expression data to refine a set of genes that were selected based on some other criterion (DNA motifs or functional categories). This usually results in subsets of genes that are activated by the cell under the same conditions. While this is an important first step, a more sophisticated method that uses other data sources is needed to refine these large modules and discover the actual regulatory networks that are employed.

In this paper we focus on identifying gene modules: subsets of genes that are both very similarly expressed and regulated by the same factors. This allows us to discover the actual regulatory networks that are activated under different conditions. We take a number of novel approaches that allow us to extend previous work on this problem [1,2]. First, our algorithm exhaustively (though efficiently) searches the entire combinatorial space of subsets of factors. This allows us to find much smaller modules than those identified in the past, and thus to identify the actual pathway used by the cell under the different conditions. Second, unlike previous methods, which used one criterion to select the initial set of genes and then used expression to refine that set, our algorithm truly combines the two data sources (binding and expression data) by revisiting the binding data after refining the initial set. This results in modules that better reflect the input data, and avoids arbitrary decisions.
Finally, our method allows us to focus not only on the genes themselves but also on the relationships between the factors binding to these genes and the set of genes they bind. This is useful for assigning functional annotations to the factors, and for determining the modes of combinatorial regulation.
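
The following is a minimal sketch of the idea described above, not the implementation used in the paper. It enumerates subsets of factors, selects a seed set of genes from strict binding evidence, summarizes the seed's expression, and then revisits the binding data with a relaxed threshold for genes whose expression is close to that summary. The function name `find_modules`, the data layout, every threshold value, the cap on subset size, the per-condition median, and the Euclidean distance are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist
from statistics import median


def find_modules(binding_p, expression, max_factors=2,
                 strict_p=0.001, relaxed_p=0.01, max_dist=1.0, min_genes=2):
    """Toy module search (hypothetical helper, illustrative thresholds).

    binding_p:  {gene: {factor: binding p-value}}
    expression: {gene: [expression level per condition]}
    Returns {tuple of factors: sorted list of member genes}.
    """
    def bound(gene, subset, threshold):
        # A gene counts as bound if every factor in the subset passes the threshold.
        pvals = binding_p.get(gene, {})
        return all(pvals.get(f, 1.0) <= threshold for f in subset)

    factors = sorted({f for pvals in binding_p.values() for f in pvals})
    modules = {}
    # Plain enumeration of factor subsets, capped at max_factors (an assumption).
    for size in range(1, max_factors + 1):
        for subset in combinations(factors, size):
            # Step 1: seed genes from strict binding evidence.
            seed = [g for g in expression if bound(g, subset, strict_p)]
            if len(seed) < min_genes:
                continue
            # Step 2: summarize seed expression with a per-condition median.
            n_cond = len(expression[seed[0]])
            center = [median(expression[g][i] for g in seed)
                      for i in range(n_cond)]
            # Step 3: revisit binding with a relaxed threshold, keeping only
            # genes whose expression lies close to the seed summary.
            members = [g for g in expression
                       if bound(g, subset, relaxed_p)
                       and dist(expression[g], center) <= max_dist]
            if len(members) >= min_genes:
                modules[subset] = sorted(members)
    return modules


# Made-up toy data: g3 is only weakly bound but co-expressed with the seed;
# g4 is strongly bound but expressed very differently.
binding_p = {
    "g1": {"A": 0.0001, "B": 0.0005},
    "g2": {"A": 0.0002, "B": 0.0008},
    "g3": {"A": 0.005, "B": 0.004},
    "g4": {"A": 0.0001, "B": 0.0003},
    "g5": {"A": 0.0001},
}
expression = {
    "g1": [1.0, 2.0, 3.0],
    "g2": [1.1, 2.1, 2.9],
    "g3": [1.0, 2.0, 3.1],
    "g4": [-3.0, -2.0, -1.0],
    "g5": [1.0, 2.0, 3.0],
}
modules = find_modules(binding_p, expression)
```

In this toy input, the relaxed second pass admits g3 into the module for factors A and B even though its binding evidence misses the strict threshold, while g4 is excluded despite strong binding because its expression differs from the rest of the seed. Plain enumeration grows exponentially with the number of factors, which is why the sketch caps the subset size.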

Below we present four figures that demonstrate the relationships between the modules discovered by our method and some of the modules from the above papers. Three of these figures refer to modules from the recent paper by Ihmels et al. [2]. As can be seen in these figures, our modules refine the modules of the Ihmels paper, which results in an improved understanding of the networks used under the relevant conditions. The last figure compares one of the modules from the paper by Pilpel et al. [1] with some of the modules presented in our paper. Again, our modules refine the modules of the Pilpel paper, distinguishing between genes that are regulated differently and participate in different cell cycle phases.

  • Ihmels: Amino acid starvation network
  • Ihmels: Hap-Abf module
  • Ihmels: Cell cycle module
  • Pilpel: Cell cycle module

[1] Yitzhak Pilpel, Priya Sudarsanam & George M. Church.
    Identifying regulatory networks by combinatorial analysis of promoter elements. Nature Genetics 29, 153-159 (2001).

[2] Jan Ihmels, Gilgi Friedlander, Sven Bergmann, Ofer Sarig, Yaniv Ziv & Naama Barkai.
    Revealing modular organization in the yeast transcriptional network. Nature Genetics 31(4), 370-377 (2002).