<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1756-0381-2-9</ui>
   <ji>1756-0381</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>A biclustering algorithm based on a Bicluster Enumeration Tree: application to DNA microarray data</p>
         </title>
         <aug>
            <au ca="yes" id="A1">
               <snm>Ayadi</snm>
               <fnm>Wassim</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>wassim.ayadi@gmail.com</email>
            </au>
            <au id="A2">
               <snm>Elloumi</snm>
               <fnm>Mourad</fnm>
               <insr iid="I1"/>
               <email>mourad12345678@yahoo.com</email>
            </au>
            <au id="A3">
               <snm>Hao</snm>
               <fnm>Jin-Kao</fnm>
               <insr iid="I2"/>
               <email>hao@info.univ-angers.fr</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>UTIC, Higher School of Sciences and Technologies of Tunis, 1008 Tunis, Tunisia</p>
            </ins>
            <ins id="I2">
               <p>LERIA, Universit&#233; d'Angers, 2 Boulevard Lavoisier, 49045 Angers, France</p>
            </ins>
         </insg>
         <source>BioData Mining</source>
         <issn>1756-0381</issn>
         <pubdate>2009</pubdate>
         <volume>2</volume>
         <issue>1</issue>
         <fpage>9</fpage>
         <url>http://www.biodatamining.org/content/2/1/9</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="doi">10.1186/1756-0381-2-9</pubid>
               <pubid idtype="pmpid">20015398</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>20</day>
               <month>7</month>
               <year>2009</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>16</day>
               <month>12</month>
               <year>2009</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>16</day>
               <month>12</month>
               <year>2009</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2009</year>
         <collab>Ayadi et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
                <p>In a number of domains, such as DNA microarray data analysis, the rows (genes) and columns (conditions) of a data matrix must be clustered simultaneously to identify groups of rows that are coherent with groups of columns. This kind of clustering is called <it>biclustering</it>. Biclustering algorithms are extensively used in DNA microarray data analysis, and more effective biclustering algorithms are highly desirable.</p>
            </sec>
            <sec>
               <st>
                  <p>Methods</p>
               </st>
               <p>We introduce <it>BiMine</it>, a new enumeration algorithm for biclustering of DNA microarray data. The proposed algorithm is based on three original features. First, <it>BiMine </it>relies on a new evaluation function called <it>Average Spearman's rho </it>(ASR). Second, <it>BiMine </it>uses a new tree structure, called <it>Bicluster Enumeration Tree </it>(BET), to represent the different biclusters discovered during the enumeration process. Third, to avoid the combinatorial explosion of the search tree, <it>BiMine </it>introduces a parametric rule that allows the enumeration process to cut tree branches that cannot lead to good biclusters.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
                <p>The performance of the proposed algorithm is assessed on both synthetic and real DNA microarray data. The experimental results show that <it>BiMine </it>competes well with several other biclustering methods. Moreover, we assess the biological significance of the resulting biclusters with a gene annotation web tool, showing that the proposed method produces biologically relevant biclusters. The software is available upon request from the authors to academic users.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>DNA microarray technology is a revolutionary method that enables the measurement of the expression levels of thousands of genes in a single experiment under diverse experimental conditions. This technology has found numerous applications in research and applied areas such as biology, drug discovery, toxicological studies and disease diagnosis.</p>
         <p>DNA microarray data is typically represented by a matrix in which each cell holds the expression level of a gene under a particular experimental condition. One important analysis task for microarray data is the simultaneous identification of groups of genes that show similar expression patterns across specific groups of experimental conditions (samples) <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. This task can be addressed by biclustering, whose aim is to discover coherent biclusters: a bicluster is a subset of genes and conditions of the original expression matrix in which the selected genes exhibit a coherent behavior under all the experimental conditions contained in the bicluster.</p>
         <p>More generally, biclustering also has applications in other domains such as text mining <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>, target marketing <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>, market search <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, database search <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp> and the analysis of foreign exchange data <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>.</p>
         <p>Formally, let <it>I </it>= {1, 2, ..., <it>n</it>} denote the index set of <it>n </it>genes and <it>J </it>= {1, 2, ..., <it>m</it>} the index set of <it>m </it>conditions. A <it>data matrix M</it>(<it>I</it>, <it>J</it>) associated with <it>I </it>and <it>J </it>is an <it>n </it>&#215; <it>m </it>matrix whose <it>i</it><sup>th </sup>row, <it>i </it>&#8712; <it>I</it>, represents the <it>i</it><sup>th </sup>gene (or attribute), whose <it>j</it><sup>th </sup>column, <it>j </it>&#8712; <it>J</it>, represents the <it>j</it><sup>th </sup>condition (or individual), and whose entry <it>m</it><sub><it>ij </it></sub>at the intersection of the <it>i</it><sup>th </sup>row and the <it>j</it><sup>th </sup>column is the value of the <it>j</it><sup>th </sup>condition for the <it>i</it><sup>th </sup>gene. A <it>bicluster </it>in a data matrix <it>M</it>(<it>I</it>, <it>J</it>) is a pair (<it>I</it>', <it>J</it>') such that <it>I</it>' &#8838; <it>I </it>and <it>J</it>' &#8838; <it>J</it>. The biclustering problem can be formulated as follows: given a data matrix <it>M</it>, construct a group of biclusters <it>B</it><sub><it>opt </it></sub>associated with <it>M </it>such that:</p>
         <p>
            <display-formula id="M1">
               <graphic file="1756-0381-2-9-i1.gif"/>
            </display-formula>
         </p>
         <p>where <it>f </it>is an <it>objective function </it>measuring the <it>quality</it>, i.e., degree of coherence, of a group of biclusters and <it>BC</it>(<it>M</it>) is the set of all the possible groups of biclusters associated with <it>M</it>.</p>
         <p>Clearly, biclustering is a highly combinatorial problem with a search space of order <it>O</it>(<it>2</it><sup>|<it>I</it>|+|<it>J</it>|</sup>). In the general case, biclustering is known to be NP-hard <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Consequently, most algorithms for discovering biclusters rely on heuristics that explore the combinatorial search space only partially. Existing biclustering algorithms can roughly be classified into two large families: systematic search methods and stochastic search methods (also called metaheuristic methods). Representative systematic search methods include greedy algorithms <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>, divide-and-conquer algorithms <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B15">15</abbr></abbrgrp> and enumeration algorithms <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>. Among the metaheuristic methods, we can mention neighbourhood-based algorithms such as simulated annealing <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> and GRASP <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, as well as evolutionary and hybrid algorithms <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>. A recent review of biclustering algorithms for biological data analysis is provided in <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>.</p>
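         <p>To make the size of this search space concrete, the following Python sketch (our own illustration, not part of <it>BiMine</it>) represents a bicluster as a pair of index sets (<it>I</it>', <it>J</it>'), extracts the corresponding submatrix, and enumerates by brute force every bicluster whose row and column sets are non-empty. The function names <it>submatrix</it>, <it>non_empty_subsets</it> and <it>all_biclusters</it> are hypothetical. For an <it>n </it>&#215; <it>m </it>matrix this yields (2<sup><it>n</it></sup> - 1)(2<sup><it>m</it></sup> - 1) candidates, which is why practical algorithms must prune or sample the space.</p>
         <p><![CDATA[
```python
from itertools import combinations
from typing import Iterator, List, Sequence, Tuple

Matrix = List[List[float]]
Bicluster = Tuple[Tuple[int, ...], Tuple[int, ...]]


def submatrix(matrix: Matrix, rows: Sequence[int], cols: Sequence[int]) -> Matrix:
    """Return the values of M restricted to the rows I' and columns J'."""
    return [[matrix[i][j] for j in cols] for i in rows]


def non_empty_subsets(indices: Sequence[int]) -> Iterator[Tuple[int, ...]]:
    """Yield every non-empty subset of the given indices."""
    for size in range(1, len(indices) + 1):
        yield from combinations(indices, size)


def all_biclusters(n_rows: int, n_cols: int) -> Iterator[Bicluster]:
    """Brute-force enumeration of every (I', J') with I' and J' non-empty."""
    for rows in non_empty_subsets(range(n_rows)):
        for cols in non_empty_subsets(range(n_cols)):
            yield rows, cols


if __name__ == "__main__":
    m = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [5.0, 1.0, 0.0]]
    print(sum(1 for _ in all_biclusters(len(m), len(m[0]))))  # (2**3 - 1) * (2**3 - 1) = 49
    print(submatrix(m, (0, 1), (0, 2)))  # [[1.0, 3.0], [2.0, 6.0]]
```
]]></p>
         <p>Even a 20 &#215; 20 matrix already gives roughly 10<sup>12</sup> candidates, so exhaustive enumeration is only useful for checking small cases.</p>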
         <p>Since the biclustering problem is NP-hard and no existing algorithm is completely satisfactory, it is worthwhile to seek more effective algorithms that find better solutions. In this paper, we introduce a new enumeration algorithm for biclustering of DNA microarray data, called <it>BiMine</it>. Our algorithm is based on three original features. First, <it>BiMine </it>relies on a new evaluation function called <it>Average Spearman's rho </it>(ASR), which is used to guide the exploration of the search space effectively. Second, <it>BiMine </it>uses a new tree structure, called <it>Bicluster Enumeration Tree </it>(BET), to conveniently represent the different biclusters discovered during the enumeration process. Third, to avoid the combinatorial explosion of the search tree, <it>BiMine </it>introduces a parametric rule that allows the enumeration process to cut tree branches that cannot lead to good biclusters.</p>
         <p>To assess the performance of the proposed <it>BiMine </it>algorithm, we report computational results obtained on both synthetic and real datasets and compare them with those of four state-of-the-art biclustering algorithms. Moreover, to evaluate the biological relevance of the resulting biclusters, we carry out a practical validation with respect to a specific Gene Ontology (GO) annotation with the help of a popular web tool.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
                <p>A New Evaluation Function for Biclustering</p>
            </st>
             <p>Like any search algorithm, <it>BiMine </it>needs an evaluation function to assess the quality of a candidate bicluster. One possibility is the so-called <it>Mean Squared Residue </it>(MSR) function <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Indeed, since its introduction, MSR has been widely used by biclustering algorithms; see for instance <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B13">13</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr></abbrgrp>. However, MSR is known to fail to correctly assess the quality of certain types of biclusters <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp>. In a recent work, Teng and Chan <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> proposed another bicluster evaluation function called <it>Average Correlation Value </it>(ACV). However, the performance of ACV is known to be sensitive to errors <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>.</p>
             <p>In this paper, we propose a new evaluation function called <it>Average Spearman's rho </it>(ASR), based on <it>Spearman's rank correlation</it>. Let <inline-formula><graphic file="1756-0381-2-9-i2.gif"/></inline-formula> and <inline-formula><graphic file="1756-0381-2-9-i3.gif"/></inline-formula> be two vectors. The <it>Spearman's rank correlation </it><abbrgrp><abbr bid="B30">30</abbr></abbrgrp> between <it>X</it><sub><it>i </it></sub>and <it>X</it><sub><it>j</it></sub>, denoted by <it>&#961;</it><sub><it>ij</it></sub>, measures the dependency between the two vectors and is defined as follows:</p>
            <p>
               <display-formula id="M2">
                  <graphic file="1756-0381-2-9-i4.gif"/>
               </display-formula>
            </p>
            <p>where <inline-formula><graphic file="1756-0381-2-9-i5.gif"/></inline-formula> (resp. <inline-formula><graphic file="1756-0381-2-9-i6.gif"/></inline-formula>) is the rank of <inline-formula><graphic file="1756-0381-2-9-i7.gif"/></inline-formula> (resp. <inline-formula><graphic file="1756-0381-2-9-i8.gif"/></inline-formula>).</p>
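             <p>Since Equation 2 is rendered as an image, the Python sketch below assumes it is the standard rank-difference form of Spearman's coefficient, <it>&#961;</it><sub><it>ij</it></sub> = 1 - 6 &#931;<sub><it>k</it></sub> (<it>r</it>(<it>x</it><sub><it>ik</it></sub>) - <it>r</it>(<it>x</it><sub><it>jk</it></sub>))<sup>2</sup> / (<it>m</it>(<it>m</it><sup>2</sup> - 1)), where <it>m</it> is the vector length. Tied values receive the mean of the positions they occupy, a common convention that we assume here. The names <it>average_ranks</it> and <it>spearman_rho</it> are illustrative and are not taken from the <it>BiMine </it>software. The example uses rows I<sub>1</sub> and I<sub>2</sub> of the data matrix <it>M'</it> in Table <tblr tid="T2">2</tblr>.</p>
             <p><![CDATA[
```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to m; tied values share the mean of their positions."""
    ranks = []
    for x in values:
        smaller = sum(1 for v in values if v < x)
        ties = sum(1 for v in values if v == x)  # includes x itself
        ranks.append(1 + smaller + (ties - 1) / 2)
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation via the rank-difference formula."""
    m = len(x)
    if m != len(y) or m < 2:
        raise ValueError("vectors must have the same length m >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))  # sum of squared rank differences
    return 1 - 6 * d2 / (m * (m * m - 1))


if __name__ == "__main__":
    i1 = [10, 20, 5, 15, 40, 18]  # row I1 of M'
    i2 = [20, 40, 10, 30, 24, 20]  # row I2 of M' (two tied values of 20)
    print(round(spearman_rho(i1, i2), 4))  # 0.6714
    print(spearman_rho([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0
```
]]></p>
             <p>Because only ranks enter the formula, any strictly increasing transformation of a vector leaves <it>&#961;</it><sub><it>ij</it></sub> unchanged, which is what lets a rank-based measure treat additive and multiplicative patterns alike.</p>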
             <p>Let (<it>I'</it>, <it>J'</it>) be a bicluster in a data matrix <it>M</it>(<it>I</it>, <it>J</it>). The ASR evaluation function is then defined by:</p>
            <p>
               <display-formula id="M3">
                  <graphic file="1756-0381-2-9-i9.gif"/>
               </display-formula>
            </p>
            <p>where:</p>
            <p><it>&#961;</it><sub><it>i</it>, <it>j </it></sub>(<it>i </it>&#8800; <it>j</it>) is the Spearman's rank correlation associated with the row indices <it>i </it>and <it>j </it>in the bicluster (<it>I'</it>, <it>J'</it>). <it>&#961;</it><sub><it>k</it>, <it>l </it></sub>(<it>k </it>&#8800; <it>l</it>) is the Spearman's rank correlation associated with the column indices <it>k </it>and <it>l </it>in the bicluster (<it>I'</it>, <it>J'</it>).</p>
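             <p>Equation 3 also appears only as an image, so the following Python sketch encodes our reading of it, which should be checked against the equation: ASR is taken as the larger of two averages, the mean Spearman correlation over all pairs of rows of the bicluster (computed across the columns in <it>J'</it>) and the mean over all pairs of columns (computed across the rows in <it>I'</it>). Each average divides the sum over pairs by the number of pairs, |<it>I'</it>|(|<it>I'</it>| - 1)/2 or |<it>J'</it>|(|<it>J'</it>| - 1)/2, the pair counts used in the proof of Proposition 1. The helper names are hypothetical, and the bicluster must contain at least two rows and two columns.</p>
             <p><![CDATA[
```python
from itertools import combinations
from typing import List, Sequence

Matrix = List[List[float]]


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to m; tied values share the mean of their positions."""
    return [1 + sum(v < x for v in values) + (sum(v == x for v in values) - 1) / 2
            for x in values]


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation via the rank-difference formula (m >= 2)."""
    m = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d2 / (m * (m * m - 1))


def mean_pairwise_rho(vectors: List[List[float]]) -> float:
    """Average rho over the n(n - 1)/2 unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(spearman_rho(a, b) for a, b in pairs) / len(pairs)


def asr(matrix: Matrix, rows: Sequence[int], cols: Sequence[int]) -> float:
    """Assumed reading of Equation 3: max of row-pair and column-pair averages."""
    if len(rows) < 2 or len(cols) < 2:
        raise ValueError("a bicluster needs at least two rows and two columns")
    sub = [[matrix[i][j] for j in cols] for i in rows]
    row_avg = mean_pairwise_rho(sub)                           # pairs of genes over J'
    col_avg = mean_pairwise_rho([list(c) for c in zip(*sub)])  # pairs of conditions over I'
    return max(row_avg, col_avg)


if __name__ == "__main__":
    additive = [[1, 2, 4], [3, 4, 6], [6, 7, 9]]
    print(asr(additive, [0, 1, 2], [0, 1, 2]))  # 1.0
    mixed = [[1, 2, 3], [3, 2, 1]]
    print(asr(mixed, [0, 1], [0, 1, 2]))  # 0.0: rows give -1, columns average to 0
```
]]></p>
             <p>Taking the maximum means that a bicluster is rated highly if either its genes or its conditions are consistently ranked, which is how constant-row and constant-column patterns can still receive a high score.</p>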
            <p><b>Proposition 1: </b>Let (<it>I</it>', <it>J</it>') be a bicluster in a data matrix <it>M</it>(<it>I</it>, <it>J</it>). We have:</p>
            <p>
               <display-formula>
                  <graphic file="1756-0381-2-9-i10.gif"/>
               </display-formula>
            </p>
            <p><b>Proof: </b>Let us first show that:</p>
            <p>
               <display-formula>
                  <graphic file="1756-0381-2-9-i11.gif"/>
               </display-formula>
            </p>
             <p>Indeed, there are <inline-formula><graphic file="1756-0381-2-9-i12.gif"/></inline-formula> Spearman's rank correlations to compute. Since every Spearman's rank correlation lies in [-1, 1] <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>, we have:</p>
            <p>
               <display-formula>
                  <graphic file="1756-0381-2-9-i13.gif"/>
               </display-formula>
            </p>
            <p>i.e.</p>
            <p>
               <display-formula>
                  <graphic file="1756-0381-2-9-i14.gif"/>
               </display-formula>
            </p>
            <p>It is easy to show in the same way that:</p>
            <p>
               <display-formula>
                  <graphic file="1756-0381-2-9-i15.gif"/>
               </display-formula>
            </p>
            <p>Hence:</p>
            <p>
               <display-formula>
                  <graphic file="1756-0381-2-9-i16.gif"/>
               </display-formula>
            </p>
            <p>i.e.:</p>
            <p>
               <display-formula>
                  <graphic file="1756-0381-2-9-i10.gif"/>
               </display-formula>
            </p>
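             <p>Because the intermediate steps of this proof are rendered as images, the row part of the argument is restated below in LaTeX. This is a sketch under our reading of Equation 3, in which the row term sums <it>&#961;</it><sub><it>ij</it></sub> over the |<it>I'</it>|(|<it>I'</it>| - 1)/2 row pairs and multiplies that sum by 2/(|<it>I'</it>|(|<it>I'</it>| - 1)); the column term is handled identically with <it>J'</it>, <it>k</it> and <it>l</it>.</p>
             <p><![CDATA[
```latex
\begin{aligned}
-1 \le \rho_{ij} \le 1 \quad \forall\, i \ne j
&\;\Longrightarrow\;
-\frac{|I'|(|I'|-1)}{2}
  \le \sum_{i \in I'} \sum_{\substack{j \in I' \\ j > i}} \rho_{ij}
  \le \frac{|I'|(|I'|-1)}{2} \\
&\;\Longrightarrow\;
-1 \le \frac{2}{|I'|(|I'|-1)} \sum_{i \in I'} \sum_{\substack{j \in I' \\ j > i}} \rho_{ij} \le 1 .
\end{aligned}
```
]]></p>
             <p>If, as we assume, ASR takes the maximum of the row and column terms, it lies in [-1, 1] because each of the two terms does.</p>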
             <p>With Spearman's rank correlation, a value close to 1 indicates a strong positive (monotonic) correlation between two vectors, whereas a value close to -1 indicates a strong negative correlation <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. As shown above, ASR also takes values in [-1, 1]. A high ASR value, close to 1, indicates that the genes/conditions of the bicluster are strongly (positively) correlated, whereas lower values, down to -1, indicate that they are weakly or negatively correlated, i.e., less coherent.</p>
             <p>In the next subsection, we assess the quality of the proposed ASR evaluation function in comparison with two popular functions, MSR and ACV.</p>
         </sec>
         <sec>
            <st>
               <p>Studies of the ASR Evaluation Function</p>
            </st>
             <p>We compare the ASR evaluation function with the <it>Mean Squared Residue </it>(MSR) <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. As mentioned previously, MSR is probably the most popular evaluation function and is widely used in the literature. As a second reference function, we use the <it>Average Correlation Value </it>(ACV), which was proposed recently in <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>.</p>
             <p>For the comparison, we apply the three evaluation functions, i.e., ASR, MSR and ACV, directly (without any biclustering algorithm) to seven matrices (biclusters) denoted by <it>M</it><sub><it>1 </it></sub>to <it>M</it><sub><it>7 </it></sub>(Figure <figr fid="F1">1</figr>). These matrices are employed in <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B25">25</abbr></abbrgrp> and represent the typical types of biclusters. They are defined as follows: <it>M</it><sub><it>1 </it></sub>is a constant bicluster, <it>M</it><sub><it>2 </it></sub>has constant rows, <it>M</it><sub><it>3 </it></sub>has constant columns, <it>M</it><sub><it>4 </it></sub>has coherent values (additive model), <it>M</it><sub><it>5 </it></sub>has coherent values (multiplicative model), <it>M</it><sub><it>6 </it></sub>has coherent values (multiplicative model, where the first row of <it>M</it><sub><it>5 </it></sub>is multiplied by 10) and <it>M</it><sub><it>7 </it></sub>represents a coherent evolution.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Different typical Biclusters</p>
               </caption>
               <text>
                  <p><b>Different typical Biclusters</b>. Data matrix <it>M</it><sub>1 </sub>represents a constant bicluster, <it>M</it><sub>2 </sub>a bicluster with constant rows, <it>M</it><sub>3 </sub>a bicluster with constant columns, <it>M</it><sub>4 </sub>coherent values (additive model), <it>M</it><sub>5 </sub>coherent values (multiplicative model), <it>M</it><sub>6 </sub>coherent values (multiplicative model, where the first row of <it>M</it><sub>5 </sub>is multiplied by 10) and <it>M</it><sub>7 </sub>a coherent evolution.</p>
               </text>
               <graphic file="1756-0381-2-9-1"/>
            </fig>
             <p>Table <tblr tid="T1">1</tblr> reports the ASR values alongside the MSR and ACV values; the MSR and ACV values are taken from <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>ASR versus MSR and ACV.</p>
               </caption>
               <tblbdy cols="8">
                  <r>
                     <c ca="right">
                        <p>
                           <b>Biclusters</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>
                              <it>M</it>
                              <sub>
                                 <b>1</b>
                              </sub>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>
                              <it>M</it>
                              <sub>
                                 <b>2</b>
                              </sub>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>
                              <it>M</it>
                              <sub>
                                 <b>3</b>
                              </sub>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>
                              <it>M</it>
                              <sub>
                                 <b>4</b>
                              </sub>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>
                              <it>M</it>
                              <sub>
                                 <b>5</b>
                              </sub>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>
                              <it>M</it>
                              <sub>
                                 <b>6</b>
                              </sub>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>
                              <it>M</it>
                              <sub>
                                 <b>7</b>
                              </sub>
                           </b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>Evaluation Functions</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>MSR</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0.62</p>
                     </c>
                     <c ca="center">
                        <p>2.425</p>
                     </c>
                     <c ca="center">
                        <p>131.87</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>ACV</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>0.84</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>ASR</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>0.99</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>Concerning MSR, a low (resp. high) value, <it>close </it>to 0 (resp. higher than a fixed threshold), indicates that the genes/conditions of the bicluster are strongly (resp. weakly) correlated.</p>
            <p>Concerning ACV, a high (resp. low) value, <it>close </it>to 1 (resp. <it>close </it>to 0), indicates that the genes/conditions of the bicluster are strongly (resp. weakly) correlated.</p>
             <p>According to Table <tblr tid="T1">1</tblr>, ASR, ACV and MSR all assess the quality of biclusters <it>M</it><sub>1</sub>, <it>M</it><sub>2</sub>, <it>M</it><sub>3 </sub>and <it>M</it><sub>4 </sub>perfectly. However, MSR no longer reaches its ideal value of 0 on <it>M</it><sub>5</sub>, and it is clearly deficient on <it>M</it><sub>6 </sub>and <it>M</it><sub>7</sub>, confirming the claim that MSR may have trouble with certain types of biclusters <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp>. On the other hand, ASR and ACV assess the quality of biclusters <it>M</it><sub>5 </sub>and <it>M</it><sub>6 </sub>perfectly, and ASR is slightly better than ACV on <it>M</it><sub>7 </sub>(0.99 versus 0.84).</p>
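             <p>The contrast between MSR and ASR on additive and multiplicative patterns can be reproduced on two small hypothetical matrices (they are not the matrices of Figure <figr fid="F1">1</figr>, whose values are not restated here). The Python sketch below uses the standard mean squared residue from <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, the average over the bicluster of (<it>m</it><sub><it>ij</it></sub> - <it>m</it><sub><it>iJ'</it></sub> - <it>m</it><sub><it>I'j</it></sub> + <it>m</it><sub><it>I'J'</it></sub>)<sup>2</sup>, where the three mean terms are the row, column and overall means, together with the same assumed reading of ASR as above. The function names are hypothetical.</p>
             <p><![CDATA[
```python
from itertools import combinations
from statistics import mean
from typing import List, Sequence

Matrix = List[List[float]]


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to m; tied values share the mean of their positions."""
    return [1 + sum(v < x for v in values) + (sum(v == x for v in values) - 1) / 2
            for x in values]


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation via the rank-difference formula."""
    m = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d2 / (m * (m * m - 1))


def asr(b: Matrix) -> float:
    """Assumed ASR of a whole bicluster: max of row-pair and column-pair means."""
    def mean_rho(vectors):
        return mean(spearman_rho(p, q) for p, q in combinations(vectors, 2))
    return max(mean_rho(b), mean_rho([list(c) for c in zip(*b)]))


def msr(b: Matrix) -> float:
    """Mean squared residue: average squared deviation from the additive fit."""
    row_means = [mean(r) for r in b]
    col_means = [mean(c) for c in zip(*b)]
    overall = mean(v for r in b for v in r)
    return mean((v - row_means[i] - col_means[j] + overall) ** 2
                for i, r in enumerate(b) for j, v in enumerate(r))


if __name__ == "__main__":
    additive = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]         # row i = row 0 + 3i
    multiplicative = [[1, 2, 4], [2, 4, 8], [3, 6, 12]]  # row i = row 0 * (i + 1)
    for name, b in [("additive", additive), ("multiplicative", multiplicative)]:
        print(name, "MSR =", round(msr(b), 3), "ASR =", asr(b))
```
]]></p>
             <p>MSR is 0 only when the bicluster fits an additive model exactly, so the multiplicative matrix receives a positive residue even though its rows are perfectly coherent, while the rank-based ASR equals 1 for both.</p>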
         </sec>
         <sec>
            <st>
               <p>BiMine Algorithm</p>
            </st>
             <p>We now present our biclustering algorithm, <it>BiMine</it>, which uses ASR as its evaluation function and a new structure, called <it>Bicluster Enumeration Tree </it>(BET), to represent the different biclusters associated with a data matrix. We first describe the main procedure for building biclusters and then give an illustrative example to ease the understanding of the algorithm.</p>
             <p>Given a data matrix <it>M</it>, our algorithm operates in three steps. During the first step, we preprocess the data matrix <it>M</it>. During the second step, we construct a BET associated with <it>M</it>. Finally, during the last step, we identify the <it>best </it>biclusters.</p>
            <sec>
               <st>
                  <p>Preprocessing</p>
               </st>
               <p>In the clustering area, preprocessing is often used to eliminate <it>insignificant </it>attributes (genes). In biclustering, the preprocessing step aims to remove irrelevant expression values of the data matrix <it>M </it>that do not contribute to obtaining pertinent results. A value <it>m</it><sub><it>ij </it></sub>of <it>M </it>is considered <it>insignificant </it>if:</p>
               <p>
                  <display-formula id="M4">
                     <graphic file="1756-0381-2-9-i17.gif"/>
                  </display-formula>
               </p>
               <p>where <it>avg</it><sub><it>i </it></sub>is the average over the non-missing values in the <it>i</it><sup>th </sup>row, <it>m</it><sub><it>ij </it></sub>is the value at the intersection of row <it>i </it>and column <it>j</it>, and <it>&#948; </it>is a fixed threshold. Equation 4 is applied to every value of <it>M</it>. See Tables <tblr tid="T2">2</tblr> and <tblr tid="T3">3</tblr> for an example.</p>
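               <p>Equation 4 is rendered as an image, so the following Python sketch rests on an assumption that should be checked against it: we read the test as a relative deviation from the row average, |<it>m</it><sub><it>ij</it></sub> - <it>avg</it><sub><it>i</it></sub>| / |<it>avg</it><sub><it>i</it></sub>| &#8804; <it>&#948;</it>, and mark insignificant values as missing (None). If Equation 4 has a different form, only <it>is_insignificant</it> needs to change. The function names and the threshold <it>&#948; </it>= 0.2 used below are hypothetical, and the output is not claimed to reproduce Table <tblr tid="T3">3</tblr>.</p>
               <p><![CDATA[
```python
from typing import List, Optional

Row = List[Optional[float]]


def is_insignificant(value: float, row_avg: float, delta: float) -> bool:
    """Assumed form of Equation 4: relative deviation from the row average."""
    if row_avg == 0:
        return False  # relative deviation undefined; keep the value
    return abs(value - row_avg) / abs(row_avg) <= delta


def preprocess(matrix: List[Row], delta: float) -> List[Row]:
    """Replace insignificant values by None; missing values stay None."""
    result = []
    for row in matrix:
        present = [v for v in row if v is not None]
        # avg_i is taken over the non-missing values of the row only
        avg = sum(present) / len(present) if present else 0.0
        result.append([None if v is None or is_insignificant(v, avg, delta) else v
                       for v in row])
    return result


if __name__ == "__main__":
    rows = [[10, 20, 5, 15, 40, 18], [20, 40, 10, 30, 24, 20]]  # I1 and I2 of Table 2
    for r in preprocess(rows, delta=0.2):
        print(r)
    # [10, None, 5, None, 40, None]
    # [None, 40, 10, 30, None, None]
```
]]></p>
               <p>Values close to the row average carry little information about up- or down-regulation, which is why this reading removes them; values that are already missing stay missing and do not enter <it>avg</it><sub><it>i</it></sub>.</p>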
               <tbl id="T2">
                  <title>
                     <p>Table 2</p>
                  </title>
                  <caption>
                     <p>Data matrix <it>M'</it>.</p>
                  </caption>
                  <tblbdy cols="7">
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>
                              <b>C</b>
                              <sub>
                                 <b>1</b>
                              </sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>C</b>
                              <sub>
                                 <b>2</b>
                              </sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>C</b>
                              <sub>
                                 <b>3</b>
                              </sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>C</b>
                              <sub>
                                 <b>4</b>
                              </sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>C</b>
                              <sub>
                                 <b>5</b>
                              </sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>C</b>
                              <sub>
                                 <b>6</b>
                              </sub>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="7">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>I<sub>1</sub></p>
                        </c>
                        <c ca="center">
                           <p>10</p>
                        </c>
                        <c ca="center">
                           <p>20</p>
                        </c>
                        <c ca="center">
                           <p>5</p>
                        </c>
                        <c ca="center">
                           <p>15</p>
                        </c>
                        <c ca="center">
                           <p>40</p>
                        </c>
                        <c ca="center">
                           <p>18</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="7">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>I<sub>2</sub></p>
                        </c>
                        <c ca="center">
                           <p>20</p>
                        </c>
                        <c ca="center">
                           <p>40</p>
                        </c>
                        <c ca="center">
                           <p>10</p>
                        </c>
                        <c ca="center">
                           <p>30</p>
                        </c>
                        <c ca="center">
                           <p>24</p>
                        </c>
                        <c ca="center">
                           <p>20</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="7">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>I<sub>3</sub></p>
                        </c>
                        <c ca="center">
                           <p>23</p>
                        </c>
                        <c ca="center">
                           <p>12</p>
                        </c>
                        <c ca="center">
                           <p>8</p>
                        </c>
                        <c ca="center">
                           <p>15</p>
                        </c>
                        <c ca="center">
                           <p>29</p>
                        </c>
                        <c ca="center">
                           <p>50</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="7">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>I<sub>4</sub></p>
                        </c>
                        <c ca="center">
                           <p>4</p>
                        </c>
                        <c ca="center">
                           <p>8</p>
                        </c>
                        <c ca="center">
                           <p>2</p>
                        </c>
                        <c ca="center">
                           <p>6</p>
                        </c>
                        <c ca="center">
                           <p>5</p>
                        </c>
                        <c ca="center">
                           <p>5</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="7">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>I<sub>5</sub></p>
                        </c>
                        <c ca="center">
                           <p>15</p>
                        </c>
                        <c ca="center">
                           <p>25</p>
                        </c>
                        <c ca="center">
                           <p>8</p>
                        </c>
                        <c ca="center">
                           <p>12</p>
                        </c>
                        <c ca="center">
                           <p>29</p>
                        </c>
                        <c ca="center">
                           <p>50</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
               <tbl id="T3">
                  <title>
                     <p>Table 3</p>
                  </title>
                  <caption>
                     <p>Data matrix <it>M </it>after preprocessing.</p>
                  </caption>
                  <tblbdy cols="7">
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>
                              <b>C</b>
                              <sub>
                                 <b>1</b>
                              </sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>C</b>
                              <sub>
                                 <b>2</b>
                              </sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>C</b>
                              <sub>
                                 <b>3</b>
                              </sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>C</b>
                              <sub>
                                 <b>4</b>
                              </sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>C</b>
                              <sub>
                                 <b>5</b>
                              </sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>C</b>
                              <sub>
                                 <b>6</b>
                              </sub>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="7">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>I<sub>1</sub></p>
                        </c>
                        <c ca="center">
                           <p>10</p>
                        </c>
                        <c ca="center">
                           <p>20</p>
                        </c>
                        <c ca="center">
                           <p>5</p>
                        </c>
                        <c ca="center">
                           <p>15</p>
                        </c>
                        <c ca="center">
                           <p>40</p>
                        </c>
                        <c ca="center">
                           <p>-</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="7">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>I<sub>2</sub></p>
                        </c>
                        <c ca="center">
                           <p>20</p>
                        </c>
                        <c ca="center">
                           <p>40</p>
                        </c>
                        <c ca="center">
                           <p>10</p>
                        </c>
                        <c ca="center">
                           <p>30</p>
                        </c>
                        <c ca="center">
                           <p>-</p>
                        </c>
                        <c ca="center">
                           <p>20</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="7">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>I<sub>3</sub></p>
                        </c>
                        <c ca="center">
                           <p>-</p>
                        </c>
                        <c ca="center">
                           <p>12</p>
                        </c>
                        <c ca="center">
                           <p>8</p>
                        </c>
                        <c ca="center">
                           <p>15</p>
                        </c>
                        <c ca="center">
                           <p>29</p>
                        </c>
                        <c ca="center">
                           <p>50</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="7">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>I<sub>4</sub></p>
                        </c>
                        <c ca="center">
                           <p>4</p>
                        </c>
                        <c ca="center">
                           <p>8</p>
                        </c>
                        <c ca="center">
                           <p>2</p>
                        </c>
                        <c ca="center">
                           <p>6</p>
                        </c>
                        <c ca="center">
                           <p>-</p>
                        </c>
                        <c ca="center">
                           <p>-</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="7">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>I<sub>5</sub></p>
                        </c>
                        <c ca="center">
                           <p>15</p>
                        </c>
                        <c ca="center">
                           <p>-</p>
                        </c>
                        <c ca="center">
                           <p>8</p>
                        </c>
                        <c ca="center">
                           <p>12</p>
                        </c>
                        <c ca="center">
                           <p>29</p>
                        </c>
                        <c ca="center">
                           <p>50</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
               <p>By considering only non-missing values, we minimize the loss of information in the data matrix. This way of handling missing values contrasts with other techniques, such as the one used in <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>, where a whole row is removed if it contains at least one missing value, or in <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>, where a whole column is removed if at least 5% of its values are missing. Furthermore, <it>BiMine </it>operates directly on the raw data matrix without discretizing the data, thus reducing the risk of losing information.</p>
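               <p>Equation 4 itself is defined earlier in the paper. The sketch below assumes that it keeps a value <it>m</it><sub><it>ij</it></sub> only when its relative deviation from the mean of row <it>i</it> exceeds <it>&#948;</it>, with the mean computed over non-missing values. This assumed form is a reading consistent with Tables 2 and 3, not the authors' code. The names <it>preprocess</it> and <it>TABLE_2</it> are hypothetical.</p>
               <p><![CDATA[
```python
"""Sketch of the value filter applied before building the BET.

Assumption: Equation 4 keeps m_ij only when |m_ij - avg_i| / |avg_i| > delta,
where avg_i is the mean of the non-missing values of row i. Filtered or missing
values become None (the "-" of Table 3). Rows are assumed to have a nonzero mean.
"""
from typing import List, Optional

Row = List[Optional[float]]

# Table 2 (data matrix M'): rows I1..I5, columns C1..C6
TABLE_2: List[Row] = [
    [10, 20, 5, 15, 40, 18],
    [20, 40, 10, 30, 24, 20],
    [23, 12, 8, 15, 29, 50],
    [4, 8, 2, 6, 5, 5],
    [15, 25, 8, 12, 29, 50],
]


def preprocess(matrix: List[Row], delta: float) -> List[Row]:
    """Return a copy of matrix in which insignificant values are None."""
    result: List[Row] = []
    for row in matrix:
        present = [v for v in row if v is not None]  # only non-missing values
        avg = sum(present) / len(present)
        result.append([
            v if v is not None and abs(v - avg) / abs(avg) > delta else None
            for v in row
        ])
    return result


if __name__ == "__main__":
    for label, row in zip(["I1", "I2", "I3", "I4", "I5"], preprocess(TABLE_2, 0.1)):
        print(label, ["-" if v is None else v for v in row])
```
]]></p>
               <p>With <it>&#948; </it>= 0.1, the sketch reproduces Table 3 from Table 2. For example, the mean of row <it>I</it><sub>1</sub> is 18. The value 18 in column C<sub>6</sub> therefore has zero deviation and is removed, while the value 20 in column C<sub>2</sub> deviates by about 0.11 and is kept.</p>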
            </sec>
            <sec>
               <st>
                  <p>Building Bicluster Enumeration Tree</p>
               </st>
               <p>After the preprocessing step, we construct a <it>Bicluster Enumeration Tree </it>(BET) that represents every possible bicluster that can be made from <it>M</it>. Compared to other data structures, the BET can represent the maximum number of significant biclusters together with the links that exist between them. Since the number of possible biclusters (nodes of the BET) grows exponentially, <it>BiMine </it>employs parametric rules that allow the enumeration process to close (or cut) a tree node. Intuitively, a node is cut if the quality of the bicluster it represents is below a fixed threshold.</p>
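               <p>The following hypothetical sketch shows why pruning is needed. It enumerates an unpruned tree in which, as in the BET, each node is only extended with genes to its right. Each subset of genes then appears exactly once, so the tree has 2<sup><it>n</it></sup> nodes for <it>n</it> genes, root included. The function name is illustrative only.</p>
               <p><![CDATA[
```python
"""Size of an unpruned set-enumeration tree over n genes (hypothetical sketch).

Each node is extended only with genes to the right of its last gene. This
mirrors combining a BET node with its right-hand siblings when nothing is
pruned, so every subset of genes appears exactly once.
"""
from typing import List, Tuple


def enumerate_nodes(n: int) -> List[Tuple[int, ...]]:
    nodes: List[Tuple[int, ...]] = [()]  # the root: empty bicluster

    def expand(prefix: Tuple[int, ...], start: int) -> None:
        for g in range(start, n):
            child = prefix + (g,)
            nodes.append(child)
            expand(child, g + 1)  # only genes to the right of g

    expand((), 0)
    return nodes


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, len(enumerate_nodes(n)))  # 2 ** n nodes, root included
```
]]></p>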
               <p>To describe the <it>BiMine </it>algorithm formally, we use the following notation:</p>
               <p><it>n</it><sub><it>i</it></sub>: <it>i</it>th node of the tree, which contains a bicluster.</p>
               <p><it>n</it><sub><it>i</it></sub>.<it>g</it><sub><it>i</it></sub>: genes of <it>n</it><sub><it>i</it></sub>.</p>
               <p><it>n</it><sub><it>i</it></sub>.<it>Cg</it><sub><it>i</it></sub>: conditions of <it>n</it><sub><it>i</it></sub>.</p>
               <p><it>bic</it>: bicluster.</p>
               <p><it>&#948;</it>: threshold used in Equation 4.</p>
               <p><b>Threshold</b>: quality threshold on the ASR value.</p>
               <p>The <it>BiMine </it>algorithm (Figure <figr fid="F2">2</figr> (Algorithm 1)) uses a first function (<it>Init_BET</it>) to build an initial tree, which is then recursively extended by a second function (<it>BET-tree</it>). <it>Init_BET </it>(Figure <figr fid="F2">2</figr> (Function 1)) generates, from the data matrix <it>M</it>, the biclusters made of one gene and its significant conditions, i.e., the conditions retained after applying Equation 4. The root of the BET is the empty bicluster (Line 1). The nodes at level one are the possible biclusters with one gene (Lines 2-4). Each node <it>n</it><sub><it>i </it></sub>is composed of two parts: <it>n</it><sub><it>i</it></sub>.<it>g</it><sub><it>i </it></sub>(genes) and <it>n</it><sub><it>i</it></sub>.<it>Cg</it><sub><it>i </it></sub>(significant conditions after filtering with Equation 4). From these initial biclusters, larger biclusters are built recursively, and any bicluster whose ASR value does not reach the fixed <it>Threshold </it>is pruned as early as possible. This is the role of the second function, <it>BET-tree</it>.</p>
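               <p>Below is a minimal sketch of <it>Init_BET</it>. It assumes that the preprocessed matrix marks removed values with <it>None</it>. Each level-one node pairs one gene with the indices of its remaining (significant) conditions, and the empty root is left implicit. <it>Node</it>, <it>init_bet</it> and <it>TABLE_3</it> are hypothetical names. Indices are 0-based, so condition 0 stands for C<sub>1</sub>.</p>
               <p><![CDATA[
```python
"""Sketch of Init_BET: build the first level of the BET (illustrative only)."""
from typing import List, NamedTuple, Optional, Tuple

Matrix = List[List[Optional[float]]]


class Node(NamedTuple):
    genes: Tuple[int, ...]        # n_i.g_i
    conditions: Tuple[int, ...]   # n_i.Cg_i (significant conditions)


def init_bet(matrix: Matrix) -> List[Node]:
    """Return the level-one nodes; the root (empty bicluster) is implicit."""
    level_one = []
    for i, row in enumerate(matrix):
        # keep the conditions whose value survived the preprocessing filter
        conds = tuple(j for j, v in enumerate(row) if v is not None)
        level_one.append(Node(genes=(i,), conditions=conds))
    return level_one


TABLE_3: Matrix = [  # preprocessed matrix M; None stands for "-"
    [10, 20, 5, 15, 40, None],
    [20, 40, 10, 30, None, 20],
    [None, 12, 8, 15, 29, 50],
    [4, 8, 2, 6, None, None],
    [15, None, 8, 12, 29, 50],
]

if __name__ == "__main__":
    for node in init_bet(TABLE_3):
        print(node)
```
]]></p>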
               <fig id="F2">
                  <title>
                     <p>Figure 2</p>
                  </title>
                  <caption>
                     <p><it>BiMine </it>algorithm</p>
                  </caption>
                  <text>
                     <p><b><it>BiMine </it>algorithm</b>.</p>
                  </text>
                  <graphic file="1756-0381-2-9-2"/>
               </fig>
               <p><it>BET-tree </it>(Figure <figr fid="F2">2</figr> (Function 2)) recursively builds the BET (Line 13) and generates the set of the best biclusters. On the one hand, the <it>i</it><sup>th </sup>child of a node is made up of the <it>union </it>of the genes of the father node and the genes of the <it>i</it><sup>th </sup>uncle node, i.e., the <it>i</it><sup>th </sup>sibling of the father counting from its right side. On the other hand, it is made up of the <it>intersection </it>of the conditions of the father and those of the same <it>i</it><sup>th </sup>uncle (Lines 4-12). If the ASR value associated with the <it>i</it><sup>th </sup>child is smaller than the given <it>Threshold</it>, this child is ignored (Lines 6-11).</p>
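               <p>The child-construction rule can be sketched as follows. The ASR function is defined earlier in the paper. Here it is deliberately simplified, as an assumption, to the average Spearman correlation over all pairs of genes of the bicluster. A child must also keep at least two conditions so that a correlation can be computed. A child is kept when its ASR reaches the threshold and discarded otherwise. All names are hypothetical.</p>
               <p><![CDATA[
```python
"""One BET-tree expansion step with threshold pruning (illustrative sketch).

Assumptions: ASR is simplified to the average Spearman correlation over all
pairs of genes (Pearson correlation of average ranks), and a child must keep
at least two conditions.
"""
from itertools import combinations
from typing import List, NamedTuple, Optional, Tuple

Matrix = List[List[Optional[float]]]


class Node(NamedTuple):
    genes: Tuple[int, ...]
    conditions: Tuple[int, ...]


def ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x: List[float], y: List[float]) -> float:
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


def asr(m: Matrix, genes: Tuple[int, ...], conds: Tuple[int, ...]) -> float:
    rows = [[m[g][c] for c in conds] for g in genes]
    pairs = list(combinations(rows, 2))
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)


def children(m: Matrix, father: Node, uncles: List[Node], threshold: float) -> List[Node]:
    """Combine father with each right-hand sibling; drop children below threshold."""
    kept = []
    for uncle in uncles:
        genes = tuple(sorted(set(father.genes) | set(uncle.genes)))            # union
        conds = tuple(sorted(set(father.conditions) & set(uncle.conditions)))  # intersection
        # small tolerance so that a floating-point ASR of 0.9999... still counts as 1
        if len(conds) >= 2 and asr(m, genes, conds) >= threshold - 1e-9:
            kept.append(Node(genes, conds))
    return kept


TABLE_3: Matrix = [  # preprocessed matrix M; None stands for "-"
    [10, 20, 5, 15, 40, None],
    [20, 40, 10, 30, None, 20],
    [None, 12, 8, 15, 29, 50],
    [4, 8, 2, 6, None, None],
    [15, None, 8, 12, 29, 50],
]
LEVEL_ONE = [Node((i,), tuple(j for j, v in enumerate(r) if v is not None))
             for i, r in enumerate(TABLE_3)]

if __name__ == "__main__":
    print(children(TABLE_3, LEVEL_ONE[0], LEVEL_ONE[1:], threshold=1.0))
```
]]></p>
               <p>On Table 3 with a threshold of 1, this sketch keeps only the combinations of <it>I</it><sub>1</sub> with <it>I</it><sub>2</sub> and with <it>I</it><sub>4</sub>. This agrees with the Illustrative Example.</p>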
               <p>This parametric pruning rule based on a quality threshold is fully justified in this context. Indeed, if the current bicluster is not good enough, it is useless to keep it, because expanding such a bicluster leads to biclusters of worse quality. From this point of view, the pruning rule shares principles widely applied in optimization methods such as Dynamic Programming. In addition, this pruning rule is essential for reducing the tree size and is indispensable for handling large datasets.</p>
               <p>Finally, the leaves of the constructed BET that have at least two genes and are not included in other leaves together form a <it>good </it>group of biclusters (Lines 8-9).</p>
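               <p>The final selection can be sketched as a filter over the leaves. Here a leaf is assumed to be included in another when both its genes and its conditions are subsets of the other leaf's. <it>bic</it>, <it>is_included</it> and <it>select_biclusters</it> are hypothetical helpers. The leaf list below was obtained by replaying the Illustrative Example by hand, assuming that ASR reduces to the average pairwise Spearman correlation between genes.</p>
               <p><![CDATA[
```python
"""Final selection step (illustrative sketch).

Keep the leaves that have at least two genes and are not included in another
leaf. Assumption: leaf A is included in leaf B when A's genes and A's
conditions are both subsets of B's.
"""
from typing import FrozenSet, List, Tuple

Bicluster = Tuple[FrozenSet[str], FrozenSet[str]]  # (genes, conditions)


def bic(genes: str, conds: str) -> Bicluster:
    """Hypothetical helper: build a bicluster from space-separated labels."""
    return frozenset(genes.split()), frozenset(conds.split())


def is_included(a: Bicluster, b: Bicluster) -> bool:
    return a != b and a[0] <= b[0] and a[1] <= b[1]


def select_biclusters(leaves: List[Bicluster]) -> List[Bicluster]:
    unique = list(dict.fromkeys(leaves))                # drop duplicate leaves
    candidates = [leaf for leaf in unique if len(leaf[0]) >= 2]
    return [a for a in candidates
            if not any(is_included(a, b) for b in candidates)]


# Leaves of the example BET (Table 3, threshold 1), replayed by hand
LEAVES = [
    bic("I1 I2 I4", "c1 c2 c3 c4"),
    bic("I1 I4", "c1 c2 c3 c4"),
    bic("I2 I4", "c1 c2 c3 c4"),
    bic("I3 I5", "c3 c4 c5 c6"),
    bic("I4", "c1 c2 c3 c4"),
    bic("I5", "c1 c3 c4 c5 c6"),
]

if __name__ == "__main__":
    for genes, conds in select_biclusters(LEAVES):
        print(sorted(genes), sorted(conds))
```
]]></p>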
               <p><b>Proposition 2</b>: The time complexity of <it>BiMine </it>is <it>O</it>(2<sup><it>n</it></sup><it>m </it>log(<it>m</it>)), where <it>n </it>is the number of rows and <it>m </it>is the number of columns of the data matrix.</p>
               <p><b>Proof: </b>The time complexity of the first step of <it>BiMine </it>is <it>O</it>(<it>nm</it>), since this step scans the whole data matrix <it>M</it>, which has <it>nm </it>entries.</p>
               <p>The time complexity of the second step of <it>BiMine </it>is <it>O</it>(2<sup><it>n</it></sup><it>m </it>log(<it>m</it>)). Indeed, in the worst case the BET has 2<sup><it>n </it></sup>nodes, one for each possible cluster of genes, and each node is associated with up to <it>m </it>conditions. The conditions of each node are kept sorted. Computing the intersection of two subsets of conditions of size <it>m </it>therefore boils down to searching for <it>m </it>elements in a sorted array of size <it>m</it>. This can be done by dichotomic (binary) search in <it>O</it>(<it>m </it>log(<it>m</it>)) time. Hence, the second step runs in <it>O</it>(2<sup><it>n</it></sup><it>m </it>log(<it>m</it>)) time. Since this term dominates the <it>O</it>(<it>nm</it>) cost of the first step, the time complexity of <it>BiMine </it>is <it>O</it>(2<sup><it>n</it></sup><it>m </it>log(<it>m</it>)).</p>
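               <p>The intersection step used in the proof can be sketched with Python's standard <it>bisect</it> module. Each condition of the first sorted list is looked up in the second list by binary search. This costs <it>O</it>(log <it>m</it>) per lookup and <it>O</it>(<it>m </it>log(<it>m</it>)) overall. The sketch illustrates the argument and is not the authors' implementation.</p>
               <p><![CDATA[
```python
from bisect import bisect_left
from typing import List


def intersect_sorted(a: List[int], b: List[int]) -> List[int]:
    """Intersection of two sorted condition lists.

    Each element of a is searched for in b by binary (dichotomic) search,
    giving O(len(a) * log(len(b))) time; the result stays sorted.
    """
    result = []
    for x in a:
        k = bisect_left(b, x)           # first position where x could sit in b
        if k < len(b) and b[k] == x:    # x is present in b
            result.append(x)
    return result


if __name__ == "__main__":
    # conditions of I1 and I2 in the example, as 1-based indices
    print(intersect_sorted([1, 2, 3, 4, 5], [1, 2, 3, 4, 6]))
```
]]></p>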
            </sec>
            <sec>
               <st>
                  <p>Illustrative Example</p>
               </st>
               <p>Let <it>M' </it>be the data matrix of Table <tblr tid="T2">2</tblr>. In the first step, we preprocess <it>M' </it>to obtain the data matrix <it>M </it>(Table <tblr tid="T3">3</tblr>). The character "-" represents a removed <it>insignificant </it>value. In the second step, we construct a BET that represents every possible bicluster that can be made from <it>M</it>. Let us set <it>&#948; </it>= 0.1 and the ASR threshold to 1. The first level of the BET is made up of the nodes that represent the possible biclusters with one gene. Each node represents a row of the data matrix <it>M </it>(Figure <figr fid="F3">3</figr>).</p>
               <fig id="F3">
                  <title>
                     <p>Figure 3</p>
                  </title>
                  <caption>
                     <p>First level of BET</p>
                  </caption>
                  <text>
                     <p><b>First level of BET</b>.</p>
                  </text>
                  <graphic file="1756-0381-2-9-3"/>
               </fig>
               <p>The second level of the BET is made up of nodes obtained by combining the nodes of the first level. Each new node takes the union of their genes and the intersection of their conditions.</p>
               <p>Figure <figr fid="F4">4</figr> illustrates the construction of the children of node <it>I</it><sub>1</sub>. Each dashed edge without a cross represents a valid combination of two nodes (with ASR = 1). First, we take the union of the genes of node <it>I</it><sub><it>1 </it></sub>with those of <it>I</it><sub><it>2 </it></sub>(first uncle). We also take the intersection of the conditions {c<sub>1</sub>, c<sub>2</sub>, c<sub>3</sub>, c<sub>4</sub>, c<sub>5</sub>} of <it>I</it><sub><it>1 </it></sub>with the conditions {c<sub>1</sub>, c<sub>2</sub>, c<sub>3</sub>, c<sub>4</sub>, c<sub>6</sub>} of <it>I</it><sub><it>2</it></sub>. The ASR of the resulting bicluster (<it>I</it><sub><it>1</it></sub>, <it>I</it><sub><it>2</it></sub>; c<sub>1</sub>, c<sub>2</sub>, c<sub>3</sub>, c<sub>4</sub>) is 1, so we insert it as the first child of <it>I</it><sub><it>1</it></sub>. Next, we combine <it>I</it><sub><it>1 </it></sub>with node <it>I</it><sub><it>3 </it></sub>(second uncle). We obtain the bicluster (<it>I</it><sub><it>1</it></sub>, <it>I</it><sub><it>3</it></sub>; c<sub>2</sub>, c<sub>3</sub>, c<sub>4</sub>, c<sub>5</sub>), whose ASR is lower than 1, so this child of <it>I</it><sub><it>1 </it></sub>is discarded. We carry out the same process with node <it>I</it><sub><it>4</it></sub>. We obtain the bicluster (<it>I</it><sub><it>1</it></sub>, <it>I</it><sub><it>4</it></sub>; c<sub>1</sub>, c<sub>2</sub>, c<sub>3</sub>, c<sub>4</sub>) with ASR equal to 1, which we insert as a child of <it>I</it><sub><it>1</it></sub>. Finally, combining with <it>I</it><sub><it>5 </it></sub>yields the bicluster (<it>I</it><sub><it>1</it></sub>, <it>I</it><sub><it>5</it></sub>; c<sub>1</sub>, c<sub>3</sub>, c<sub>4</sub>, c<sub>5</sub>), whose ASR is lower than 1, so it is not inserted.</p>
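               <p>As a worked check, assume that for two genes the ASR reduces to their Spearman rank correlation. This is a simplification, and the full definition of ASR is given earlier in the paper. Consider the bicluster (<it>I</it><sub>1</sub>, <it>I</it><sub>3</sub>; c<sub>2</sub>, c<sub>3</sub>, c<sub>4</sub>, c<sub>5</sub>). The values of the two genes are (20, 5, 15, 40) and (12, 8, 15, 29), with ranks (3, 1, 2, 4) and (2, 1, 3, 4). With <it>d</it><sub><it>k</it></sub> the rank difference on the <it>k</it>th condition and <it>n</it> = 4:</p>
               <p><![CDATA[
```latex
\rho_{I_1 I_3} = 1 - \frac{6 \sum_{k=1}^{n} d_k^2}{n (n^2 - 1)}
= 1 - \frac{6 \, (1^2 + 0^2 + (-1)^2 + 0^2)}{4 \, (4^2 - 1)}
= 1 - \frac{12}{60} = 0.8
```
]]></p>
               <p>The same computation gives 0.8 for (<it>I</it><sub>1</sub>, <it>I</it><sub>5</sub>; c<sub>1</sub>, c<sub>3</sub>, c<sub>4</sub>, c<sub>5</sub>). The pairs kept in the tree, in contrast, have identical rank vectors, so every <it>d</it><sub><it>k</it></sub> is 0 and the correlation is 1.</p>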
               <fig id="F4">
                  <title>
                     <p>Figure 4</p>
                  </title>
                  <caption>
                     <p>Children construction of the first node of the second level of BET</p>
                  </caption>
                  <text>
                     <p><b>Children construction of the first node of the second level of BET</b>.</p>
                  </text>
                  <graphic file="1756-0381-2-9-4"/>
               </fig>
               <p>We repeat the same process for the nodes <it>I</it><sub><it>2</it></sub>, <it>I</it><sub><it>3</it></sub>, <it>I</it><sub><it>4 </it></sub>and <it>I</it><sub><it>5</it></sub>. This completes the second level of the BET (Figure <figr fid="F5">5</figr>).</p>
               <fig id="F5">
                  <title>
                     <p>Figure 5</p>
                  </title>
                  <caption>
                     <p>Second level of BET</p>
                  </caption>
                  <text>
                     <p><b>Second level of BET</b>.</p>
                  </text>
                  <graphic file="1756-0381-2-9-5"/>
               </fig>
               <p>The third level of the BET is made up of nodes obtained in the same way from the nodes of the second level, i.e., by taking the union of their genes and the intersection of their conditions (Figure <figr fid="F6">6</figr>).</p>
               <fig id="F6">
                  <title>
                     <p>Figure 6</p>
                  </title>
                  <caption>
                     <p>Last level of BET</p>
                  </caption>
                  <text>
                     <p><b>Last level of BET</b>.</p>
                  </text>
                  <graphic file="1756-0381-2-9-6"/>
               </fig>
               <p>At each level of the BET, we keep only the nodes whose ASR is <it>equal </it>to 1. The leaves of the constructed BET that have at least two genes and are not included in other leaves are { (<it>I</it><sub><it>1</it></sub>, <it>I</it><sub><it>2</it></sub>, <it>I</it><sub><it>4</it></sub>; c<sub>1</sub>, c<sub>2</sub>, c<sub>3</sub>, c<sub>4</sub>), (<it>I</it><sub><it>3</it></sub>, <it>I</it><sub><it>5</it></sub>; c<sub>3</sub>, c<sub>4</sub>, c<sub>5</sub>, c<sub>6</sub>) }. These leaves constitute the group of extracted biclusters (Figure <figr fid="F7">7</figr>).</p>
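               <p>The whole example can be replayed with the following end-to-end sketch. It combines the level-one construction, the recursive expansion with pruning and the final selection of leaves. The sketch rests on a simplified reading of the method. ASR is replaced by the average pairwise Spearman correlation between genes, and each node must keep at least two conditions. The function names are hypothetical and are not taken from the authors' Java implementation.</p>
               <p><![CDATA[
```python
"""End-to-end sketch of the BET construction on the example of Table 3.

Assumptions: ASR is simplified to the average pairwise Spearman correlation
between genes (ties get average ranks), and a node needs at least two
shared conditions.
"""
from itertools import combinations
from typing import List, Optional, Tuple

Matrix = List[List[Optional[float]]]
Node = Tuple[Tuple[int, ...], Tuple[int, ...]]  # (genes, conditions)


def ranks(values):
    order = sorted(range(len(values)), key=lambda k: values[k])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1  # tied values share the mean rank
        i = j + 1
    return out


def spearman(x, y):
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


def asr(m: Matrix, genes, conds) -> float:
    rows = [[m[g][c] for c in conds] for g in genes]
    pairs = list(combinations(rows, 2))
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)


def bet_tree(m: Matrix, siblings: List[Node], threshold: float, leaves: List[Node]) -> None:
    """Expand each node with its right-hand siblings; childless nodes are leaves."""
    for i, (genes, conds) in enumerate(siblings):
        kids = []
        for u_genes, u_conds in siblings[i + 1:]:
            g = tuple(sorted(set(genes) | set(u_genes)))
            c = tuple(sorted(set(conds) & set(u_conds)))
            if len(c) >= 2 and asr(m, g, c) >= threshold - 1e-9:
                kids.append((g, c))
        if kids:
            bet_tree(m, kids, threshold, leaves)
        else:
            leaves.append((genes, conds))


def bimine(m: Matrix, threshold: float) -> List[Node]:
    level_one = [((i,), tuple(j for j, v in enumerate(row) if v is not None))
                 for i, row in enumerate(m)]
    leaves: List[Node] = []
    bet_tree(m, level_one, threshold, leaves)
    good = [n for n in leaves if len(n[0]) >= 2]  # at least two genes

    def included(a: Node, b: Node) -> bool:
        return a != b and set(a[0]) <= set(b[0]) and set(a[1]) <= set(b[1])

    return [a for a in good if not any(included(a, b) for b in good)]


TABLE_3: Matrix = [  # preprocessed matrix M; None stands for "-"
    [10, 20, 5, 15, 40, None],
    [20, 40, 10, 30, None, 20],
    [None, 12, 8, 15, 29, 50],
    [4, 8, 2, 6, None, None],
    [15, None, 8, 12, 29, 50],
]

if __name__ == "__main__":
    for genes, conds in bimine(TABLE_3, threshold=1.0):
        print(["I%d" % (g + 1) for g in genes], ["c%d" % (c + 1) for c in conds])
```
]]></p>
               <p>Run on Table 3 with a threshold of 1, the sketch returns the two biclusters listed above, (<it>I</it><sub>1</sub>, <it>I</it><sub>2</sub>, <it>I</it><sub>4</sub>; c<sub>1</sub>-c<sub>4</sub>) and (<it>I</it><sub>3</sub>, <it>I</it><sub>5</sub>; c<sub>3</sub>-c<sub>6</sub>), printed with 1-based labels.</p>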
               <fig id="F7">
                  <title>
                     <p>Figure 7</p>
                  </title>
                  <caption>
                     <p>Extracted biclusters are shown with bold lines</p>
                  </caption>
                  <text>
                     <p><b>Extracted biclusters are shown with bold lines</b>.</p>
                  </text>
                  <graphic file="1756-0381-2-9-7"/>
               </fig>
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>In this section, we assess the <it>BiMine </it>algorithm on both synthetic and real DNA microarray data. We implemented our algorithm in the Java programming language. We compare the results of <it>BiMine </it>with those of four prominent biclustering algorithms widely used in the community: CC <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, OPSM <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, ISA <abbrgrp><abbr bid="B33">33</abbr></abbrgrp> and <it>Bimax </it><abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. For these reference algorithms, we used the <it>Biclustering Analysis Toolbox </it>(BicAT), a recent software platform for clustering-based data analysis that integrates all these biclustering algorithms <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>.</p>
         <sec>
            <st>
               <p>Synthetic Data</p>
            </st>
            <sec>
               <st>
                  <p>Data Sets</p>
               </st>
               <p>Following <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B19">19</abbr><abbr bid="B35">35</abbr></abbrgrp>, we randomly generated two types of synthetic datasets of size (I, J) = (200, 20). Different types of biclusters are embedded in them: constant-column, additive, multiplicative and coherent-evolution biclusters. The first (resp. second) dataset contains non-overlapping (resp. overlapping) biclusters. To obtain statistically stable results, we generated 10 problem instances for each type of dataset by inserting the biclusters at random places in the data matrix.</p>
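               <p>The generator of the cited works is not reproduced here. The following hypothetical sketch only illustrates how the four bicluster types can be implanted without overlap in a 200 &#215; 20 matrix. The background distribution, the bicluster size (15 genes &#215; 5 conditions) and the effect ranges are illustrative assumptions. The biclusters are placed on disjoint row sets so that they share no cell.</p>
               <p><![CDATA[
```python
"""Hypothetical generator for a non-overlapping synthetic dataset.

Background values, bicluster sizes and effect ranges are illustrative
assumptions, not the settings of the cited works. One bicluster of each type
is implanted on disjoint row sets, so the biclusters do not overlap.
"""
import random
from typing import Dict, List, Tuple

Matrix = List[List[float]]
KINDS = ["constant_columns", "additive", "multiplicative", "coherent_evolution"]


def implant(m: Matrix, rows: List[int], cols: List[int], kind: str,
            rng: random.Random) -> None:
    a = [rng.uniform(1, 10) for _ in rows]      # row effects
    b = [rng.uniform(1, 10) for _ in cols]      # column effects
    e = [rng.uniform(0.5, 2.0) for _ in rows]   # per-row exponents
    for p, i in enumerate(rows):
        for q, j in enumerate(cols):
            if kind == "constant_columns":
                m[i][j] = b[q]                  # same value down each column
            elif kind == "additive":
                m[i][j] = a[p] + b[q]
            elif kind == "multiplicative":
                m[i][j] = a[p] * b[q]
            else:
                # increasing transform of b: every row ranks the columns identically
                m[i][j] = a[p] + b[q] ** e[p]


def generate(n_rows: int = 200, n_cols: int = 20, size: Tuple[int, int] = (15, 5),
             seed: int = 0) -> Tuple[Matrix, Dict[str, Tuple[List[int], List[int]]]]:
    rng = random.Random(seed)
    m = [[rng.uniform(0, 100) for _ in range(n_cols)] for _ in range(n_rows)]
    rows = rng.sample(range(n_rows), size[0] * len(KINDS))  # disjoint row blocks
    truth = {}
    for k, kind in enumerate(KINDS):
        r = sorted(rows[k * size[0]:(k + 1) * size[0]])
        c = sorted(rng.sample(range(n_cols), size[1]))
        implant(m, r, c, kind, rng)
        truth[kind] = (r, c)
    return m, truth


if __name__ == "__main__":
    matrix, implanted = generate(seed=1)
    for kind, (r, c) in implanted.items():
        print(kind, len(r), "genes x", len(c), "conditions")
```
]]></p>
               <p>An overlapping variant could be obtained by drawing the row sets independently instead of from one disjoint sample. The returned <it>truth</it> dictionary records where each bicluster was implanted, which is what the comparison criteria need.</p>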
            </sec>
            <sec>
               <st>
                  <p>Comparison Criteria</p>
               </st>
               <p>Following <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>, we have used the following two ratios to evaluate our biclustering algorithm:</p>
               <p>
                  <display-formula id="M5">
                     <graphic file="1756-0381-2-9-i18.gif"/>
                  </display-formula>
               </p>
               <p>with</p>
               <p><it>S</it><sub><it>cb </it></sub>= Size of the portion of the biclusters that is correctly extracted</p>
               <p><it>Tot</it><sub><it>size </it></sub>= Total size of the correct (real) biclusters</p>
               <p>
                  <display-formula id="M6">
                     <graphic file="1756-0381-2-9-i19.gif"/>
                  </display-formula>
               </p>
               <p>with</p>
               <p><it>S</it><sub><it>ncb </it></sub>= Size of the portion of the biclusters that is not correctly extracted</p>
               <p><it>Tot</it><sub><it>size </it></sub>= Total size of the correct (real) biclusters</p>
               <p>The ratio <it>&#952;</it><sub><it>Shared </it></sub>(resp. <it>&#952;</it><sub><it>NotShared</it></sub>) expresses the percentage of the extracted bicluster volume that is shared (resp. not shared) with the real biclusters, relative to the total size of the real biclusters. When <it>&#952;</it><sub><it>Shared </it></sub>is equal to 100%, the algorithm has extracted the real biclusters entirely. <it>&#952;</it><sub><it>NotShared </it></sub>measures the extra volume extracted outside them. A perfect solution has <it>&#952;</it><sub><it>Shared </it></sub>= 100% and <it>&#952;</it><sub><it>NotShared </it></sub>= 0%.</p>
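               <p>The two ratios are displayed above as images. Assume they read <it>&#952;</it><sub><it>Shared</it></sub> = 100 &#215; <it>S</it><sub><it>cb</it></sub>/<it>Tot</it><sub><it>size</it></sub> and <it>&#952;</it><sub><it>NotShared</it></sub> = 100 &#215; <it>S</it><sub><it>ncb</it></sub>/<it>Tot</it><sub><it>size</it></sub>, with sizes counted in matrix cells. They can then be computed as in the following sketch. <it>cells</it> and <it>theta</it> are hypothetical names.</p>
               <p><![CDATA[
```python
"""Sketch of the two comparison ratios.

Assumptions: "size" means number of (gene, condition) cells, and Tot_size is
the total size of the real biclusters:
theta_Shared = 100 * S_cb / Tot_size, theta_NotShared = 100 * S_ncb / Tot_size.
"""
from typing import List, Sequence, Set, Tuple

Bicluster = Tuple[Sequence[int], Sequence[int]]  # (genes, conditions)


def cells(b: Bicluster) -> Set[Tuple[int, int]]:
    genes, conds = b
    return {(g, c) for g in genes for c in conds}


def theta(found: List[Bicluster], real: List[Bicluster]) -> Tuple[float, float]:
    real_cells = set().union(*map(cells, real))
    found_cells = set().union(*map(cells, found))
    tot_size = len(real_cells)
    s_cb = len(found_cells & real_cells)    # extracted volume inside real biclusters
    s_ncb = len(found_cells - real_cells)   # extracted volume outside them
    return 100 * s_cb / tot_size, 100 * s_ncb / tot_size


if __name__ == "__main__":
    # real: 3 genes x 2 conditions; found: the same plus one extra gene
    print(theta([((0, 1, 2, 3), (0, 1))], [((0, 1, 2), (0, 1))]))
```
]]></p>
               <p>For instance, a real bicluster of 3 genes and 2 conditions recovered together with one extra gene gives <it>&#952;</it><sub><it>Shared</it></sub> = 100% and <it>&#952;</it><sub><it>NotShared</it></sub> of about 33.3%, since the 2 extra cells are compared with the 6 real ones.</p>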
            </sec>
            <sec>
               <st>
                  <p>Protocol for Experiments</p>
               </st>
               <p>For our biclustering algorithm, we set <it>&#948; </it>= 0.2 and the ASR threshold to 0.85. For the four reference algorithms, we used the default parameter settings, as in <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. We ran all the algorithms and, for each algorithm, selected the 4 biclusters that best fit the 4 real biclusters. We then computed <it>&#952;</it><sub><it>Shared </it></sub>and <it>&#952;</it><sub><it>NotShared </it></sub>for each algorithm. These give the average percentage of the volume of the resulting biclusters that is shared and not shared with the real biclusters. The objective of this experiment is to determine which algorithm is able to extract all the implanted biclusters.</p>
               <p>Table <tblr tid="T4">4</tblr> reports the <it>&#952;</it><sub><it>Shared </it></sub>and <it>&#952;</it><sub><it>NotShared </it></sub>values of the best biclusters provided by each algorithm on the first dataset.</p>
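               <p>The protocol can be sketched as follows. The criterion for deciding which extracted bicluster "best fits" a real one is not specified. The sketch assumes it is the number of shared cells, and it averages the two ratios over the matched pairs. <it>best_fit</it> and <it>averaged_theta</it> are hypothetical names.</p>
               <p><![CDATA[
```python
"""Sketch of the evaluation protocol (assumed matching criterion).

Each real bicluster is matched with the extracted bicluster sharing the most
cells with it; the ratios are computed per matched pair, relative to the size
of the real bicluster, and then averaged.
"""
from typing import List, Sequence, Set, Tuple

Bicluster = Tuple[Sequence[int], Sequence[int]]  # (genes, conditions)


def cells(b: Bicluster) -> Set[Tuple[int, int]]:
    genes, conds = b
    return {(g, c) for g in genes for c in conds}


def best_fit(found: List[Bicluster], real: List[Bicluster]) -> List[Bicluster]:
    """For each real bicluster, the extracted one sharing the most cells with it."""
    return [max(found, key=lambda f: len(cells(f) & cells(r))) for r in real]


def averaged_theta(found: List[Bicluster], real: List[Bicluster]) -> Tuple[float, float]:
    shared, not_shared = [], []
    for f, r in zip(best_fit(found, real), real):
        fc, rc = cells(f), cells(r)
        shared.append(100 * len(fc & rc) / len(rc))      # recovered part of r
        not_shared.append(100 * len(fc - rc) / len(rc))  # extra volume outside r
    return sum(shared) / len(shared), sum(not_shared) / len(not_shared)


if __name__ == "__main__":
    real = [((0, 1), (0, 1)), ((5, 6), (2, 3))]
    found = [((0, 1), (0, 1, 2)), ((5,), (2, 3)), ((9,), (9,))]
    print(averaged_theta(found, real))  # (75.0, 25.0)
```
]]></p>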
               <tbl id="T4">
                  <title>
                     <p>Table 4</p>
                  </title>
                  <caption>
                     <p><it>BiMine </it>results and comparison with other algorithms on synthetic data without overlapping biclusters.</p>
                  </caption>
                  <tblbdy cols="3">
                     <r>
                        <c ca="center">
                           <p>
                              <b>Algorithms</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>
                                 <it>&#952;</it>
                              </b>
                              <sub>
                                 <b>
                                    <it>Shared</it>
                                 </b>
                              </sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>
                                 <it>&#952;</it>
                              </b>
                              <sub>
                                 <b>
                                    <it>NotShared</it>
                                 </b>
                              </sub>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>CC</p>
                        </c>
                        <c ca="center">
                           <p>18.21%</p>
                        </c>
                        <c ca="center">
                           <p>36.57%</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>OPSM</p>
                        </c>
                        <c ca="center">
                           <p>46.39%</p>
                        </c>
                        <c ca="center">
                           <p>74.42%</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>ISA</p>
                        </c>
                        <c ca="center">
                           <p>39.38%</p>
                        </c>
                        <c ca="center">
                           <p>5.31%</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>
                              <it>Bimax</it>
                           </p>
                        </c>
                        <c ca="center">
                           <p>58.18%</p>
                        </c>
                        <c ca="center">
                           <p>21.39%</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>
                              <it>BiMine</it>
                           </p>
                        </c>
                        <c ca="center">
                           <p>100%</p>
                        </c>
                        <c ca="center">
                           <p>33.03%</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
                <p>As we can see in Table <tblr tid="T4">4</tblr>, <it>BiMine </it>extracts 100% of the implanted biclusters, with an extra volume that represents 33.03% of the implanted biclusters. This extra volume arises because combining two biclusters to obtain a new one adds extra conditions only, while keeping exactly the correct number of genes. By comparison, the best of the other studied algorithms, i.e., <it>Bimax</it>, extracts only 58.18% of the implanted biclusters, with 21.39% of extra volume. CC uses the MSR of the selected elements as its biclustering criterion. When the signal of the implanted biclusters is weak, the greedy nature of CC may delete some rows and columns of the implanted biclusters early in the algorithm, and these rows and columns are then missing from the output biclusters. ISA uses only up-regulated and down-regulated constant expression values in its biclustering algorithm; when coherent biclusters exist, ISA may miss some rows and columns of the implanted biclusters. OPSM seeks only up- and down-regulated expression values with coherent evolution, so its performance decreases when constant biclusters are present. The discretization preprocessing used by <it>Bimax </it>cannot identify the elements of coherent biclusters; hence, the algorithm cannot find the implanted biclusters exactly.</p>
               <p>Table <tblr tid="T5">5</tblr> illustrates the best biclusters provided by each algorithm for the second dataset.</p>
               <tbl id="T5">
                  <title>
                     <p>Table 5</p>
                  </title>
                  <caption>
                      <p><it>BiMine </it>results and comparison with other algorithms on synthetic data with overlapping biclusters.</p>
                  </caption>
                  <tblbdy cols="3">
                     <r>
                        <c ca="center">
                           <p>
                              <b>Algorithms</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>
                                 <it>&#952;</it>
                              </b>
                              <sub>
                                 <b>
                                    <it>Shared</it>
                                 </b>
                              </sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>
                                 <it>&#952;</it>
                              </b>
                              <sub>
                                 <b>
                                    <it>NotShared</it>
                                 </b>
                              </sub>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>CC</p>
                        </c>
                        <c ca="center">
                           <p>9.21%</p>
                        </c>
                        <c ca="center">
                           <p>47.94%</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>OPSM</p>
                        </c>
                        <c ca="center">
                           <p>42.87%</p>
                        </c>
                        <c ca="center">
                           <p>49.31%</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>ISA</p>
                        </c>
                        <c ca="center">
                           <p>23.28%</p>
                        </c>
                        <c ca="center">
                           <p>23.97%</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>
                              <it>Bimax</it>
                           </p>
                        </c>
                        <c ca="center">
                           <p>34.07%</p>
                        </c>
                        <c ca="center">
                           <p>3.43%</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>
                              <it>BiMine</it>
                           </p>
                        </c>
                        <c ca="center">
                           <p>85.35%</p>
                        </c>
                        <c ca="center">
                           <p>41.78%</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
                <p>As we can see in Table <tblr tid="T5">5</tblr>, <it>BiMine </it>achieves the highest coverage of the implanted biclusters. It extracts 85.35% of them, with an extra volume that represents 41.78% of the implanted biclusters. By comparison, the best of the other studied algorithms, i.e., OPSM, extracts only 42.87% of the implanted biclusters, with 49.31% of extra volume. To find overlapping biclusters in a given matrix, some algorithms, e.g., CC, need to mask the discovered biclusters with random values, which is not necessary for <it>BiMine</it>. ISA and OPSM are sensitive to overlapping biclusters because they apply a normalization step at the beginning of their algorithms. With overlapping biclusters, the range of expression values after normalization becomes narrower. Table <tblr tid="T5">5</tblr> shows that <it>BiMine </it>is only marginally affected by overlap among the implanted biclusters. We conclude that <it>BiMine </it>recovers implanted biclusters of all the tested types, unlike the other algorithms, which extract only certain types of biclusters.</p>
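                <p>To make the masking step mentioned above concrete, the following minimal Python sketch overwrites the cells of one discovered bicluster with uniform random values. These values are drawn from the range of the data, so that a subsequent greedy search is unlikely to rediscover the same bicluster. The function name mask_bicluster and the use of the full data range as the sampling interval are illustrative assumptions, not details taken from the CC implementation.</p>

```python
import random


def mask_bicluster(matrix, rows, cols, rng=None):
    """Return a copy of `matrix` whose cells in rows x cols are replaced by
    uniform random values drawn from the range of the original data."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    low = min(min(row) for row in matrix)
    high = max(max(row) for row in matrix)
    masked = [list(row) for row in matrix]  # leave the input untouched
    for i in rows:
        for j in cols:
            masked[i][j] = rng.uniform(low, high)
    return masked


data = [[1.0, 2.0, 3.0],
        [2.0, 4.0, 6.0],
        [9.0, 0.5, 7.0]]
masked = mask_bicluster(data, rows=[0, 1], cols=[0, 1])
print(masked[2])  # [9.0, 0.5, 7.0]: row 2 lies outside the masked bicluster
```

                <p>Because the masked cells take arbitrary values, they can no longer contribute to a coherent pattern. This is also why masking can hide genuine overlaps between biclusters.</p>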
            </sec>
         </sec>
         <sec>
            <st>
               <p>Real data</p>
            </st>
            <sec>
               <st>
                  <p>Data Sets</p>
               </st>
                <p>We applied our approach to the well-known yeast cell-cycle dataset. This dataset is publicly available from <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>, was described in <abbrgrp><abbr bid="B37">37</abbr></abbrgrp> and was processed in <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. It contains the expression profiles of more than 6000 yeast genes measured under 17 conditions over two complete cell cycles. In our experiments, we use the 2884 genes selected in <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>.</p>
            </sec>
            <sec>
               <st>
                  <p>Comparison Criteria</p>
               </st>
                <p>Two criteria are used. First, to evaluate the biological relevance of our proposed biclustering algorithm, we compute <it>p</it>-values measuring the enrichment of the extracted biclusters in Gene Ontology terms. Second, we identify the biological annotations of the extracted biclusters.</p>
            </sec>
            <sec>
               <st>
                  <p>Protocol for Experiments</p>
               </st>
                <p>For our biclustering algorithm, we set <it>&#948; </it>= 0.1 and an ASR threshold of 0.85. The reference biclustering algorithms are run with the default parameter settings used in <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. For the first experiment, we run all the algorithms and compute the <it>p</it>-values of the extracted biclusters. <it>BiMine </it>(resp. <it>Bimax</it>) produces more than 1800 (resp. 3700) biclusters. Since a biological analysis of 1800 (resp. 3700) biclusters was not feasible, only the 100 biggest biclusters with high ASR were selected for analysis, following Christinat <it>et al</it>. <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. As in Cheng <it>et al</it>. <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, post-filtering was applied to the output of all algorithms to eliminate insignificant biclusters. For the other algorithms, we obtained 10 biclusters for CC, 45 for ISA and 14 for OPSM. For the second experiment, we use a well-known web tool to search for the significant GO terms shared by the groups of genes.</p>
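                <p>The phrase "the 100 biggest biclusters with high ASR" admits more than one reading. The minimal Python sketch below assumes one of them. It first keeps the biclusters whose ASR reaches the 0.85 threshold, then ranks them by volume (genes x conditions), breaking ties in favour of higher ASR. The Bicluster record and the select_biggest function are hypothetical names used only for illustration.</p>

```python
from dataclasses import dataclass


@dataclass
class Bicluster:
    genes: tuple
    conditions: tuple
    asr: float

    @property
    def volume(self) -> int:
        # Volume = number of genes x number of conditions.
        return len(self.genes) * len(self.conditions)


def select_biggest(biclusters, k=100, asr_threshold=0.85):
    """Keep biclusters whose ASR reaches the threshold, then return the k
    largest by volume; ties in volume are broken by higher ASR."""
    kept = [b for b in biclusters if b.asr >= asr_threshold]
    kept.sort(key=lambda b: (b.volume, b.asr), reverse=True)
    return kept[:k]


found = [
    Bicluster(("g1", "g2", "g3"), ("c1", "c2"), 0.90),
    Bicluster(("g1", "g4"), ("c1", "c2", "c3", "c4"), 0.86),
    Bicluster(("g2", "g5", "g6", "g7"), ("c2", "c3"), 0.80),  # below threshold
]
top = select_biggest(found, k=2)
print([(b.volume, b.asr) for b in top])  # [(8, 0.86), (6, 0.9)]
```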
               <sec>
                  <st>
                     <p>Biological relevance</p>
                  </st>
                   <p>In order to evaluate the biological relevance of our proposed biclustering algorithm, we compare its results with those of CC, ISA, <it>Bimax</it> and OPSM on the yeast cell-cycle dataset. The idea is to determine whether the set of genes discovered by a biclustering algorithm shows significant enrichment with respect to a specific Gene Ontology (GO) annotation. We use the web tool <it>FuncAssociate </it><abbrgrp><abbr bid="B39">39</abbr></abbrgrp> to evaluate the discovered biclusters. <it>FuncAssociate </it>assesses the genes of each bicluster by computing adjusted <it>p</it>-values, which indicate how well they match the different GO categories. Note that a smaller <it>p-</it>value, <it>close </it>to 0, indicates a better match <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. Table <tblr tid="T6">6</tblr> reports, for each algorithm and each <it>p</it>-value threshold, the percentage of extracted biclusters that are significantly enriched. With <it>BiMine</it>, 100% of the tested biclusters are significant at <it>p</it>-value = 5%, and the same holds at <it>p</it>-value = 1%. At <it>p</it>-value = 0.5% (resp. 0.1%), 93% (resp. 82%) of the <it>BiMine </it>biclusters are significant. Among the compared algorithms, the best results at these two thresholds are obtained by <it>Bimax</it>, with 89% (resp. 79%) of its extracted biclusters. Finally, at <it>p</it>-value = 0.001%, 51% of the biclusters extracted by <it>BiMine </it>are significant, against 64% for <it>Bimax</it>. <it>BiMine </it>therefore matches or outperforms CC, ISA and OPSM at every <it>p</it>-value threshold, and matches or outperforms <it>Bimax </it>at four of the five thresholds (5%, 1%, 0.5% and 0.1%). The best results overall are obtained by <it>BiMine </it>and <it>Bimax</it>.</p>
                  <tbl id="T6">
                     <title>
                        <p>Table 6</p>
                     </title>
                     <caption>
                         <p>Proportions (%) of biclusters significantly enriched by GO annotations at each <it>p</it>-value threshold.</p>
                     </caption>
                     <tblbdy cols="6">
                        <r>
                           <c ca="right">
                              <p>
                                 <b>p-value</b>
                              </p>
                           </c>
                           <c ca="center">
                              <p>
                                 <b>5%</b>
                              </p>
                           </c>
                           <c ca="center">
                              <p>
                                 <b>1%</b>
                              </p>
                           </c>
                           <c ca="center">
                              <p>
                                 <b>0.5%</b>
                              </p>
                           </c>
                           <c ca="center">
                              <p>
                                 <b>0.1%</b>
                              </p>
                           </c>
                           <c ca="center">
                              <p>
                                 <b>0.001%</b>
                              </p>
                           </c>
                        </r>
                        <r>
                           <c ca="left">
                              <p>
                                 <b>Algorithms</b>
                              </p>
                           </c>
                           <c>
                              <p/>
                           </c>
                           <c>
                              <p/>
                           </c>
                           <c>
                              <p/>
                           </c>
                           <c>
                              <p/>
                           </c>
                           <c>
                              <p/>
                           </c>
                        </r>
                        <r>
                           <c cspan="6">
                              <hr/>
                           </c>
                        </r>
                        <r>
                           <c ca="left">
                              <p>
                                 <it>BiMine</it>
                              </p>
                           </c>
                           <c ca="center">
                              <p>100</p>
                           </c>
                           <c ca="center">
                              <p>100</p>
                           </c>
                           <c ca="center">
                              <p>93</p>
                           </c>
                           <c ca="center">
                              <p>82</p>
                           </c>
                           <c ca="center">
                              <p>51</p>
                           </c>
                        </r>
                        <r>
                           <c cspan="6">
                              <hr/>
                           </c>
                        </r>
                        <r>
                           <c ca="left">
                              <p>OPSM</p>
                           </c>
                           <c ca="center">
                              <p>100</p>
                           </c>
                           <c ca="center">
                              <p>100</p>
                           </c>
                           <c ca="center">
                              <p>86</p>
                           </c>
                           <c ca="center">
                              <p>36</p>
                           </c>
                           <c ca="center">
                              <p>22</p>
                           </c>
                        </r>
                        <r>
                           <c cspan="6">
                              <hr/>
                           </c>
                        </r>
                        <r>
                           <c ca="left">
                              <p>
                                 <it>Bimax</it>
                              </p>
                           </c>
                           <c ca="center">
                              <p>100</p>
                           </c>
                           <c ca="center">
                              <p>100</p>
                           </c>
                           <c ca="center">
                              <p>89</p>
                           </c>
                           <c ca="center">
                              <p>79</p>
                           </c>
                           <c ca="center">
                              <p>64</p>
                           </c>
                        </r>
                        <r>
                           <c cspan="6">
                              <hr/>
                           </c>
                        </r>
                        <r>
                           <c ca="left">
                              <p>ISA</p>
                           </c>
                           <c ca="center">
                              <p>89</p>
                           </c>
                           <c ca="center">
                              <p>89</p>
                           </c>
                           <c ca="center">
                              <p>87</p>
                           </c>
                           <c ca="center">
                              <p>69</p>
                           </c>
                           <c ca="center">
                              <p>32</p>
                           </c>
                        </r>
                        <r>
                           <c cspan="6">
                              <hr/>
                           </c>
                        </r>
                        <r>
                           <c ca="left">
                              <p>CC</p>
                           </c>
                           <c ca="center">
                              <p>80</p>
                           </c>
                           <c ca="center">
                              <p>70</p>
                           </c>
                           <c ca="center">
                              <p>60</p>
                           </c>
                           <c ca="center">
                              <p>20</p>
                           </c>
                           <c ca="center">
                              <p>10</p>
                           </c>
                        </r>
                     </tblbdy>
                  </tbl>
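                   <p>The comparison drawn from Table 6 can be checked mechanically. The following Python sketch transcribes the percentages of Table 6. For each competitor, it counts the thresholds at which <it>BiMine</it> is above, equal to, or below that competitor. It also shows how such a percentage could be computed from per-bicluster adjusted p-values. This part assumes that a bicluster counts as enriched when its adjusted p-value is at or below the threshold, and that is an assumption rather than a documented FuncAssociate rule. The function names are hypothetical.</p>

```python
# Percentages transcribed from Table 6, in threshold order 5%, 1%, 0.5%, 0.1%, 0.001%.
TABLE6 = {
    "BiMine": [100, 100, 93, 82, 51],
    "OPSM": [100, 100, 86, 36, 22],
    "Bimax": [100, 100, 89, 79, 64],
    "ISA": [89, 89, 87, 69, 32],
    "CC": [80, 70, 60, 20, 10],
}


def proportion_enriched(p_values, alpha):
    """Percentage of biclusters whose adjusted p-value does not exceed alpha."""
    hits = sum(1 for p in p_values if alpha >= p)
    return 100.0 * hits / len(p_values)


def compare(ref, other, table=TABLE6):
    """Count the thresholds at which `ref` is above, equal to, or below `other`."""
    wins = ties = losses = 0
    for a, b in zip(table[ref], table[other]):
        if a > b:
            wins += 1
        elif a == b:
            ties += 1
        else:
            losses += 1
    return wins, ties, losses


for name in ("Bimax", "OPSM", "ISA", "CC"):
    print(name, compare("BiMine", name))
# Bimax (2, 2, 1)
# OPSM (3, 2, 0)
# ISA (5, 0, 0)
# CC (5, 0, 0)
```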
                   <p>Furthermore, in order to identify the biological annotations of the extracted biclusters, we use <it>GOTermFinder </it><url>http://db.yeastgenome.org/cgi-bin/GO/goTermFinder</url>, a tool available in the <it>Saccharomyces Genome Database </it>(SGD). <it>GOTermFinder </it>is designed to search for the significant GO terms shared by a group of genes, and it provides users with the means to identify the characteristics that these genes may have in common.</p>
                   <p>For the biological process, molecular function and cellular component ontologies, we present the significant shared GO terms (or parents of GO terms) that describe two sets of genes extracted by <it>BiMine</it>. The first set is a bicluster of 11 genes &#215; 11 conditions with ASR equal to 0.8690, and the second is a bicluster of 12 genes &#215; 13 conditions with ASR equal to 0.8873. As in <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>, we report the most significant GO terms shared by these biclusters. For example, in the first bicluster of Table <tblr tid="T7">7</tblr> (12 &#215; 13), the genes (<it>YDL003W, YDL164C, YDR097C, YDR440W, YKL113C, YLL002W, YLR183C, YNL102W</it>) are particularly involved in cellular response to DNA damage stimulus, response to DNA damage stimulus, cellular response to stress, cellular response to stimulus, response to stress and response to stimulus.</p>
                  <tbl id="T7">
                     <title>
                        <p>Table 7</p>
                     </title>
                     <caption>
                        <p>Most significant shared GO terms (process, function, component) for two biclusters on Yeast data.</p>
                     </caption>
                     <tblbdy cols="4">
                        <r>
                           <c ca="left">
                              <p>
                                 <b>Bicluster volume (genes &#215; conditions)</b>
                              </p>
                           </c>
                           <c ca="left">
                              <p>
                                 <b>Process Ontology</b>
                              </p>
                           </c>
                           <c ca="left">
                              <p>
                                 <b>Function Ontology</b>
                              </p>
                           </c>
                           <c ca="left">
                              <p>
                                 <b>Component Ontology</b>
                              </p>
                           </c>
                        </r>
                        <r>
                           <c cspan="4">
                              <hr/>
                           </c>
                        </r>
                        <r>
                           <c ca="left">
                              <p>(12 &#215; 13)</p>
                           </c>
                           <c ca="left">
                              <p>cellular response to DNA damage stimulus (66.7%, 1.87e-08)</p>
                              <p>response to DNA damage stimulus (66.7%, 6.30e-08)</p>
                               <p>cellular response to stress (66.7%, 2.12e-07)</p>
                               <p>cellular response to stimulus (66.7%, 3.25e-07)</p>
                               <p>DNA repair (50%, 2.58e-05)</p>
                               <p>response to stress (66.7%, 2.98e-05)</p>
                           </c>
                           <c ca="left">
                               <p>chromatin binding (25%, 0.00037)</p>
                           </c>
                           <c ca="left">
                               <p>microtubule organizing center part (16.7%, 0.00742)</p>
                           </c>
                        </r>
                        <r>
                           <c cspan="4">
                              <hr/>
                           </c>
                        </r>
                        <r>
                           <c ca="left">
                              <p>(11 &#215; 11)</p>
                           </c>
                           <c ca="left">
                              <p>cell cycle process (63.6%, 2.93e-05)</p>
                              <p>cell cycle (63.6%, 6.85e-05)</p>
                           </c>
                           <c ca="left">
                               <p>GTPase activator activity (18.2%, 0.00994)</p>
                           </c>
                           <c ca="left">
                              <p>microtubule cytoskeleton (45.5%, 6.33e-06)</p>
                               <p>microtubule organizing center (36.4%, 4.97e-05)</p>
                              <p>spindle pole body (36.4%, 4.97e-05)</p>
                              <p>spindle pole (36.4%, 6.77e-05)</p>
                           </c>
                        </r>
                     </tblbdy>
                  </tbl>
                   <p>The values within parentheses after each GO term in Table <tblr tid="T7">7</tblr>, such as (66.7%, 1.87e-08) in the first bicluster, give the cluster frequency and the statistical significance. The cluster frequency (66.7%) means that 8 of the 12 genes in the first bicluster are annotated to this process, and the statistical significance is given by a <it>p</it>-value of 1.87e-08 (highly significant).</p>
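                   <p>The cluster frequency is a simple ratio. GO enrichment <it>p</it>-values are commonly computed from the hypergeometric distribution. The Python sketch below assumes that model and computes the probability of seeing at least k annotated genes in a random set of n genes. The background sizes in the example (6000 genes, 150 of them annotated) are hypothetical. The sketch is not meant to reproduce the values reported by <it>GOTermFinder</it>, whose own computation (including any multiple-testing correction) may differ.</p>

```python
from math import comb


def cluster_frequency(k: int, n: int) -> float:
    """Percentage of the n genes of a bicluster that are annotated to a GO term."""
    return 100.0 * k / n


def hypergeom_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n): the probability that a random
    set of n genes, drawn from N background genes of which K carry the term,
    contains at least k annotated genes."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


print(round(cluster_frequency(8, 12), 1))  # 66.7, as in Table 7
# Hypothetical background: 6000 genes, 150 of them annotated to the term.
print(hypergeom_tail(8, 12, 150, 6000))
```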
                   <p>According to <abbrgrp><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr><abbr bid="B43">43</abbr></abbrgrp>, in microarray data analysis, genes are considered to belong to the same cluster if the trajectory patterns of their expression levels are similar across a set of conditions. Figure <figr fid="F8">8</figr> shows the biclusters of Table <tblr tid="T7">7</tblr> found by the <it>BiMine </it>algorithm on the yeast dataset. Visual inspection of these biclusters shows that their genes exhibit similar behaviour under the selected subset of conditions. In Additional File <supplr sid="S1">1</supplr>, we show the best bicluster found by each compared algorithm, annotated using <it>GOTermFinder</it>, together with its gene expression profile drawn by BicAT. We notice that <it>BiMine </it>and <it>Bimax </it>obtain highly significant (i.e., very small) <it>p</it>-values. CC (resp. OPSM) cannot identify any component ontology (resp. function ontology) term, and the <it>p</it>-values obtained by ISA are less significant than those of <it>BiMine</it>.</p>
                  <fig id="F8">
                     <title>
                        <p>Figure 8</p>
                     </title>
                     <caption>
                         <p>Two biclusters found by <it>BiMine </it>on the yeast dataset</p>
                      </caption>
                      <text>
                         <p><b>Two biclusters found by <it>BiMine </it>on the yeast dataset</b>. (a): Bicluster of size (12 &#215; 13) with ASR = 0.8873. (b): Bicluster of size (11 &#215; 11) with ASR = 0.8690.</p>
                     </text>
                     <graphic file="1756-0381-2-9-8"/>
                  </fig>
                  <suppl id="S1">
                     <title>
                        <p>Additional file 1</p>
                     </title>
                     <text>
                         <p><b>The best bicluster obtained by each compared algorithm</b>. This file illustrates the best bicluster found by each compared algorithm, annotated using <it>GOTermFinder</it>. The gene expression profile of each best bicluster is drawn using BicAT.</p>
                     </text>
                     <file name="1756-0381-2-9-S1.DOC">
                        <p>Click here for file</p>
                     </file>
                  </suppl>
                   <p>All these experiments show that, for this dataset, the proposed approach is able to detect biologically significant and functionally enriched biclusters with low <it>p</it>-values. Furthermore, the biclusters found by <it>BiMine </it>show a good degree of homogeneity.</p>
               </sec>
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
          <p>The <it>BiMine </it>algorithm has several interesting features. First, <it>BiMine </it>avoids discretizing the data matrix. Indeed, classifying gene expression values into intervals often leads to poor results <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. Moreover, discretization may limit the ability of an algorithm to discover a biological model because of the noise inherent in most microarray experiments <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. Discretizing biological data also requires good prior knowledge of the data in order to choose appropriate thresholds, and such knowledge is not always available.</p>
          <p>Second, the <it>BiMine </it>algorithm can explore all possible combinations of attributes while keeping the tree small. The parametric pruning rule based on the ASR threshold allows the enumeration process to cut tree branches that cannot lead to good biclusters.</p>
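          <p>The general idea of threshold-based branch pruning on an enumeration tree can be illustrated with a minimal Python sketch. It does not reproduce BiMine's actual tree or its ASR-based rule. Under a simplifying assumption, it scores a gene set by the minimum pairwise correlation among its genes (precomputed in a matrix rho) instead of by ASR. This choice makes pruning exact, because adding a gene can never raise the minimum. With an average-based score such as ASR, the same rule acts as a heuristic. The function enumerate_groups and the example matrix are hypothetical.</p>

```python
def enumerate_groups(rho, threshold):
    """Depth-first set-enumeration tree over gene indices. A child node is
    created only if the new gene correlates at least `threshold` with every
    gene already in the set; since adding genes can never raise the minimum
    pairwise correlation, a failing branch is pruned without losing any
    valid gene set."""
    n = len(rho)
    results = []

    def expand(group):
        if len(group) >= 2:
            results.append(tuple(group))
        start = group[-1] + 1 if group else 0  # extend only with larger indices
        for g in range(start, n):
            if all(rho[g][h] >= threshold for h in group):
                expand(group + [g])
            # otherwise the whole subtree rooted at group + [g] is skipped

    expand([])
    return results


# Hypothetical pairwise correlations between four genes.
rho = [
    [1.0, 0.9, 0.88, 0.1],
    [0.9, 1.0, 0.95, 0.2],
    [0.88, 0.95, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]
print(enumerate_groups(rho, 0.85))  # [(0, 1), (0, 1, 2), (0, 2), (1, 2)]
print(enumerate_groups(rho, 0.90))  # [(0, 1), (1, 2)]
```

          <p>Raising the threshold from 0.85 to 0.90 in this toy example shrinks the output from four gene sets to two. This mirrors, in a simplified setting, how a threshold controls the number of biclusters returned.</p>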
          <p>Third, the <it>BiMine </it>algorithm naturally provides multiple biclusters of variable sizes. The number of desired biclusters can be controlled by tuning the ASR threshold. These multiple solutions, with different sizes and characteristics, may be of interest for biological investigations.</p>
          <p>Fourth, the new ASR evaluation function can be used by other biclustering algorithms in place of MSR or ACV. It can also be used as a complement to these existing functions.</p>
          <p>Finally, it has been shown in <abbrgrp><abbr bid="B45">45</abbr></abbrgrp> that Spearman's rank correlation is less sensitive to the presence of noise in the data. Since our evaluation function ASR is based on Spearman's rank correlation, ASR should also be less sensitive to noise.</p>
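          <p>The robustness argument can be illustrated with a single outlier. The Python sketch below computes Spearman's rho as the Pearson correlation of ranks, with tied values sharing their mean rank. In the example, a single extreme value lowers Pearson's correlation to about 0.743, while Spearman's rho stays at 1.0 because the ordering is unchanged. The helper mean_pairwise_spearman averages rho over all pairs of gene profiles. It is an assumed, simplified ASR-style score and not the exact ASR definition, which is not reproduced here. All function names are hypothetical.</p>

```python
from itertools import combinations, groupby


def ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    for _, group in groupby(order, key=lambda i: values[i]):
        idx = list(group)
        mean_rank = pos + (len(idx) + 1) / 2  # mean of positions pos+1 .. pos+len(idx)
        for i in idx:
            result[i] = mean_rank
        pos += len(idx)
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0  # 0.0 for a constant profile


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


def mean_pairwise_spearman(profiles):
    """Average rho over all pairs of gene profiles (assumed ASR-style score)."""
    pairs = list(combinations(profiles, 2))
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)


x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 100]  # the last measurement is an outlier
print(round(pearson(x, y), 3))   # 0.743
print(round(spearman(x, y), 3))  # 1.0
```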
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
          <p>In this paper, we described <it>BiMine</it>, a new algorithm for biclustering of DNA microarray data. Compared with existing biclustering algorithms, <it>BiMine </it>distinguishes itself by a number of original features. First, <it>BiMine </it>operates directly on the raw data matrix without discretizing the data, thus reducing the risk of information loss. Second, with <it>BiMine</it>, it is not necessary to fix a minimum or maximum number of genes or conditions, which enables the generation of diversified biclusters. Third, by representing biclusters with a convenient tree structure and using a parametric and effective branch pruning rule, <it>BiMine </it>is able to explore the search space effectively. Note that ASR can also be used by other biclustering algorithms as an alternative evaluation function.</p>
          <p>The performance of the <it>BiMine </it>algorithm was assessed on a set of synthetic datasets as well as on a real microarray dataset (yeast cell-cycle). Computational experiments showed highly competitive results for <it>BiMine </it>in comparison with four other popular biclustering algorithms on both types of datasets. In addition, a biological validation of the genes selected within the biclusters for the yeast cell-cycle data has been provided, based on a publicly available Gene Ontology (GO) annotation tool. Although we presented <it>BiMine </it>in the context of DNA microarray data analysis, the algorithm can also be applied or adapted to other biclustering problems.</p>
          <p>Finally, let us mention that the proposed algorithm is computationally expensive; one of our ongoing efforts aims at finding new heuristics to speed up the enumeration process. In particular, other heuristic rules could be defined to improve branch pruning and further reduce the size of the explored search tree.</p>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The authors declare that they have no competing interests.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>WA implemented the system, conducted the experiments and wrote the draft manuscript. ME and JKH supervised the project and co-wrote the manuscript. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The authors are grateful to Dr. Jason Moore and Dr. Federico Divina for their insightful comments and questions that helped us to improve the work.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Biclustering of expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Cheng</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Church</snm>
                  <fnm>GM</fnm>
               </au>
            </aug>
            <source>Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology</source>
            <publisher>AAAI Press</publisher>
            <pubdate>2000</pubdate>
            <fpage>93</fpage>
            <lpage>103</lpage>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Information-theoretic co-clustering</p>
            </title>
            <aug>
               <au>
                  <snm>Dhillon</snm>
                  <fnm>IS</fnm>
               </au>
               <au>
                  <snm>Mallela</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Modha</snm>
                  <fnm>DS</fnm>
               </au>
            </aug>
            <source>Proc. 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'03)</source>
            <pubdate>2003</pubdate>
            <fpage>89</fpage>
            <lpage>98</lpage>
            <xrefbib>
               <pubid idtype="doi">full_text</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>RCV1: A new benchmark collection for text categorization research</p>
            </title>
            <aug>
               <au>
                  <snm>Lewis</snm>
                  <fnm>DD</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Rose</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Journal of Machine Learning Research</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>361</fpage>
            <lpage>397</lpage>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Latent Class Models for Collaborative Filtering</p>
            </title>
            <aug>
               <au>
                  <snm>Hofmann</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Puzicha</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Proc. International Joint Conference on Artificial Intelligence</source>
            <pubdate>1999</pubdate>
            <fpage>688</fpage>
            <lpage>693</lpage>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Clustering by pattern similarity in large data sets</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>SIGMOD '02: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data</source>
            <publisher>ACM SIGMOD, New York, NY, USA</publisher>
            <pubdate>2002</pubdate>
            <fpage>394</fpage>
            <lpage>405</lpage>
            <xrefbib>
               <pubid idtype="doi">full_text</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>A new algorithm for two-mode clustering</p>
            </title>
            <aug>
               <au>
                  <snm>Gaul</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Schader</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Data Analysis and Information Systems</source>
            <publisher>Springer</publisher>
            <pubdate>1996</pubdate>
            <fpage>15</fpage>
            <lpage>23</lpage>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Direct clustering of a data matrix</p>
            </title>
            <aug>
               <au>
                  <snm>Hartigan</snm>
                  <fnm>JA</fnm>
               </au>
            </aug>
            <source>Journal of the American Statistical Association</source>
            <pubdate>1972</pubdate>
            <volume>67</volume>
            <issue>337</issue>
            <fpage>123</fpage>
            <lpage>129</lpage>
            <xrefbib>
               <pubid idtype="doi">10.2307/2284710</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Automatic subspace clustering of high dimensional data for data mining applications</p>
            </title>
            <aug>
               <au>
                  <snm>Agrawal</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Gehrke</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Gunopulos</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Raghavan</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Proc. ACM SIGMOD International Conference on Management of Data</source>
            <pubdate>1998</pubdate>
            <fpage>94</fpage>
            <lpage>105</lpage>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Plaid models for gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Lazzeroni</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Owen</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Statistica Sinica</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>61</fpage>
            <lpage>86</lpage>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Discovering local structure in gene expression data: the order-preserving submatrix problem</p>
            </title>
            <aug>
               <au>
                  <snm>Ben-Dor</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Chor</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Karp</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Yakhini</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>2003</pubdate>
            <volume>10</volume>
            <fpage>373</fpage>
            <lpage>384</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/10665270360688075</pubid>
                  <pubid idtype="pmpid" link="fulltext">12935334</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Enhanced biclustering on expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Proceedings of the Third IEEE Symposium on Bioinformatics and Bioengineering (BIBE'03)</source>
            <pubdate>2003</pubdate>
            <fpage>1</fpage>
            <lpage>7</lpage>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Computing the maximum similarity bi-clusters of gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>23</volume>
            <issue>1</issue>
            <fpage>50</fpage>
            <lpage>56</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btl560</pubid>
                  <pubid idtype="pmpid" link="fulltext">17090578</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Identification of coherent patterns in gene expression data using an efficient biclustering algorithm and parallel coordinate visualization</p>
            </title>
            <aug>
               <au>
                  <snm>Cheng</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Law</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Siu</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Liew</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2008</pubdate>
            <volume>9</volume>
            <fpage>210</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-9-210</pubid>
                  <pubid idtype="pmcid">2396181</pubid>
                  <pubid idtype="pmpid" link="fulltext">18433478</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Discovering biclusters by iteratively sorting with weighted correlation coefficient in gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Teng</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Chan</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>J Signal Process Syst</source>
            <pubdate>2008</pubdate>
            <volume>50</volume>
            <issue>3</issue>
            <fpage>267</fpage>
            <lpage>280</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1007/s11265-007-0121-2</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>A systematic comparison and evaluation of biclustering methods for gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Prelic</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bleuler</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Zimmermann</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>B&#252;hlmann</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Gruissem</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Hennig</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Thiele</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Zitzler</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <issue>9</issue>
            <fpage>1122</fpage>
            <lpage>1129</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btl060</pubid>
                  <pubid idtype="pmpid" link="fulltext">16500941</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Discovering statistically significant biclusters in gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Tanay</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sharan</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Shamir</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <fpage>S136</fpage>
            <lpage>S144</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12169541</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Op-cluster: Clustering by tendency in high dimensional space</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Proc. 3rd IEEE International Conference on Data Mining</source>
            <pubdate>2003</pubdate>
            <fpage>187</fpage>
            <lpage>194</lpage>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Exhaustive Search Method of Gene Expression Modules and Its Application to Human Tissue Data</p>
            </title>
            <aug>
               <au>
                  <snm>Okada</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Okubo</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Horton</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Fujibuchi</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>IAENG International Journal of Computer Science</source>
            <pubdate>2007</pubdate>
            <volume>34</volume>
            <fpage>1</fpage>
            <lpage>16</lpage>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Application of simulated annealing to the biclustering of gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Bryan</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Cunningham</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bolshakova</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>IEEE Transactions on Information Technology in Biomedicine</source>
            <pubdate>2006</pubdate>
            <volume>10</volume>
            <issue>3</issue>
            <fpage>519</fpage>
            <lpage>525</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1109/TITB.2006.872073</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Biclustering of gene expression data using reactive greedy randomized adaptive search procedure</p>
            </title>
            <aug>
               <au>
                  <snm>Dharan</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Nair</snm>
                  <fnm>AS</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2009</pubdate>
            <volume>10</volume>
            <issue>Suppl 1</issue>
            <fpage>S27</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-10-S1-S27</pubid>
                  <pubid idtype="pmcid">2648745</pubid>
                  <pubid idtype="pmpid" link="fulltext">19208127</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>An EA framework for biclustering of gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Bleuler</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Prelic</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Zitzler</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Proceedings of Congress on Evolutionary Computation</source>
            <pubdate>2004</pubdate>
            <volume>1</volume>
            <fpage>166</fpage>
            <lpage>173</lpage>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Multi-objective evolutionary biclustering of gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Mitra</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Banka</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Pattern Recognition</source>
            <pubdate>2006</pubdate>
            <fpage>2464</fpage>
            <lpage>2477</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/j.patcog.2006.03.003</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>A Multi-Objective Approach to Discover Biclusters in Microarray Data</p>
            </title>
            <aug>
               <au>
                  <snm>Divina</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Aguilar-Ruiz</snm>
                  <fnm>JS</fnm>
               </au>
            </aug>
            <source>Proceedings of the 9th annual conference on Genetic and evolutionary computation</source>
            <pubdate>2007</pubdate>
            <fpage>385</fpage>
            <lpage>392</lpage>
            <xrefbib>
               <pubid idtype="doi">full_text</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Microarray Biclustering: A Novel Memetic Approach Based on the PISA Platform</p>
            </title>
            <aug>
               <au>
                  <snm>Gallo</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Carballido</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ponzoni</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>EvoBIO: Proceedings of the 7th European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics</source>
            <pubdate>2009</pubdate>
            <fpage>44</fpage>
            <lpage>55</lpage>
            <xrefbib>
               <pubid idtype="doi">full_text</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Biclustering algorithms for biological data analysis: A survey</p>
            </title>
            <aug>
               <au>
                  <snm>Madeira</snm>
                  <fnm>SC</fnm>
               </au>
               <au>
                  <snm>Oliveira</snm>
                  <fnm>AL</fnm>
               </au>
            </aug>
            <source>IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)</source>
            <pubdate>2004</pubdate>
            <volume>1</volume>
            <issue>1</issue>
            <fpage>24</fpage>
            <lpage>45</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1109/TCBB.2004.2</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Mining deterministic biclusters in gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Teo</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ooi</snm>
                  <fnm>BC</fnm>
               </au>
               <au>
                  <snm>Tan</snm>
                  <fnm>KL</fnm>
               </au>
            </aug>
            <source>Proceedings of the Fourth IEEE Symposium on Bioinformatics and Bioengineering (BIBE'04)</source>
            <pubdate>2004</pubdate>
            <fpage>283</fpage>
            <lpage>292</lpage>
            <xrefbib>
               <pubid idtype="doi">full_text</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Random walk biclustering for microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Angiulli</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Cesario</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Pizzuti</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Information Sciences</source>
            <pubdate>2008</pubdate>
            <fpage>1479</fpage>
            <lpage>1497</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/j.ins.2007.11.007</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Shifting and scaling patterns from gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Aguilar-Ruiz</snm>
                  <fnm>JS</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>3840</fpage>
            <lpage>3845</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti641</pubid>
                  <pubid idtype="pmpid" link="fulltext">16144809</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Virtual error: A new measure for evolutionary biclustering</p>
            </title>
            <aug>
               <au>
                  <snm>Pontes</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Divina</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Giraldez</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Aguilar-Ruiz</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics</source>
            <pubdate>2007</pubdate>
            <fpage>217</fpage>
            <lpage>226</lpage>
            <xrefbib>
               <pubid idtype="doi">full_text</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Nonparametrics: Statistical Methods Based on Ranks</p>
            </title>
            <aug>
               <au>
                  <snm>Lehmann</snm>
                  <fnm>EL</fnm>
               </au>
               <au>
                  <snm>D'Abrera</snm>
                  <fnm>HJM</fnm>
               </au>
            </aug>
            <source>rev. ed</source>
            <publisher>Englewood Cliffs, NJ: Prentice-Hall</publisher>
            <pubdate>1998</pubdate>
            <fpage>292</fpage>
            <lpage>323</lpage>
         </bibl>
         <bibl id="B31">
            <title>
               <p>An efficient biclustering algorithm for finding genes with similar patterns in time-series expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Madeira</snm>
                  <fnm>SC</fnm>
               </au>
               <au>
                  <snm>Oliveira</snm>
                  <fnm>AL</fnm>
               </au>
            </aug>
            <source>Proc. of the 5th Asia Pacific Bioinformatics Conference, Series in Advances in Bioinformatics and Computational Biology</source>
            <publisher>Imperial College Press</publisher>
            <pubdate>2007</pubdate>
            <volume>5</volume>
            <fpage>67</fpage>
            <lpage>80</lpage>
            <xrefbib>
               <pubid idtype="doi">full_text</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Strategies for identifying statistically significant dense regions in microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Yip</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ng</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Chan</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>IEEE/ACM Trans Comput Biol Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>4</volume>
            <issue>3</issue>
            <fpage>415</fpage>
            <lpage>429</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1109/TCBB.2007.1022</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Defining transcription modules using large-scale gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Bergmann</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ihmels</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Barkai</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <issue>13</issue>
            <fpage>1993</fpage>
            <lpage>2003</lpage>
         </bibl>
         <bibl id="B34">
            <title>
               <p>BicAT: a biclustering analysis toolbox</p>
            </title>
            <aug>
               <au>
                  <snm>Barkow</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bleuler</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Prelic</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Zimmermann</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Zitzler</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <issue>10</issue>
            <fpage>1282</fpage>
            <lpage>1283</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btl099</pubid>
                  <pubid idtype="pmpid" link="fulltext">16551664</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Possibilistic approach for biclustering microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Cano</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Adarve</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>L&#243;pez</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Blanco</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Computers in Biology and Medicine</source>
            <pubdate>2007</pubdate>
            <volume>37</volume>
            <fpage>1426</fpage>
            <lpage>1436</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.compbiomed.2007.01.005</pubid>
                  <pubid idtype="pmpid" link="fulltext">17346690</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Biclustering of expression data. (supplementary information)</p>
            </title>
            <aug>
               <au>
                  <snm>Cheng</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Church</snm>
                  <fnm>GM</fnm>
               </au>
            </aug>
            <source>Technical report</source>
            <pubdate>2006</pubdate>
            <url>http://arep.med.harvard.edu/biclustering</url>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Systematic determination of genetic network architecture</p>
            </title>
            <aug>
               <au>
                  <snm>Tavazoie</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hughes</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Campbell</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Cho</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Church</snm>
                  <fnm>GM</fnm>
               </au>
            </aug>
            <source>Nature Genetics</source>
            <pubdate>1999</pubdate>
            <volume>22</volume>
            <fpage>281</fpage>
            <lpage>285</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/10343</pubid>
                  <pubid idtype="pmpid" link="fulltext">10391217</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Gene Expression Data Analysis Using a Novel Approach to Biclustering Combining Discrete and Continuous Data</p>
            </title>
            <aug>
               <au>
                  <snm>Christinat</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Wachmann</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>IEEE/ACM Transactions on Computational Biology and Bioinformatics</source>
            <pubdate>2008</pubdate>
            <volume>5</volume>
            <issue>4</issue>
            <fpage>583</fpage>
            <lpage>593</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1109/TCBB.2007.70251</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Characterizing gene sets with FuncAssociate</p>
            </title>
            <aug>
               <au>
                  <snm>Berriz</snm>
                  <fnm>GF</fnm>
               </au>
               <au>
                  <snm>King</snm>
                  <fnm>OD</fnm>
               </au>
               <au>
                  <snm>Bryant</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Sander</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Roth</snm>
                  <fnm>FP</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>2502</fpage>
            <lpage>2504</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg363</pubid>
                  <pubid idtype="pmpid" link="fulltext">14668247</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Combining Pareto-optimal clusters using supervised learning for identifying co-expressed genes</p>
            </title>
            <aug>
               <au>
                  <snm>Maulik</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Mukhopadhyay</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bandyopadhyay</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2009</pubdate>
            <volume>10</volume>
            <fpage>27</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-10-27</pubid>
                  <pubid idtype="pmcid">2657792</pubid>
                  <pubid idtype="pmpid" link="fulltext">19154590</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Gene selection and clustering for time-course and dose-response microarray experiments using order-restricted inference</p>
            </title>
            <aug>
               <au>
                  <snm>Peddada</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Lobenhofer</snm>
                  <fnm>EK</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Afshari</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Weinberg</snm>
                  <fnm>CR</fnm>
               </au>
               <au>
                  <snm>Umbach</snm>
                  <fnm>DM</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>834</fpage>
            <lpage>841</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg093</pubid>
                  <pubid idtype="pmpid" link="fulltext">12724293</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Using hidden Markov models to analyze gene expression time course data</p>
            </title>
            <aug>
               <au>
                  <snm>Schliep</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sch&#246;nhuth</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Steinhoff</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>i255</fpage>
            <lpage>i263</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg1036</pubid>
                  <pubid idtype="pmpid" link="fulltext">12855468</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Clustering of time-course gene expression data using a mixed-effects model with B-splines</p>
            </title>
            <aug>
               <au>
                  <snm>Luan</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>474</fpage>
            <lpage>482</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg014</pubid>
                  <pubid idtype="pmpid" link="fulltext">12611802</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Improved biclustering of microarray data demonstrated through systematic performance tests</p>
            </title>
            <aug>
               <au>
                  <snm>Turner</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Bailey</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Krzanowski</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Computational Statistics and Data Analysis</source>
            <pubdate>2005</pubdate>
            <volume>48</volume>
            <fpage>235</fpage>
            <lpage>254</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/j.csda.2004.02.003</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Clustering of gene expression data using a local shape-based similarity measure</p>
            </title>
            <aug>
               <au>
                  <snm>Balasubramaniyan</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>H&#252;llermeier</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Weskamp</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>K&#228;mper</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>1069</fpage>
            <lpage>1077</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti095</pubid>
                  <pubid idtype="pmpid" link="fulltext">15513997</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>

