

Preprocessing differential methylation hybridization microarray data

Shuying Sun1,2*, Yi-Wen Huang3, Pearlly S Yan3, Tim HM Huang3 and Shili Lin4

Author Affiliations

1 Case Comprehensive Cancer Center, Case Western Reserve University, Cleveland, Ohio, 44106, USA

2 Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio, 44106, USA

3 Human Cancer Genetics Program, The Ohio State University, Columbus, Ohio, 43210, USA

4 Department of Statistics, The Ohio State University, Columbus, Ohio, 43210, USA


BioData Mining 2011, 4:13  doi:10.1186/1756-0381-4-13

Published: 16 May 2011



DNA methylation plays an important role in silencing tumor suppressor genes in various tumor types. To gain a genome-wide understanding of how changes in methylation affect tumor growth, the differential methylation hybridization (DMH) protocol was developed, and large amounts of DMH microarray data have been generated. However, it is still unclear how to preprocess this type of microarray data, and how the background correction and normalization methods used for two-color gene expression arrays perform on methylation microarray data. In this paper, we identify a set of internal control probes whose log ratios (M) are theoretically equal to zero under the DMH protocol. Using this set of control probes, we propose two novel LOESS (or LOWESS, locally weighted scatter-plot smoothing) normalization methods that are specific to DMH microarray data. Together with global LOESS and no normalization, this gives four normalization methods, which we compare. In addition, we compare five background correction methods.
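The idea of normalizing against control probes can be illustrated with a minimal sketch. It assumes the usual two-color definitions M = log2(R/G) and A = 0.5 * log2(R * G), fits a local linear smoother of M on A using only the control probes (whose M should be zero), and subtracts the fitted curve from every probe. The function names, the `span` parameter, and the simple tricube smoother are illustrative assumptions, not the two methods proposed in the paper, whose details are not given here.

```python
import math

def ma_values(red, green):
    """Return (M, A) per probe: M = log2(R/G), A = 0.5 * log2(R * G)."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a

def loess_predict(x_fit, y_fit, x0, span=0.5):
    """Local linear fit at x0 with tricube weights (a simplified LOESS)."""
    n = len(x_fit)
    k = min(n, max(3, math.ceil(span * n)))
    dists = sorted(abs(x - x0) for x in x_fit)
    # Bandwidth: distance to the k-th nearest point, nudged to avoid zero.
    h = dists[k - 1] * (1 + 1e-9) + 1e-12
    w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in x_fit]
    sw = sum(w)
    xbar = sum(wi * x for wi, x in zip(w, x_fit)) / sw
    ybar = sum(wi * y for wi, y in zip(w, y_fit)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, x_fit))
    sxy = sum(wi * (x - xbar) * (y - ybar)
              for wi, x, y in zip(w, x_fit, y_fit))
    # Fall back to the weighted mean if the local slope is undetermined.
    slope = sxy / sxx if sxx > 1e-12 else 0.0
    return ybar + slope * (x0 - xbar)

def control_loess_normalize(m, a, control_idx, span=0.5):
    """Subtract the control-probe trend (M versus A) from every probe."""
    a_ctrl = [a[i] for i in control_idx]
    m_ctrl = [m[i] for i in control_idx]
    return [mi - loess_predict(a_ctrl, m_ctrl, ai, span)
            for mi, ai in zip(m, a)]
```

A global LOESS would follow the same pattern with `control_idx` set to all probes, so the only difference in this sketch is which probes define the trend.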


We study 20 preprocessing methods, formed by combining five background correction methods with four normalization methods. To compare these 20 methods, we evaluate how well they identify known methylated and un-methylated housekeeping genes, based on two statistics. The comparison is illustrated using breast cancer cell line and ovarian cancer patient methylation microarray data. Our results show that the background correction methods perform similarly, whereas the four normalization methods perform very differently. In particular, all three LOESS normalization methods perform better than no normalization.


Within-array normalization is necessary, and the two LOESS normalization methods based on DMH-specific internal control probes produce more stable and relatively better results than the global LOESS normalization method.