<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="/rss.css" type="text/css"?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/"
    xmlns:cc="http://web.resource.org/cc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:extra="http://www.w3.org/1999/xhtml"
    xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <channel rdf:about="http://www.biodatamining.org/feeds/latestarticles/journal?quantity=&amp;format=rss&amp;version=">
        <title>BioData Mining - Latest Articles</title>
        <link>http://www.biodatamining.org/</link>
        <description>The latest research articles published by BioData Mining</description>
        <dc:date>2010-08-13T00:00:00Z</dc:date>
        <items>
            <rdf:Seq>
                                <rdf:li rdf:resource="http://www.biodatamining.org/content/3/1/4" />
                                <rdf:li rdf:resource="http://www.biodatamining.org/content/3/1/3" />
                                <rdf:li rdf:resource="http://www.biodatamining.org/content/3/1/2" />
                                <rdf:li rdf:resource="http://www.biodatamining.org/content/3/1/1" />
                                <rdf:li rdf:resource="http://www.biodatamining.org/content/2/1/9" />
                                <rdf:li rdf:resource="http://www.biodatamining.org/content/2/1/8" />
                                <rdf:li rdf:resource="http://www.biodatamining.org/content/2/1/7" />
                                <rdf:li rdf:resource="http://www.biodatamining.org/content/2/1/6" />
                                <rdf:li rdf:resource="http://www.biodatamining.org/content/2/1/5" />
                                <rdf:li rdf:resource="http://www.biodatamining.org/content/2/1/4" />
                            </rdf:Seq>
        </items>
        <extra:info rdf:parseType="Literal">
            <html:div style="font:14px Verdana, Geneva, Arial, Helvetica, sans-serif" xmlns:html="http://www.w3.org/1999/xhtml">
                <html:span style="font-weight:bold">
                    This is an RSS newsfeed from BioMed Central
                </html:span>
                <html:br />
                <html:span style="font-size: 12px;">
                    It is intended to be used with an RSS reader. For more information about RSS newsfeeds from BioMed Central, visit
                    <html:br />
                    <html:a href="http://www.biomedcentral.com/info/about/rss/" style="color:#3333CC; font-size:12px;">
                        http://www.biomedcentral.com/info/about/rss/
                    </html:a>
                    <html:br />
                </html:span>
            </html:div>
        </extra:info>
        <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </channel>
        <item rdf:about="http://www.biodatamining.org/content/3/1/4">
        <title>SICTIN: Rapid footprinting of massively parallel sequencing data</title>
        <description>Background:
Massively parallel sequencing allows genome-wide, hypothesis-free investigation of, for instance, transcription factor binding sites or histone modifications. Although detailed information at nucleotide resolution can easily be generated, biological insight often requires a more general view of patterns (footprints) over distinct genomic features such as transcription start sites, exons or repetitive regions. The construction of these footprints is, however, a time-consuming task.
Methods:
The presented software generates a binary representation of the signals, enabling fast and scalable lookup. This representation allows footprints to be generated in minutes on a desktop computer. Several input formats are accepted, e.g. SAM, BED files and UCSC wiggle tracks.
Conclusions:
Hypothesis-free investigation of genome-wide interactions allows biological data mining at a scale never before seen. Until recently, analysis of sequencing data has mainly targeted signal patterns around transcription start sites, which are manageable in number. Today, the focus is shifting to a wider perspective, and numerous genomic features are being studied. To this end, we provide a system that allows fast querying of features numbering in the hundreds of thousands.</description>
        <link>http://www.biodatamining.org/content/3/1/4</link>
                <dc:creator>Stefan Enroth</dc:creator>
                <dc:creator>Robin Andersson</dc:creator>
                <dc:creator>Claes Wadelius</dc:creator>
                <dc:creator>Jan Komorowski</dc:creator>
                <dc:source>BioData Mining 2010, 3:4</dc:source>
        <dc:date>2010-08-13T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1756-0381-3-4</dc:identifier>
        <prism:publicationName>BioData Mining</prism:publicationName>
        <prism:issn>1756-0381</prism:issn>
        <prism:volume>3</prism:volume>
        <prism:startingPage>4</prism:startingPage>
        <prism:publicationDate>2010-08-13T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biodatamining.org/content/3/1/3">
        <title>Applications and methods utilizing the Simple Semantic Web Architecture and Protocol (SSWAP) for bioinformatics resource discovery and disparate data and service integration</title>
        <description>Background:
Scientific data integration and computational service discovery are challenges for the bioinformatics community. This process is made more difficult by the separate and independent construction of biological databases, which makes the exchange of data between information resources difficult and labor-intensive. A recently described semantic web protocol, the Simple Semantic Web Architecture and Protocol (SSWAP; pronounced &quot;swap&quot;), offers the ability to describe data and services in a semantically meaningful way. We report how three major information resources (Gramene, SoyBase and the Legume Information System [LIS]) used SSWAP to semantically describe selected data and web services.
Methods:
We selected high-priority Quantitative Trait Locus (QTL), genomic mapping, trait, phenotypic, and sequence data and associated services such as BLAST for publication, data retrieval, and service invocation via semantic web services. Data and services were mapped to concepts and categories as implemented in legacy and de novo community ontologies. We used SSWAP to express these offerings in OWL Web Ontology Language (OWL), Resource Description Framework (RDF) and eXtensible Markup Language (XML) documents, which are appropriate for their semantic discovery and retrieval. We implemented SSWAP services to respond to web queries and return data. These services are registered with the SSWAP Discovery Server and are available for semantic discovery at http://sswap.info.
Results:
A total of ten services delivering QTL information from Gramene were created. From SoyBase, we created six services delivering information about soybean QTLs and seven services delivering genetic locus information. For LIS, we constructed three services: two allow the retrieval of DNA and RNA FASTA sequences, and the third provides nucleic acid sequence comparison capability (BLAST).
Conclusions:
The need for semantic integration technologies has preceded available solutions. We report the feasibility of mapping high-priority data from local, independent, idiosyncratic data schemas to common shared concepts as implemented in web-accessible ontologies. These mappings are then amenable to use in semantic web services. Our implementation of approximately two dozen services means that biological data at three large information resources (Gramene, SoyBase, and LIS) are available for programmatic access, semantic searching, and enhanced interaction between the separate missions of these resources.</description>
        <link>http://www.biodatamining.org/content/3/1/3</link>
                <dc:creator>Rex Nelson</dc:creator>
                <dc:creator>Shulamit Avraham</dc:creator>
                <dc:creator>Randy Shoemaker</dc:creator>
                <dc:creator>Gregory May</dc:creator>
                <dc:creator>Doreen Ware</dc:creator>
                <dc:creator>Damian Gessler</dc:creator>
                <dc:source>BioData Mining 2010, 3:3</dc:source>
        <dc:date>2010-06-04T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1756-0381-3-3</dc:identifier>
        <prism:publicationName>BioData Mining</prism:publicationName>
        <prism:issn>1756-0381</prism:issn>
        <prism:volume>3</prism:volume>
        <prism:startingPage>3</prism:startingPage>
        <prism:publicationDate>2010-06-04T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biodatamining.org/content/3/1/2">
        <title>Large scale analysis of positional effects of single-base mismatches on microarray gene expression data</title>
        <description>Background:
Affymetrix GeneChips use 25-mer oligonucleotide probes linked to a silica surface to detect targets in solution. Mismatches due to single nucleotide polymorphisms (SNPs) can affect the hybridization between probes and targets. Previous research has indicated that binding between probes and targets strongly depends on the positions of these mismatches. However, the reported effect of mismatch type has varied substantially across studies.
Methods:
By taking advantage of naturally occurring mismatches between rhesus macaque transcripts and human probes from the Affymetrix U133 Plus 2 GeneChip, we collected the largest dataset ever used in this type of analysis of 25-mer probes with single-base mismatches at each of the 25 probe positions.
Results:
A mismatch at the center of a probe led to a greater loss in signal intensity than a mismatch at the ends of the probe, regardless of the mismatch type. There was a slight asymmetry between the ends of a probe: effects of mismatches at the 3&apos; end of a probe were greater than those at the 5&apos; end. A cross-study comparison of the effect of mismatch types revealed that results were not in good agreement among different reports. However, if the mismatch types were consolidated into purine or pyrimidine mismatches, cross-study conclusions could be drawn.
Conclusion:
The comprehensive assessment of the effects of single-base mismatches on microarrays provided in this report can be useful for improving future versions of microarray platform design and the corresponding data analysis algorithms.</description>
        <link>http://www.biodatamining.org/content/3/1/2</link>
                <dc:creator>Fenghai Duan</dc:creator>
                <dc:creator>Mark Pauley</dc:creator>
                <dc:creator>Eliot Spindel</dc:creator>
                <dc:creator>Li Zhang</dc:creator>
                <dc:creator>Robert Norgren</dc:creator>
                <dc:source>BioData Mining 2010, 3:2</dc:source>
        <dc:date>2010-04-29T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1756-0381-3-2</dc:identifier>
        <prism:publicationName>BioData Mining</prism:publicationName>
        <prism:issn>1756-0381</prism:issn>
        <prism:volume>3</prism:volume>
        <prism:startingPage>2</prism:startingPage>
        <prism:publicationDate>2010-04-29T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biodatamining.org/content/3/1/1">
        <title>A reference guide for tree analysis and visualization</title>
        <description>The quantities of data obtained by new high-throughput technologies, such as microarrays or ChIP-Chip arrays, and by large-scale OMICS approaches, such as genomics, proteomics and transcriptomics, are becoming vast. As sequencing technologies become cheaper and easier to use, large-scale evolutionary studies towards the origins of life for all species and their evolution become more and more challenging. Databases holding information about how data are related and how they are hierarchically organized are expanding rapidly. Clustering analysis is becoming increasingly difficult to apply to very large amounts of data, since the results of these algorithms cannot be visualized efficiently. Most of the available visualization tools that can represent such hierarchies project data in 2D and often lack the necessary user-friendliness and interactivity. For example, current phylogenetic tree visualization tools are not able to display large-scale trees with more than a few thousand nodes in an easy-to-understand way. In this study, we review tools that are currently available for the visualization and analysis of biological trees, mainly developed during the last decade. We describe the uniform, standard computer-readable formats used to represent tree hierarchies, and we comment on the functionality and limitations of these tools. We also discuss how these tools can be developed further and how they should be integrated with various data sources. Here we focus on freely available software that offers users various tree-representation methodologies for biological data analysis.</description>
        <link>http://www.biodatamining.org/content/3/1/1</link>
                <dc:creator>Georgios Pavlopoulos</dc:creator>
                <dc:creator>Theodoros Soldatos</dc:creator>
                <dc:creator>Adriano Barbosa-Silva</dc:creator>
                <dc:creator>Reinhard Schneider</dc:creator>
                <dc:source>BioData Mining 2010, 3:1</dc:source>
        <dc:date>2010-02-22T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1756-0381-3-1</dc:identifier>
        <prism:publicationName>BioData Mining</prism:publicationName>
        <prism:issn>1756-0381</prism:issn>
        <prism:volume>3</prism:volume>
        <prism:startingPage>1</prism:startingPage>
        <prism:publicationDate>2010-02-22T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biodatamining.org/content/2/1/9">
        <title>A biclustering algorithm based on a Bicluster Enumeration Tree: application to DNA microarray data</title>
        <description>Background:
In a number of domains, such as DNA microarray data analysis, we need to cluster the rows (genes) and columns (conditions) of a data matrix simultaneously to identify groups of rows coherent with groups of columns. This kind of clustering is called biclustering. Biclustering algorithms are extensively used in DNA microarray data analysis, and more effective biclustering algorithms are highly desirable.
Methods:
We introduce BiMine, a new enumeration algorithm for biclustering of DNA microarray data. The proposed algorithm is based on three original features. First, BiMine relies on a new evaluation function called Average Spearman&apos;s rho (ASR). Second, BiMine uses a new tree structure, called Bicluster Enumeration Tree (BET), to represent the different biclusters discovered during the enumeration process. Third, to avoid the combinatorial explosion of the search tree, BiMine introduces a parametric rule that allows the enumeration process to cut tree branches that cannot lead to good biclusters.
Results:
The performance of the proposed algorithm is assessed using both synthetic and real DNA microarray data. The experimental results show that BiMine competes well with several other biclustering methods. Moreover, we assess biological significance using a gene annotation web tool to show that our proposed method is able to produce biologically relevant biclusters. The software is available to academic users upon request from the authors.</description>
        <link>http://www.biodatamining.org/content/2/1/9</link>
                <dc:creator>Wassim Ayadi</dc:creator>
                <dc:creator>Mourad Elloumi</dc:creator>
                <dc:creator>Jin-Kao Hao</dc:creator>
                <dc:source>BioData Mining 2009, 2:9</dc:source>
        <dc:date>2009-12-16T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1756-0381-2-9</dc:identifier>
        <prism:publicationName>BioData Mining</prism:publicationName>
        <prism:issn>1756-0381</prism:issn>
        <prism:volume>2</prism:volume>
        <prism:startingPage>9</prism:startingPage>
        <prism:publicationDate>2009-12-16T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biodatamining.org/content/2/1/8">
        <title>3PFDB - A database of Best Representative PSSM Profiles (BRPs) of Protein Families generated using a novel data mining approach</title>
        <description>Background:
Protein families can be related to each other at broad levels that group them into superfamilies. These relationships are harder to detect at the sequence level due to high evolutionary divergence. Sequence searches are strongly directed and influenced by the best representatives of families, which are viewed as starting points. PSSMs are useful approximations and mathematical representations of protein alignments, with a wide array of applications in bioinformatics such as remote homology detection, protein family analysis, detection of new members and evolutionary modelling. Computationally intensive searches have been performed using FASSM, a sensitive neural-network-based sequence search method, to identify the Best Representative PSSMs for families reported in the Pfam database, version 22.
Results:
We designed a novel data mining approach for the assessment of individual sequences from a protein family to identify a single Best Representative PSSM profile (BRP) per protein family. Using this approach, a database of protein family-specific best representative PSSM profiles called 3PFDB has been developed. PSSM profiles in 3PFDB are curated using the performance of individual sequences as a reference in a rigorous scoring and coverage analysis approach using FASSM. We have assessed the suitability of 1,085,588 sequences derived from seed or full alignments reported in the Pfam database (version 22). Coverage analysis using the FASSM method is used as the filtering step to identify the best representative sequence, starting from full-length or domain sequences, to generate the final profile for a given family. 3PFDB is a collection of best representative PSSM profiles of 8,524 protein families from the Pfam database.
Conclusion:
The availability of an approach to identify BRPs, and of a curated database of best representative PSI-BLAST-derived PSSMs for 91.4% of current Pfam families, will be a useful resource for the community to perform detailed and specific analysis using family-specific, best-representative PSSM profiles. 3PFDB can be accessed at http://caps.ncbs.res.in/3pfdb</description>
        <link>http://www.biodatamining.org/content/2/1/8</link>
                <dc:creator>Khader Shameer</dc:creator>
                <dc:creator>Paramasivam Nagarajan</dc:creator>
                <dc:creator>Kumar Gaurav</dc:creator>
                <dc:creator>Ramanathan Sowdhamini</dc:creator>
                <dc:source>BioData Mining 2009, 2:8</dc:source>
        <dc:date>2009-12-04T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1756-0381-2-8</dc:identifier>
        <prism:publicationName>BioData Mining</prism:publicationName>
        <prism:issn>1756-0381</prism:issn>
        <prism:volume>2</prism:volume>
        <prism:startingPage>8</prism:startingPage>
        <prism:publicationDate>2009-12-04T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biodatamining.org/content/2/1/7">
        <title>LD-Spline: Mapping SNPs on genotyping platforms to genomic regions using patterns of linkage disequilibrium</title>
        <description>Background:
Gene-centric analysis tools for genome-wide association study data are being developed both to annotate single-locus statistics and to prioritize or group single nucleotide polymorphisms (SNPs) prior to analysis. These approaches require knowledge about the relationships between SNPs on a genotyping platform and genes in the human genome. SNPs in the genome can represent broader genomic regions via linkage disequilibrium (LD), and population-specific patterns of LD can be exploited to generate a data-driven map of SNPs to genes.
Methods:
In this study, we implemented LD-Spline, a database routine that uses linkage disequilibrium statistics from the International HapMap Project to define the genomic boundaries that a particular SNP represents. Using simulated data, we compared the LD-Spline haplotype block partitioning approach to the four-gamete rule and to the Gabriel et al. approach. In addition, we processed two commonly used genome-wide association study platforms.
Results:
We illustrate that LD-Spline performs comparably to the four-gamete rule and the Gabriel et al. approach. However, as a SNP-centric approach, LD-Spline has the added benefit of systematically identifying a genomic boundary for each SNP, whereas global block partitioning approaches may falter due to sampling variation in LD statistics.
Conclusion:
LD-Spline is an integrated database routine that quickly and effectively defines the genomic region marked by a SNP using linkage disequilibrium, with a SNP-centric block definition algorithm.</description>
        <link>http://www.biodatamining.org/content/2/1/7</link>
                <dc:creator>William Bush</dc:creator>
                <dc:creator>Guanhua Chen</dc:creator>
                <dc:creator>Eric Torstenson</dc:creator>
                <dc:creator>Marylyn Ritchie</dc:creator>
                <dc:source>BioData Mining 2009, 2:7</dc:source>
        <dc:date>2009-12-03T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1756-0381-2-7</dc:identifier>
        <prism:publicationName>BioData Mining</prism:publicationName>
        <prism:issn>1756-0381</prism:issn>
        <prism:volume>2</prism:volume>
        <prism:startingPage>7</prism:startingPage>
        <prism:publicationDate>2009-12-03T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biodatamining.org/content/2/1/6">
        <title>Extraction of pure components from overlapped signals in gas chromatography-mass spectrometry (GC-MS)</title>
        <description>Gas chromatography-mass spectrometry (GC-MS) is a widely used analytical technique for the identification and quantification of trace chemicals in complex mixtures. When complex samples are analyzed by GC-MS, it is common to observe co-elution of two or more components, resulting in overlapping signal peaks in the total ion chromatogram. In such situations, manual signal analysis is often the most reliable means of extracting pure component signals; however, a systematic manual analysis over a number of samples is both tedious and error-prone. Over the past 30 years, a number of computational approaches have been proposed to assist in extracting pure signals from co-eluting GC-MS components. These include empirical methods, comparison with library spectra, eigenvalue analysis, regression and others. However, to date no approach has been recognized as best or accepted as standard. This situation hampers general GC-MS capabilities and, in particular, has implications for the development of robust, high-throughput GC-MS analytical protocols required in metabolic profiling and biomarker discovery. Here we first discuss the nature of GC-MS data and then review some of the approaches proposed for extracting pure signals from co-eluting components. We summarize and classify different approaches to this problem and examine why so many approaches proposed in the past have failed to live up to their full promise. Finally, we offer some thoughts on future developments in this field and suggest that the progress in general computing capabilities attained in the past two decades has opened new horizons for tackling this important problem.</description>
        <link>http://www.biodatamining.org/content/2/1/6</link>
                <dc:creator>Vladimir Likic</dc:creator>
                <dc:source>BioData Mining 2009, 2:6</dc:source>
        <dc:date>2009-10-12T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1756-0381-2-6</dc:identifier>
        <prism:publicationName>BioData Mining</prism:publicationName>
        <prism:issn>1756-0381</prism:issn>
        <prism:volume>2</prism:volume>
        <prism:startingPage>6</prism:startingPage>
        <prism:publicationDate>2009-10-12T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biodatamining.org/content/2/1/5">
        <title>Spatially Uniform ReliefF (SURF) for Computationally-Efficient Filtering of Gene-Gene Interactions</title>
        <description>Background:
Genome-wide association studies are becoming the de facto standard in the genetic analysis of common human diseases. Given the complexity and robustness of biological networks, such diseases are unlikely to be the result of single points of failure. Instead, they likely arise from the joint failure of two or more interacting components. The hope in genome-wide screens is that these points of failure can be linked to single nucleotide polymorphisms (SNPs) that confer disease susceptibility. Detecting interacting variants that lead to disease in the absence of single-gene effects is difficult, however. Methods that exhaustively analyze sets of these variants for interactions are combinatorial in nature, which makes them computationally infeasible. Efficient algorithms that can detect interacting SNPs are needed. ReliefF is one such promising algorithm, although it has a low success rate on noisy datasets when the interaction effect is small. ReliefF has been paired with an iterative approach, Tuned ReliefF (TuRF), which improves the estimation of weights in noisy data but does not fundamentally change the underlying ReliefF algorithm. To improve the sensitivity of studies using these methods to detect small effects, we introduce Spatially Uniform ReliefF (SURF).
Results:
SURF&apos;s ability to detect interactions in this domain is significantly greater than that of ReliefF. Similarly, SURF in combination with the TuRF strategy significantly outperforms TuRF alone for SNP selection under an epistasis model. Notably, this gain in success rate requires no increase in algorithmic complexity and is achieved even though a nuisance parameter is removed from the algorithm.
Conclusion:
Researchers performing genetic association studies and aiming to discover gene-gene interactions associated with increased disease susceptibility should use SURF in place of ReliefF. For instance, SURF should be used instead of ReliefF to filter a dataset before an exhaustive MDR analysis. This change increases the ability of a study to detect gene-gene interactions. The SURF algorithm is implemented in the open source Multifactor Dimensionality Reduction (MDR) software package available from http://www.epistasis.org.</description>
        <link>http://www.biodatamining.org/content/2/1/5</link>
                <dc:creator>Casey Greene</dc:creator>
                <dc:creator>Nadia Penrod</dc:creator>
                <dc:creator>Jeff Kiralis</dc:creator>
                <dc:creator>Jason Moore</dc:creator>
                <dc:source>BioData Mining 2009, 2:5</dc:source>
        <dc:date>2009-09-22T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1756-0381-2-5</dc:identifier>
        <prism:publicationName>BioData Mining</prism:publicationName>
        <prism:issn>1756-0381</prism:issn>
        <prism:volume>2</prism:volume>
        <prism:startingPage>5</prism:startingPage>
        <prism:publicationDate>2009-09-22T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biodatamining.org/content/2/1/4">
        <title>Statistical Quality Assessment and Outlier Detection for Liquid Chromatography-Mass Spectrometry Experiments</title>
        <description>Background:
Quality assessment methods, which are commonplace in engineering and industrial production, are not widespread in large-scale proteomics experiments. However, modern technologies such as Multi-Dimensional Liquid Chromatography coupled to Mass Spectrometry (LC-MS) produce large quantities of proteomic data. These data are prone to measurement errors and reproducibility problems, so automatic quality assessment and control are becoming increasingly important.
Results:
We propose a methodology to assess the quality and reproducibility of data generated in quantitative LC-MS experiments. We introduce quality descriptors that capture different aspects of the quality and reproducibility of LC-MS data sets. Our method is based on the Mahalanobis distance and a robust Principal Component Analysis.
Conclusion:
We evaluate our approach on several data sets of different complexities and show that we are able to precisely detect LC-MS runs of poor signal quality in large-scale studies.</description>
        <link>http://www.biodatamining.org/content/2/1/4</link>
                <dc:creator>Ole Schulz-Trieglaff</dc:creator>
                <dc:creator>Egidijus Machtejevas</dc:creator>
                <dc:creator>Knut Reinert</dc:creator>
                <dc:creator>Hartmut Schlueter</dc:creator>
                <dc:creator>Joachim Thiemann</dc:creator>
                <dc:creator>Klaus Unger</dc:creator>
                <dc:source>BioData Mining 2009, 2:4</dc:source>
        <dc:date>2009-04-07T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1756-0381-2-4</dc:identifier>
        <prism:publicationName>BioData Mining</prism:publicationName>
        <prism:issn>1756-0381</prism:issn>
        <prism:volume>2</prism:volume>
        <prism:startingPage>4</prism:startingPage>
        <prism:publicationDate>2009-04-07T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <cc:License rdf:about="http://creativecommons.org/licenses/by/2.0/">
        <cc:permits rdf:resource="http://creativecommons.org/ns#Reproduction" />
        <cc:permits rdf:resource="http://creativecommons.org/ns#Distribution" />
        <cc:permits rdf:resource="http://creativecommons.org/ns#DerivativeWorks" />
    </cc:License>
</rdf:RDF>
