BioData Mining


Open Access Software article

SICTIN: Rapid footprinting of massively parallel sequencing data

Stefan Enroth1, Robin Andersson1, Claes Wadelius2 and Jan Komorowski3,1*

Author Affiliations

1 Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics, Uppsala University, Box 598, SE-75124 Uppsala, Sweden

2 Department of Genetics and Pathology, Rudbeck Laboratory, Uppsala University, SE-75185 Uppsala, Sweden

3 Interdisciplinary Centre for Mathematical and Computational Modelling, Warsaw University, PL-02-106 Warszawa, Poland


BioData Mining 2010, 3:4 doi:10.1186/1756-0381-3-4

Published: 13 August 2010

Abstract

Background

Massively parallel sequencing allows for genome-wide, hypothesis-free investigation of, for instance, transcription factor binding sites or histone modifications. Although detailed nucleotide-resolution information can easily be generated, biological insight often requires a more general view of patterns (footprints) over distinct genomic features such as transcription start sites, exons or repetitive regions. The construction of these footprints is, however, a time-consuming task.

Methods

The presented software generates a binary representation of the signals, enabling fast and scalable lookup. This representation allows footprints to be generated in mere minutes on a desktop computer. Several input formats are accepted, e.g. the SAM format, BED files and UCSC wiggle tracks.
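
To illustrate why a binary representation makes lookup fast, the following Python sketch stores one fixed-width value per genomic position, so the signal around any feature can be read with a single seek. It is an illustration of the general idea only: the abstract does not describe SICTIN's actual file layout, so the one-float32-per-base format and the function names (write_signal, read_window, footprint) are assumptions made for this example.

```python
import os
import struct

# Assumed layout (not SICTIN's documented format): one little-endian
# float32 per base, so position p starts at byte offset p * 4.
RECORD = struct.Struct("<f")


def write_signal(path, values):
    """Write one fixed-width record per genomic position."""
    with open(path, "wb") as fh:
        for value in values:
            fh.write(RECORD.pack(value))


def read_window(fh, start, end, length):
    """Return the signal for positions [start, end).

    Positions outside the sequence (before 0 or at/after length) read as 0.0.
    """
    lo, hi = max(start, 0), min(end, length)
    out = [0.0] * (end - start)
    if lo < hi:
        fh.seek(lo * RECORD.size)  # jump straight to the first needed record
        data = fh.read((hi - lo) * RECORD.size)
        out[lo - start:hi - start] = [v for (v,) in RECORD.iter_unpack(data)]
    return out


def footprint(path, features, flank):
    """Average the signal in a +/- flank window around each feature.

    features is a list of (position, strand) pairs; windows on the "-"
    strand are reversed so that all windows share the same orientation.
    """
    length = os.path.getsize(path) // RECORD.size
    total = [0.0] * (2 * flank + 1)
    with open(path, "rb") as fh:
        for pos, strand in features:
            window = read_window(fh, pos - flank, pos + flank + 1, length)
            if strand == "-":
                window.reverse()
            for i, value in enumerate(window):
                total[i] += value
    count = len(features)
    return [t / count for t in total] if count else total
```

Because every record has the same width, the cost of reading a window depends on the window size rather than on the size of the genome, which is what lets the number of queried features scale. A real implementation would likely use one file per chromosome and a more compact value type; both are left out here for brevity.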

Conclusions

Hypothesis-free investigation of genome-wide interactions allows for biological data mining at a scale never before seen. Until recently, analysis of sequencing data has focused mainly on signal patterns around transcription start sites, which are manageable in number. Today, the focus is shifting to a wider perspective, and numerous genomic features are being studied. To this end, we provide a system that allows fast querying of features numbering in the hundreds of thousands.