Software article

Caipirini: using gene sets to rank literature

Theodoros G Soldatos [1,6], Seán I O'Donoghue [1,7,8]*, Venkata P Satagopam [1], Adriano Barbosa-Silva [1,2,3], Georgios A Pavlopoulos [1,5], Ana Carolina Wanderley-Nogueira [4], Nina Mota Soares-Cavalcanti [4] and Reinhard Schneider [1,9]

Author Affiliations

1 Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany

2 Computational Biology and Data Mining Group, Max-Delbrück Center for Molecular Medicine, Berlin, Germany

3 Bioinformatics Graduate Program, Federal University of Paraná - UFPR (SEPT), Curitiba - PR, Brazil

4 Departamento de Genética, Laboratório de Genética e Biotecnologia Vegetal, Centro de Ciências Biológicas, Universidade Federal de Pernambuco, Recife, PE, Brasil

5 ESAT-SCD/IBBT-K.U. Leuven Future Health Department, Katholieke Universiteit Leuven, Leuven, Belgium

6 LIFE Biosystems GmbH, Heidelberg, Germany

7 Garvan Institute of Medical Research, Sydney, Australia

8 Division of Mathematics, Informatics, and Statistics, CSIRO, Sydney, Australia

9 Luxembourg Center for Systems Biomedicine, University of Luxembourg, Luxembourg


BioData Mining 2012, 5:1  doi:10.1186/1756-0381-5-1

Published: 1 February 2012



Keeping up to date with the bioscience literature is becoming increasingly challenging. Several recent methods help meet this challenge by letting the user launch a literature search from a list of abstracts that the user judges to be 'interesting'. Some methods go further and accept a second input set of 'uninteresting' abstracts; the two input sets are then used together to search and rank the literature by relevance. In this work we present the service 'Caipirini', which also accepts two input sets but takes the novel approach of ranking literature based on one or more sets of genes.
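The general idea of ranking literature from two input sets can be illustrated with a small sketch. This is not Caipirini's actual algorithm, which the abstract does not describe; it is a toy term-weighting scheme chosen only for illustration. It assumes that each gene set has already been mapped to a handful of associated abstracts, and all function names and example texts below are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_weights(set_a_abstracts, set_b_abstracts):
    """Smoothed log-odds per term: positive favours set A, negative favours set B."""
    # Count in how many abstracts of each set a term occurs (document frequency).
    counts_a = Counter(t for doc in set_a_abstracts for t in set(tokenize(doc)))
    counts_b = Counter(t for doc in set_b_abstracts for t in set(tokenize(doc)))
    n_a, n_b = len(set_a_abstracts), len(set_b_abstracts)
    weights = {}
    for term in set(counts_a) | set(counts_b):
        # Add-one smoothing keeps both probabilities strictly between 0 and 1.
        p_a = (counts_a[term] + 1) / (n_a + 2)
        p_b = (counts_b[term] + 1) / (n_b + 2)
        weights[term] = math.log(p_a / p_b)
    return weights


def rank_abstracts(candidates, weights):
    """Return (score, abstract) pairs, most set-A-like first."""
    scored = [
        (sum(weights.get(t, 0.0) for t in set(tokenize(doc))), doc)
        for doc in candidates
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical abstracts assumed to be linked to two gene sets.
set_a = ["cyclin dependent kinase regulates mitosis", "cell cycle checkpoint kinase"]
set_b = ["pathogen resistance and defense response", "defense signalling in leaves"]
candidates = ["plant defense against pathogen", "kinase activity during mitosis"]

ranked = rank_abstracts(candidates, term_weights(set_a, set_b))
for score, doc in ranked:
    print(f"{score:+.3f}  {doc}")
```

In this toy setting, abstracts that share vocabulary with set A rise to the top of the list and those resembling set B sink to the bottom. A real service would need a far richer feature set and a trained classifier.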


To evaluate the usefulness of Caipirini, we used two test cases, one related to the human cell cycle, and a second related to disease defense mechanisms in Arabidopsis thaliana. In both cases, the new method achieved high precision in finding literature related to the biological mechanisms underlying the input data sets.
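Precision for a ranked result list is commonly reported over the top k hits. The sketch below shows one standard way to compute it; the abstract does not state how precision was measured in the two test cases, so the function name, the cutoff k, and the example identifiers are assumptions made for illustration only.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked items that belong to the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Divide by the number of items actually returned, which may be below k.
    return hits / len(top_k)


# Hypothetical ranked abstract identifiers and a hand-curated relevant set.
ranked_ids = ["a1", "a2", "a3", "a4"]
relevant_ids = {"a1", "a3"}

p_at_1 = precision_at_k(ranked_ids, relevant_ids, 1)
p_at_2 = precision_at_k(ranked_ids, relevant_ids, 2)
p_at_4 = precision_at_k(ranked_ids, relevant_ids, 4)
```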


To our knowledge, Caipirini is the first service that enables literature search based directly on biological relevance to gene sets; it thus gives the research community a new way to unlock hidden knowledge from gene sets derived via high-throughput experiments.