Applications and methods utilizing the Simple Semantic Web Architecture and Protocol (SSWAP) for bioinformatics resource discovery and disparate data and service integration
- Equal contributors
1 USDA-ARS, CICGR, 100 Osborne Dr. Rm. 1575, Ames, IA, 50011-1010 USA
2 Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
3 National Center for Genome Resources, 2935 Rodeo Park Drive East, Santa Fe, NM 87505, USA
4 USDA-ARS, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
5 University of Arizona, 1657 E. Helen St., Tucson, AZ 85721, USA
BioData Mining 2010, 3:3 doi:10.1186/1756-0381-3-3Published: 4 June 2010
Scientific data integration and computational service discovery are challenges for the bioinformatic community. This process is made more difficult by the separate and independent construction of biological databases, which makes the exchange of data between information resources difficult and labor intensive. A recently described semantic web protocol, the Simple Semantic Web Architecture and Protocol (SSWAP; pronounced "swap") offers the ability to describe data and services in a semantically meaningful way. We report how three major information resources (Gramene, SoyBase and the Legume Information System [LIS]) used SSWAP to semantically describe selected data and web services.
We selected high-priority Quantitative Trait Locus (QTL), genomic mapping, trait, phenotypic, and sequence data and associated services such as BLAST for publication, data retrieval, and service invocation via semantic web services. Data and services were mapped to concepts and categories as implemented in legacy and de novo community ontologies. We used SSWAP to express these offerings in OWL Web Ontology Language (OWL), Resource Description Framework (RDF) and eXtensible Markup Language (XML) documents, which are appropriate for their semantic discovery and retrieval. We implemented SSWAP services to respond to web queries and return data. These services are registered with the SSWAP Discovery Server and are available for semantic discovery at http://sswap.info webcite.
A total of ten services delivering QTL information from Gramene were created. From SoyBase, we created six services delivering information about soybean QTLs, and seven services delivering genetic locus information. For LIS we constructed three services, two of which allow the retrieval of DNA and RNA FASTA sequences with the third service providing nucleic acid sequence comparison capability (BLAST).
The need for semantic integration technologies has preceded available solutions. We report the feasibility of mapping high priority data from local, independent, idiosyncratic data schemas to common shared concepts as implemented in web-accessible ontologies. These mappings are then amenable for use in semantic web services. Our implementation of approximately two dozen services means that biological data at three large information resources (Gramene, SoyBase, and LIS) is available for programmatic access, semantic searching, and enhanced interaction between the separate missions of these resources.