BioData Mining


Open Access Highly Accessed Research

Partitioning clustering algorithms for protein sequence data sets

Sondes Fayech*, Nadia Essoussi and Mohamed Limam

Author Affiliations

Department of Computer Science, LARODEC Laboratory, Higher Institute of Management, University of Tunis, Tunis, Tunisia


BioData Mining 2009, 2:3 doi:10.1186/1756-0381-2-3

Published: 2 April 2009

Additional files

Additional File 1:

Training and test data sets. The archive contains the text files, in FASTA format, that make up the data set used in this study. They are organized into two directories: the training base, with 3500 sequences, and the test base, with 1422 sequences. The data set, named DS4, has a total of 4922 sequences, of which 3500 (approximately 70% of DS4) were randomly selected for training and the remaining 1422 (approximately 30% of DS4) were used for testing. DS4 contains proteins selected from the HLA (DS1), Hydrolases (DS2) and Globins (DS3) protein families.
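The random 3500/1422 split described above can be reproduced in spirit with a short script. The following is a minimal sketch, not the authors' procedure: the article does not say whether the selection was stratified by family or which random seed was used, so this version assumes a plain uniform random split with a fixed seed. The function names `parse_fasta`, `random_split` and `to_fasta` are hypothetical helpers introduced here for illustration, and the demo uses synthetic records rather than the real DS4 sequences.

```python
import random
from typing import List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs; multi-line sequences are joined."""
    records: List[Record] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def random_split(records: List[Record], n_train: int, seed: int = 0):
    """Hypothetical helper: uniformly pick n_train records for training, rest go to test.

    Assumes an unstratified split; the original study's exact procedure is not stated.
    """
    if not 0 <= n_train <= len(records):
        raise ValueError("n_train must be between 0 and the number of records")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(len(records)))
    rng.shuffle(indices)
    train_idx = set(indices[:n_train])
    train = [r for i, r in enumerate(records) if i in train_idx]
    test = [r for i, r in enumerate(records) if i not in train_idx]
    return train, test


def to_fasta(records: List[Record], width: int = 60) -> str:
    """Serialize records back to FASTA, wrapping sequences at `width` residues."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Synthetic stand-in for DS4: 4922 dummy records, not the real protein sequences.
    fake = "".join(f">seq{i}\nMKV\n" for i in range(4922))
    records = parse_fasta(fake)
    train, test = random_split(records, n_train=3500, seed=42)
    print(len(train), len(test))  # 3500 1422
```

Note that 3500 of 4922 sequences is about 71.1%, which is why the text describes the split as approximately 70/30. A stratified variant (splitting each of DS1, DS2 and DS3 separately) would keep family proportions equal across the two bases, but the source does not indicate that this was done.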

Format: ZIP Size: 1.7 MB
