Partitioning clustering algorithms for protein sequence data sets
* Corresponding author: Sondes Fayech sondes_el_feyech@yahoo.fr
Department of Computer Science, LARODEC Laboratory, Higher Institute of Management, University of Tunis, Tunis, Tunisia
BioData Mining 2009, 2:3 doi:10.1186/1756-0381-2-3
Published: 2 April 2009
Additional files
Additional File 1:
Training and test data sets. The archive contains the data set used in this study as text files in FASTA format, organized in two directories: the training base (3500 sequences) and the test base (1422 sequences). The data set, named DS4, comprises 4922 sequences, of which 3500 (about 70% of DS4) were randomly selected for training and the remaining 1422 (about 30% of DS4) were kept for testing. DS4 contains proteins selected from the HLA (DS1), Hydrolases (DS2) and Globins (DS3) protein families.
Format: ZIP Size: 1.7MB
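The random training/test split described for DS4 can be illustrated with a minimal Python sketch. This is not the authors' procedure: the source does not say which tool or random seed was used, so the function names `read_fasta` and `split_records`, the seed value, and the toy sequences below are all assumptions made for illustration only.

```python
import random


def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines may wrap over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_records(records, n_train, seed=0):
    """Randomly split records into n_train training items and the rest for testing.

    The seed is a hypothetical choice that only makes the sketch reproducible.
    """
    if not 0 <= n_train <= len(records):
        raise ValueError("n_train must be between 0 and len(records)")
    shuffled = list(records)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Toy input (made-up sequences, not taken from DS4).
toy_fasta = ">seq1\nMKV\nLAA\n>seq2\nGHT\n>seq3\nPLK\n"
records = read_fasta(toy_fasta)
train, test = split_records(records, n_train=2)
```

Applied to a list of 4922 records with `n_train=3500`, the same function would leave 1422 records for testing, matching the sizes reported for DS4.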
