Table 3

Precision analysis of the guilt-by-association algorithm




Threshold for % disease genes           Threshold for number of connections (TC)
among interactors (TD)                  Q1         Q2         Q3         Mean
                                        TC = 1     TC = 4     TC = 13    TC = 12

Q1 (TD = 12.8)        N Captured        1,943      1,391      638        683
                      % Known           73.3       75.0       76.5       76.4

Q2 (TD = 28.6)        N Captured        1,024      563        195        219
                      % Known           74.8       78.9       85.1       84.9

Q3 (TD = 50.0)        N Captured        251        118        16         19
                      % Known           70.5       67.8       75.0       78.9

Mean (TD = 35.0)      N Captured        748        409        109        127
                      % Known           73.4       76.3       84.4       84.2

The suitability of several location parameters as thresholds in the guilt-by-association algorithm was explored by computing the proportion of known disease-associated genes (% Known) among the total number of captured genes (N Captured). The analysis treated only the 1,445 genes (out of the initial 6,151) with a known disease phenotype as truly disease-causing, with the remaining 4,706 declared as disease-associated. The three quartiles (Q1: 25th percentile; Q2: 50th percentile, or median; Q3: 75th percentile) plus the mean were used as thresholds.
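The precision calculation described in this caption can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the `Gene` record, the function names, and the capture rule (a gene is captured when both its number of connections and its percentage of disease genes among interactors are at or above the thresholds) are assumptions made for the example, as is the quantile method used by Python's `statistics.quantiles`.

```python
from dataclasses import dataclass
from statistics import mean, quantiles


@dataclass
class Gene:
    # Hypothetical record; field names are not taken from the paper.
    n_connections: int              # number of interaction partners
    pct_disease_interactors: float  # % of partners that are disease genes
    known_disease: bool             # True if the gene has a known disease phenotype


def location_thresholds(values):
    """Return the three quartiles and the mean of a list of values."""
    q1, q2, q3 = quantiles(values, n=4)
    return {"Q1": q1, "Q2": q2, "Q3": q3, "Mean": mean(values)}


def precision(genes, tc, td):
    """Return (N Captured, % Known) for one TC/TD threshold pair.

    Assumed capture rule: both measures must be >= their threshold.
    % Known is None when no gene is captured.
    """
    captured = [
        g for g in genes
        if g.n_connections >= tc and g.pct_disease_interactors >= td
    ]
    n_captured = len(captured)
    if n_captured == 0:
        return 0, None
    n_known = sum(g.known_disease for g in captured)
    return n_captured, 100.0 * n_known / n_captured
```

Each cell of the table corresponds to one call of `precision` with a TC taken from the column and a TD taken from the row. Under this reading, the Q3/Q3 cell (16 captured, 75.0% known) corresponds to 12 known disease genes among the 16 captured.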

Reverter et al. BioData Mining 2008 1:8   doi:10.1186/1756-0381-1-8