6.5. Filip results

6.5.1. CAFA 2

6.5.1.1. \(F_{max}\) Improvement

Table 6.1 CAFA2 \(F_{max}\) results for DcGO and DcGO + Filip

| \(F_{max}\) DcGO | \(F_{max}\) DcGO + Filip |
|------------------|--------------------------|
| 0.408            | 0.409                    |

During development, I validated Filip against the original DcGO CAFA2 submission, using the CAFA2 targets. The \(F_{max}\) score was calculated for human BPO predictions, combining both No Knowledge and Limited Knowledge targets. Table 6.1 shows that Filip provides a small benefit to the \(F_{max}\) score.
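For reference, \(F_{max}\) is the maximum harmonic mean of precision and recall over all prediction-score thresholds. The sketch below is illustrative only: it pools all (protein, term) pairs rather than averaging per protein as the CAFA assessment does, and the function name and toy data are my own, not the assessment code.

```python
# Illustrative sketch of the F_max metric: the maximum F1 score over all
# prediction-score thresholds. Pools (protein, term) pairs for brevity;
# the real CAFA metric averages precision/recall per protein.

def f_max(predictions, truth, thresholds):
    """predictions: {(protein, go_term): score}; truth: set of (protein, go_term)."""
    best = 0.0
    for t in thresholds:
        # Keep only predictions scoring at or above this threshold.
        kept = {pair for pair, score in predictions.items() if score >= t}
        if not kept:
            continue
        tp = len(kept & truth)
        precision = tp / len(kept)
        recall = tp / len(truth)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best

# Toy example (hypothetical proteins/terms):
preds = {("P1", "GO:A"): 0.9, ("P1", "GO:B"): 0.4, ("P2", "GO:A"): 0.7}
truth = {("P1", "GO:A"), ("P2", "GO:A")}
print(f_max(preds, truth, [0.1 * i for i in range(1, 10)]))  # → 1.0
```

At threshold 0.5 the toy predictor keeps exactly the two true pairs, so precision and recall are both 1 and \(F_{max}\) = 1.0.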

6.5.1.2. Bootstrapping

../_images/filip_bootstrap.png

Fig. 6.5 This bootstrapping histogram shows the distribution of the number of correct predictions remaining after deleting a random selection of DcGO predictions (the same number as are discarded by the filter). The dotted line shows the number of correct predictions retained by the filter. The low p-value (\(9.99 \times 10^{-5}\)) shows the low probability of the filter performing at least as well as it did (in terms of the number of correct predictions) by random chance.

The small improvement is due to Filip filtering out 85,637 GOBP human protein predictions, only 23 of which were true according to the CAFA2 ground truth, meaning that 99.973% of the predictions Filip filtered out were correctly discarded.

To check that this success rate is better than we would expect by chance, I performed a bootstrapping test: I removed random sets of 85,637 predictions from the DcGO set and measured the number of true positives remaining. This was repeated 100,000 times to create Fig. 6.5 and to calculate the p-value (\(p < 0.001\)), showing that the filter performed far better than chance.
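The bootstrap can be sketched as follows. This is a minimal illustration, not the original analysis code: the function name is hypothetical, and the example numbers are scaled down from the real counts (85,637 removed; 23 true; 100,000 trials) so it runs quickly.

```python
import random

def bootstrap_p_value(n_predictions, n_true, n_removed, true_removed_by_filter,
                      n_trials=10_000, seed=0):
    """Empirical p-value: how often does deleting a random set of n_removed
    predictions discard no more true positives than the filter did?"""
    rng = random.Random(seed)
    # Label each prediction: True = correct according to the ground truth.
    labels = [True] * n_true + [False] * (n_predictions - n_true)
    at_least_as_good = 0
    for _ in range(n_trials):
        removed = rng.sample(labels, n_removed)
        # A random deletion does at least as well as the filter if it
        # removes no more true positives than the filter removed.
        if sum(removed) <= true_removed_by_filter:
            at_least_as_good += 1
    # Add-one correction so the p-value is never reported as exactly zero.
    return (at_least_as_good + 1) / (n_trials + 1)

# Scaled-down illustration (hypothetical numbers, not the real data):
p = bootstrap_p_value(n_predictions=1000, n_true=100, n_removed=200,
                      true_removed_by_filter=2)
print(f"p = {p:.5f}")  # small: random deletions rarely remove so few true positives
```

With the real counts, no random deletion in 100,000 trials removed as few true positives as the filter, which is why the reported p-value sits at the resolution limit of the bootstrap rather than at zero.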

6.5.1.3. Relationship between incorrect terms

../_images/revigo_filip_wrong_cafa2.png

Fig. 6.6 A grouping/summary of the incorrectly excluded predictions. Larger circles represent terms which are parents to more of the input terms. Lines represent the relationships between pictured terms.

I also looked at whether there was any relationship between the 23 incorrectly removed predictions. Interestingly, all of these incorrectly filtered-out predictions were for GOBP terms related to development (e.g. tissue development, anatomical structure development, epithelium development, organ morphogenesis). Fig. 6.6 shows the relationships between the incorrectly filtered GO terms, created using ReviGO [220]. The fact that the incorrectly filtered-out terms are all related to development may be due to a lack of samples of developing tissues in the FANTOM5 dataset used by Filip.

6.5.2. CAFA 3

The same kind of improvement is seen in the independently calculated CAFA3 results (validated by the CAFA3 team). I entered two models into CAFA3: DcGO only, and DcGO plus Filip, for human proteins and Gene Ontology Biological Process terms only.

In all categories, Filip improved DcGO's \(F_{max}\) by 0.002 (see Table 6.2). This was not enough to make it a competitive model (it ranked between 33 and 38 out of 67 in this category). Nonetheless, this result shows that the improvement was reproduced on another data set, in an assessment carried out by other researchers.

Table 6.2 CAFA3 \(F_{max}\) results for DcGO and DcGO + Filip

| Type              | Mode               | \(F_{max}\) DcGO | \(F_{max}\) DcGO + Filip |
|-------------------|--------------------|------------------|--------------------------|
| No Knowledge      | Partial Assessment | 0.326            | 0.328                    |
| No Knowledge      | Full Assessment    | 0.326            | 0.328                    |
| Limited Knowledge | Partial Assessment | 0.503            | 0.505                    |
| Limited Knowledge | Full Assessment    | 0.503            | 0.505                    |