9. Concluding remarks

Working on Snowflake gave me a bird’s-eye view of our model of the connection between genotype and phenotype, and of the data sets we have about that connection. It is (obviously) regrettable that we could not conclusively test Snowflake as a phenotype predictor across phenotypes. I think that the work I’ve done in developing Snowflake as an outlier-detection tool for unusual combinations of variants could still prove useful in the future, but we would first need access to a data set with many phenotypes. However, in Filip, I have found a small way to improve phenotype predictions across the genome, with a mechanistic reason behind it, and I hope to continue to improve this.

In my attempts to make explanatory genome-wide predictions about protein function, I repeatedly ran up against the limits of what is possible with the data resources that we currently have. These resources are absolutely vital to the efforts of computational biology, and are amazing feats of research, engineering, and collaboration, but at present there are limits to using them for “big-picture” biology. As such, some of the most satisfying work has been contributing back to some of these resources. Through linking them and finding inconsistencies, I have in some small way been part of science’s self-correcting mechanism, and I hope that this brings us a little closer to their use for genome-wide explanatory predictions.