2.6. Summary: how genotype and phenotype are linked

The purpose of this introductory chapter was to provide an overview of how we think phenotype arises from genotype, and to explain why understanding that link is a hard problem.

As we’ve seen in this chapter, there are many kinds of genetic variation which can influence phenotype. I will summarise the link between genotype and phenotype for the simplest and smallest kind of genetic variation: the SNP.

SNPs can occur anywhere in the genome: within the exome or outside it. If a SNP lies in a coding region, that region may code for multiple different proteins, and for each of them the SNP could change the amino acid at one position, cut the protein short at that point, or have no effect on its sequence. If the SNP is non-synonymous for a protein (it changes the amino acid sequence, and so potentially the structure), it may still fall in a disordered region of that protein, leaving us without structural (and therefore often functional) information, and we may not know in which circumstances and cell types the protein is expressed. In addition, the SNP may affect phenotype differently with homozygous or heterozygous calls. The protein may affect phenotype by influencing a network of other proteins, or it may be a redundant part of a pathway that only affects phenotype if, say, three other SNPs have specific calls. Even after all this, the presentation of many phenotypes can depend heavily on the environment or on the age of the individual. The mechanisms will be different for each phenotype, and we can expect some phenotypes to be impossible to predict from genotype alone.
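
The first of these steps, the effect of a coding SNP on a single protein, is the only one simple enough to sketch in a few lines. The Python below is a minimal illustration, not a method used in this thesis: it assumes the standard genetic code, a known reading frame, and a single transcript, and the function name classify_snp and its arguments are hypothetical. Everything downstream of this step (disordered regions, expression context, zygosity, interactions with other SNPs, environment) is exactly what it leaves out.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G order
# for each of the three codon positions; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(ref_codon, position, alt_base):
    """Classify a single-base change within one codon of one transcript.

    ref_codon: reference codon as a DNA string, e.g. "GAG"
    position:  0-based index of the changed base within the codon
    alt_base:  the alternative base, one of "T", "C", "A", "G"
    """
    alt_codon = ref_codon[:position] + alt_base + ref_codon[position + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]

    if alt_aa == ref_aa:
        return "synonymous"   # no change to the protein sequence
    if alt_aa == "*":
        return "nonsense"     # protein is cut short at this point
    if ref_aa == "*":
        return "stop-lost"    # translation continues past the usual end
    return "missense"         # one amino acid is swapped for another


print(classify_snp("GAA", 2, "G"))  # GAA -> GAG
print(classify_snp("GAG", 1, "T"))  # GAG -> GTG
print(classify_snp("TGG", 2, "A"))  # TGG -> TGA
```

Because the same genomic position can sit in different codons (or none) in different transcripts, a real annotation would have to repeat this classification per transcript, which is one reason a single SNP can carry several labels at once.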

Given all this complexity, it is no wonder that phenotype prediction remains a challenge [5]. In the next chapter, I describe the diverse information about biological entities that can be measured, including gene and protein sequence, protein structure, variant frequencies and functions, and gene expression, and how this information is currently used.