Bibliography

References are shown at the bottom of the page on which they are cited. A full bibliography (of the entire Jupyter Book) appears below:


1

Executable Books Community. Jupyter Book. February 2020. URL: https://zenodo.org/record/4539666.

2

Jan Zaucha, Jonathan Stahlhacke, Matt E Oates, Natalie Thurlby, Owen J L Rackham, Hai Fang, Ben Smithers, and Julian Gough. A proteome quality index. Environ. Microbiol., 17(1):4–9, 2015. URL: https://onlinelibrary.wiley.com/doi/abs/10.1111/1462-2920.12622.

3

Matt E Oates, Jonathan Stahlhacke, Dimitrios V Vavoulis, Ben Smithers, Owen J L Rackham, Adam J Sardar, Jan Zaucha, Natalie Thurlby, Hai Fang, and Julian Gough. The SUPERFAMILY 1.75 database in 2014: a doubling of data. Nucleic Acids Res., 43(Database issue):D227–33, January 2015.

4

Julian Gough, Jan Zaucha, and Natalie Thurlby. Determining phenotype from genotype. July 2017. URL: https://patents.google.com/patent/US20200176085A1/en.

5

Naihui Zhou, Yuxiang Jiang, Timothy R Bergquist, Alexandra J Lee, Balint Z Kacsoh, Alex W Crocker, Kimberley A Lewis, George Georghiou, Huy N Nguyen, Md Nafiz Hamid, Larry Davis, Tunca Dogan, Volkan Atalay, Ahmet S Rifaioglu, Alperen Dalkiran, Rengul Cetin-Atalay, Chengxin Zhang, Rebecca L Hurto, Peter L Freddolino, Yang Zhang, Prajwal Bhat, Fran Supek, José M Fernández, Branislava Gemovic, Vladimir R Perovic, Radoslav S Davidović, Neven Sumonja, Nevena Veljkovic, Ehsaneddin Asgari, Mohammad R K Mofrad, Giuseppe Profiti, Castrense Savojardo, Pier Luigi Martelli, Rita Casadio, Florian Boecker, Indika Kahanda, Natalie Thurlby, Alice C McHardy, Alexandre Renaux, Rabie Saidi, Julian Gough, Alex A Freitas, Magdalena Antczak, Fabio Fabris, Mark N Wass, Jie Hou, Jianlin Cheng, Zheng Wang, Alfonso E Romero, Alberto Paccanaro, Haixuan Yang, Tatyana Goldberg, Chenguang Zhao, Liisa Holm, Petri Törönen, Alan J Medlar, Elaine Zosa, Itamar Borukhov, Ilya Novikov, Angela Wilkins, Olivier Lichtarge, Po-Han Chi, Wei-Cheng Tseng, Michal Linial, Peter W Rose, Christophe Dessimoz, Vedrana Vidulin, Saso Dzeroski, Ian Sillitoe, Sayoni Das, Jonathan Gill Lees, David T Jones, Cen Wan, Domenico Cozzetto, Rui Fa, Mateo Torres, Alex Wiarwick Vesztrocy, Jose Manuel Rodriguez, Michael L Tress, Marco Frasca, Marco Notaro, Giuliano Grossi, Alessandro Petrini, Matteo Re, Giorgio Valentini, Marco Mesiti, Daniel B Roche, Jonas Reeb, David W Ritchie, Sabeur Aridhi, Seyed Ziaeddin Alborzi, Marie-Dominique Devignes, Koo Da Chen Emily, Richard Bonneau, Vladimir Gligorijević, Meet Barot, Hai Fang, Stefano Toppo, Enrico Lavezzo, Marco Falda, Michele Berselli, Silvio C E Tosatto, Marco Carraro, Damiano Piovesan, Hafeez Ur Rehman, Qizhong Mao, Shanshan Zhang, Slobodan Vucetic, Gage S Black, Dane Jo, Dallas J Larsen, Ashton R Omdahl, Luke W Sagers, Erica Suh, Jonathan B Dayton, Liam J McGuffin, Danielle A Brackenridge, Patricia C Babbitt, Jeffrey M Yunes, Paolo Fontana, Feng Zhang, Shanfeng Zhu, Ronghui 
You, Zihan Zhang, Suyang Dai, Shuwei Yao, Weidong Tian, Renzhi Cao, Caleb Chandler, Miguel Amezola, Devon Johnson, Jia-Ming Chang, Wen-Hung Liao, Yi-Wei Liu, Stefano Pascarelli, Yotam Frank, Robert Hoehndorf, Maxat Kulmanov, Imane Boudellioua, Gianfranco Politano, Stefano Di Carlo, Alfredo Benso, Kai Hakala, Filip Ginter, Farrokh Mehryary, Suwisa Kaewphan, Jari Björne, Hans Moen, Martti E E Tolvanen, Tapio Salakoski, Daisuke Kihara, Aashish Jain, Tomislav Šmuc, Adrian Altenhoff, Asa Ben-Hur, Burkhard Rost, Steven E Brenner, Christine A Orengo, Constance J Jeffery, Giovanni Bosco, Deborah A Hogan, Maria J Martin, Claire O'Donovan, Sean D Mooney, Casey S Greene, Predrag Radivojac, and Iddo Friedberg. The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens. bioRxiv, pages 653105, May 2019.

6

Welcome to the Krita 4.4 manual! Krita Manual 4.4.0 documentation. Accessed: 2021-2-22. URL: https://docs.krita.org/en/index.html.

7

Eric Bonabeau. Agent-based modeling: methods and techniques for simulating human systems. Proc. Natl. Acad. Sci. U. S. A., 99 Suppl 3:7280–7287, May 2002. URL: http://dx.doi.org/10.1073/pnas.082080899.

8

A M Turing. The chemical basis of morphogenesis. 1953. Bull. Math. Biol., 52(1-2):153–97; discussion 119–52, 1990. URL: http://dx.doi.org/10.1007/BF02459572.

9

Kirstie Whitaker and Olivia Guest. #bropenscience is broken science: Kirstie Whitaker and Olivia Guest ask how open 'open science' really is. Psychologist, 33:34–37, 2020. URL: https://pure.mpg.de/rest/items/item_3286863/component/file_3286864/content.

10

Steven Rose. Darwin, race and gender. EMBO Rep., 10(4):297–298, April 2009. URL: http://dx.doi.org/10.1038/embor.2009.40.

11

Conway Zirkle. The inheritance of acquired characters and the provisional hypothesis of pangenesis. Am. Nat., 69(724):417–445, September 1935.

12

Anthony Grafton and Nancy G Siraisi. Natural particulars: nature and the disciplines in Renaissance Europe. In ACLS Humanities E-Book. MPublishing, University of Michigan Library, 1999.

13

Charles Darwin. On the Origin of Species by Means of Natural Selection Or the Preservation of Favoured Races in the Struggle for Life. International Book Company, 1913. URL: http://lnmcp.mf.uni-lj.si/ISN/Darwin_C.doc.

14

Daniel J Fairbanks. Mendel and Darwin: untangling a persistent enigma. Heredity, 124(2):263–273, February 2020. URL: http://dx.doi.org/10.1038/s41437-019-0289-9.

15

Gregor Mendel. Experiments in plant hybridization (1865). Verhandlungen des naturforschenden Vereins Brünn. Available online, 1996. URL: http://old.esp.org/foundations/genetics/classical/gm-65-a.pdf.

16

W Johannsen. The genotype conception of heredity. Am. Nat., 45(531):129–159, March 1911. URL: https://doi.org/10.1086/279202.

17

Richard J Evans. R A Fisher and the science of hatred. New Statesman, June 2020. URL: https://www.newstatesman.com/international/science-tech/2020/07/ra-fisher-and-science-hatred.

18

J D Watson and F H Crick. Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature, 171(4356):737–738, April 1953. URL: http://dx.doi.org/10.1038/171737a0.

19

J C Venter, M D Adams, E W Myers, P W Li, R J Mural, G G Sutton, H O Smith, M Yandell, C A Evans, R A Holt, J D Gocayne, P Amanatides, R M Ballew, D H Huson, J R Wortman, Q Zhang, C D Kodira, X H Zheng, L Chen, M Skupski, G Subramanian, P D Thomas, J Zhang, G L Gabor Miklos, C Nelson, S Broder, A G Clark, J Nadeau, V A McKusick, N Zinder, A J Levine, R J Roberts, M Simon, C Slayman, M Hunkapiller, R Bolanos, A Delcher, I Dew, D Fasulo, M Flanigan, L Florea, A Halpern, S Hannenhalli, S Kravitz, S Levy, C Mobarry, K Reinert, K Remington, J Abu-Threideh, E Beasley, K Biddick, V Bonazzi, R Brandon, M Cargill, I Chandramouliswaran, R Charlab, K Chaturvedi, Z Deng, V Di Francesco, P Dunn, K Eilbeck, C Evangelista, A E Gabrielian, W Gan, W Ge, F Gong, Z Gu, P Guan, T J Heiman, M E Higgins, R R Ji, Z Ke, K A Ketchum, Z Lai, Y Lei, Z Li, J Li, Y Liang, X Lin, F Lu, G V Merkulov, N Milshina, H M Moore, A K Naik, V A Narayan, B Neelam, D Nusskern, D B Rusch, S Salzberg, W Shao, B Shue, J Sun, Z Wang, A Wang, X Wang, J Wang, M Wei, R Wides, C Xiao, C Yan, A Yao, J Ye, M Zhan, W Zhang, H Zhang, Q Zhao, L Zheng, F Zhong, W Zhong, S Zhu, S Zhao, D Gilbert, S Baumhueter, G Spier, C Carter, A Cravchik, T Woodage, F Ali, H An, A Awe, D Baldwin, H Baden, M Barnstead, I Barrow, K Beeson, D Busam, A Carver, A Center, M L Cheng, L Curry, S Danaher, L Davenport, R Desilets, S Dietz, K Dodson, L Doup, S Ferriera, N Garg, A Gluecksmann, B Hart, J Haynes, C Haynes, C Heiner, S Hladun, D Hostin, J Houck, T Howland, C Ibegwam, J Johnson, F Kalush, L Kline, S Koduru, A Love, F Mann, D May, S McCawley, T McIntosh, I McMullen, M Moy, L Moy, B Murphy, K Nelson, C Pfannkoch, E Pratts, V Puri, H Qureshi, M Reardon, R Rodriguez, Y H Rogers, D Romblad, B Ruhfel, R Scott, C Sitter, M Smallwood, E Stewart, R Strong, E Suh, R Thomas, N N Tint, S Tse, C Vech, G Wang, J Wetter, S Williams, M Williams, S Windsor, E Winn-Deen, K Wolfe, J Zaveri, K Zaveri, J F Abril, R Guigó, M J Campbell, K V Sjolander, B 
Karlak, A Kejariwal, H Mi, B Lazareva, T Hatton, A Narechania, K Diemer, A Muruganujan, N Guo, S Sato, V Bafna, S Istrail, R Lippert, R Schwartz, B Walenz, S Yooseph, D Allen, A Basu, J Baxendale, L Blick, M Caminha, J Carnes-Stine, P Caulk, Y H Chiang, M Coyne, C Dahlke, A Mays, M Dombroski, M Donnelly, D Ely, S Esparham, C Fosler, H Gire, S Glanowski, K Glasser, A Glodek, M Gorokhov, K Graham, B Gropman, M Harris, J Heil, S Henderson, J Hoover, D Jennings, C Jordan, J Jordan, J Kasha, L Kagan, C Kraft, A Levitsky, M Lewis, X Liu, J Lopez, D Ma, W Majoros, J McDaniel, S Murphy, M Newman, T Nguyen, N Nguyen, M Nodell, S Pan, J Peck, M Peterson, W Rowe, R Sanders, J Scott, M Simpson, T Smith, A Sprague, T Stockwell, R Turner, E Venter, M Wang, M Wen, D Wu, M Wu, A Xia, A Zandieh, and X Zhu. The sequence of the human genome. Science, 291(5507):1304–1351, February 2001.

20

James Meek. Decoding DNA. The Guardian, June 2000. URL: http://www.theguardian.com/science/2000/jun/27/genetics.uknews.

21

Amy Harmon. James Watson had a chance to salvage his reputation on race. He made things worse. The New York Times, January 2019. URL: https://www.nytimes.com/2019/01/01/science/watson-dna-genetics-race.html.

22

Timothé Cynober. Why are there only 11 cell and gene therapies in Europe? Labiotech, September 2020. URL: https://www.labiotech.eu/in-depth/atmp-cell-gene-therapy-ema/.

23

Guardian staff reporter. Cancer patients in England to be offered chance to avoid toxic side-effects. The Guardian, December 2020. URL: http://www.theguardian.com/society/2020/dec/28/cancer-patients-in-england-to-be-offered-chance-to-avoid-toxic-side-effects.

24

Gallery 19: DNA model, 1953. DNA Learning Center. Accessed: 2019-6-2. URL: https://www.dnalc.org/view/16430-Gallery-19-DNA-model-1953.html.

25

G J Mulder. Ueber die Zusammensetzung einiger thierischen Substanzen. J. Prakt. Chem., 1839.

26

Christian B Anfinsen, Edgar Haber, Michael Sela, and F H White Jr. The kinetics of formation of native ribonuclease during oxidation of the reduced polypeptide chain. Proc. Natl. Acad. Sci. U. S. A., 47(9):1309, 1961.

27

Kathryn Tunyasuvunakool, Jonas Adler, Zachary Wu, Tim Green, Michal Zielinski, Augustin Žídek, Alex Bridgland, Andrew Cowie, Clemens Meyer, Agata Laydon, and others. Highly accurate protein structure prediction for the human proteome. Nature, 596(7873):590–596, 2021.

28

V N Uversky. Posttranslational modification. In Stanley Maloy and Kelly Hughes, editors, Brenner's Encyclopedia of Genetics (Second Edition), pages 425–430. Academic Press, San Diego, January 2013. URL: https://www.sciencedirect.com/science/article/pii/B9780123749840012031.

29

Robin C Friedman, Kyle Kai-How Farh, Christopher B Burge, and David P Bartel. Most mammalian mRNAs are conserved targets of microRNAs. Genome Res., 19(1):92–105, January 2009. URL: http://dx.doi.org/10.1101/gr.082701.108.

30

Valer Gotea and Wojciech Makałowski. Do transposable elements really contribute to proteomes? Trends Genet., 22(5):260–267, May 2006. URL: http://dx.doi.org/10.1016/j.tig.2006.03.006.

31

Seth W Cheetham, Geoffrey J Faulkner, and Marcel E Dinger. Overcoming challenges and dogmas to understand the functions of pseudogenes. Nat. Rev. Genet., 21(3):191–201, March 2020. URL: http://dx.doi.org/10.1038/s41576-019-0196-1.

32

Chava Kimchi-Sarfaty, Jung Mi Oh, In-Wha Kim, Zuben E Sauna, Anna Maria Calcagno, Suresh V Ambudkar, and Michael M Gottesman. A "silent" polymorphism in the MDR1 gene changes substrate specificity. Science, 315(5811):525–528, 2007.

33

Nicolas Jarraud. Mecabricks. 2019. URL: https://mecabricks.com/.

34

A G Murzin, S E Brenner, T Hubbard, and C Chothia. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol., 247(4):536–540, April 1995. URL: http://dx.doi.org/10.1006/jmbi.1995.0159.

35

Eiko I Fried. What are psychological constructs? On the nature and statistical modelling of emotions, intelligence, personality traits and mental disorders. Health Psychol. Rev., 11(2):130–134, June 2017. URL: http://dx.doi.org/10.1080/17437199.2017.1306718.

36

Ian Hacking. The Social Construction of What? Harvard University Press, 1999. URL: https://play.google.com/store/books/details?id=XkCR1p2YMRwC.

37

Thomas Szasz. The myth of mental illness. In James M Humber and Robert F Almeder, editors, Biomedical Ethics and the Law, pages 113–122. Springer US, Boston, MA, 1976. URL: https://doi.org/10.1007/978-1-4684-2223-8_10.

38

Jocelyn Kaiser. Genetics may explain up to 25% of same-sex behavior, giant analysis reveals. Science Magazine, August 2019. URL: https://www.sciencemag.org/news/2019/08/genetics-may-explain-25-same-sex-behavior-giant-analysis-reveals.

39

Ken Richardson. What IQ tests test. Theory Psychol., 12(3):283–314, June 2002. URL: https://doi.org/10.1177/0959354302012003012.

40

Angela Saini. The disturbing return of scientific racism. Accessed: 2021-2-11. URL: https://www.wired.co.uk/article/superior-the-return-of-race-science-angela-saini.

41

U Schüklenk, E Stein, J Kerin, and W Byne. The ethics of genetic research on sexual orientation. Hastings Cent. Rep., 27(4):6–13, July 1997. URL: https://www.ncbi.nlm.nih.gov/pubmed/9271716.

42

Xiaolin Wu and Xi Zhang. Responses to critiques on machine learning of criminality perceptions (addendum of arXiv:1611.04135). arXiv, November 2016. URL: http://arxiv.org/abs/1611.04135, arXiv:1611.04135.

43

Luke Stark. Facial recognition, emotion and race in animated social media. First Monday, September 2018. URL: http://journals.uic.edu/ojs/index.php/fm/article/view/9406.

44

Andrew Pulrang. Disabled people explained: why we say we don't want to be cured. Disability Thinking, April 2019. Accessed: 2021-2-11. URL: https://disabilitythinking.com/disabilitythinking/2019/4/22/disabled-people-explained-why-we-say-we-dont-want-to-be-cured.

45

Paul Steven Miller and Rebecca Leah Levine. Avoiding genetic genocide: understanding good intentions and eugenics in the complex dialogue between the medical and disability communities. Genet. Med., 15(2):95–102, February 2013. URL: http://dx.doi.org/10.1038/gim.2012.102.

46

Michael Le Page. We don't know what a fifth of our genes do – and won't find out soon. New Scientist, February 2019. URL: https://www.newscientist.com/article/2194516-we-dont-know-what-a-fifth-of-our-genes-do-and-wont-find-out-soon/.

47

Elizabeth Pennisi. Genomics. DNA study forces rethink of what it means to be a gene. Science, 316(5831):1556–1557, June 2007. URL: http://dx.doi.org/10.1126/science.316.5831.1556.

48

Franziska Pfeiffer, Carsten Gröber, Michael Blank, Kristian Händler, Marc Beyer, Joachim L Schultze, and Günter Mayer. Systematic evaluation of error rates and causes in short samples in next-generation sequencing. Sci. Rep., 8(1):10950, July 2018.

49

A P Jason de Koning, Wanjun Gu, Todd A Castoe, Mark A Batzer, and David D Pollock. Repetitive elements may comprise over two-thirds of the human genome. PLoS Genet., 7(12):e1002384, December 2011.

50

Brian J Haas and Michael C Zody. Advancing RNA-Seq analysis. Nat. Biotechnol., 28(5):421–423, May 2010.

51

Roger Bumgarner. Overview of DNA microarrays: types, applications, and their future. Curr. Protoc. Mol. Biol., Chapter 22:Unit 22.1, January 2013.

52

Karen H Miga. Completing the human genome: the progress and challenge of satellite DNA assembly. Chromosome Res., 23(3):421–426, September 2015. URL: http://dx.doi.org/10.1007/s10577-015-9488-2.

53

W James Kent, Charles W Sugnet, Terrence S Furey, Krishna M Roskin, Tom H Pringle, Alan M Zahler, and David Haussler. The human genome browser at UCSC. Genome Res., 12(6):996–1006, June 2002. URL: http://dx.doi.org/10.1101/gr.229102.

54

C Harger, G Chen, A Farmer, W Huang, J Inman, D Kiphart, F Schilkey, M P Skupski, and J Weller. The genome sequence DataBase. Nucleic Acids Res., 28(1):31–32, January 2000. URL: http://dx.doi.org/10.1093/nar/28.1.31.

55

T Hubbard, D Barker, E Birney, G Cameron, Y Chen, L Clark, T Cox, J Cuff, V Curwen, T Down, R Durbin, E Eyras, J Gilbert, M Hammond, L Huminiecki, A Kasprzyk, H Lehvaslaiho, P Lijnzaad, C Melsopp, E Mongin, R Pettett, M Pocock, S Potter, A Rust, E Schmidt, S Searle, G Slater, J Smith, W Spooner, A Stabenau, J Stalker, E Stupka, A Ureta-Vidal, I Vastrik, and M Clamp. The Ensembl genome database project. Nucleic Acids Res., 30(1):38–41, January 2002. URL: http://dx.doi.org/10.1093/nar/30.1.38.

56

Genome browser FAQ. Accessed: 2020-12-13. URL: https://genome.ucsc.edu/FAQ/FAQreleases.html.

57

GATK Team. Human genome reference builds - GRCh38 or hg38 - b37 - hg19. June 2020. Accessed: 2020-12-13. URL: https://gatk.broadinstitute.org/hc/en-us/articles/360035890951.

58

Steven L Salzberg. Open questions: how many genes do we have? BMC Biol., 16(1):94, August 2018.

59

Alice Meadows, Laurel L Haak, and Josh Brown. Persistent identifiers: the building blocks of the research information infrastructure. Insights, 32(1):9, March 2019. URL: http://insights.uksg.org/articles/10.1629/uksg.457/.

60

James Vincent. Scientists rename human genes to stop Microsoft Excel from misreading them as dates. August 2020. Accessed: 2021-2-7. URL: https://www.theverge.com/2020/8/6/21355674/human-genes-rename-microsoft-excel-misreading-dates.

61

S T Sherry. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res., 29(1):308–311, 2001. URL: http://dx.doi.org/10.1093/nar/29.1.308.

62

Ryan J Andrews, Levi Baber, and Walter N Moss. RNAStructuromeDB: a genome-wide database for RNA structural inference. Sci. Rep., 7(1):17269, December 2017. URL: http://dx.doi.org/10.1038/s41598-017-17510-y.

63

Katherine E Richardson, Charles C Kirkpatrick, and Brent M Znosko. RNA CoSSMos 2.0: an improved searchable database of secondary structure motifs in RNA three-dimensional structures. Database, January 2020. URL: http://dx.doi.org/10.1093/database/baz153.

64

Robert Petryszak, Maria Keays, Y Amy Tang, Nuno A Fonseca, Elisabet Barrera, Tony Burdett, Anja Füllgrabe, Alfonso Muñoz-Pomer Fuentes, Simon Jupp, Satu Koskinen, Oliver Mannion, Laura Huerta, Karine Megy, Catherine Snow, Eleanor Williams, Mitra Barzine, Emma Hastings, Hendrik Weisser, James Wright, Pankaj Jaiswal, Wolfgang Huber, Jyoti Choudhary, Helen E Parkinson, and Alvis Brazma. Expression atlas update—an integrated database of gene and protein expression in humans, animals and plants. 2016. URL: http://dx.doi.org/10.1093/nar/gkv1045.

65

Irene Papatheodorou, Pablo Moreno, Jonathan Manning, Alfonso Muñoz-Pomer Fuentes, Nancy George, Silvie Fexova, Nuno A Fonseca, Anja Füllgrabe, Matthew Green, Ni Huang, Laura Huerta, Haider Iqbal, Monica Jianu, Suhaib Mohammed, Lingyun Zhao, Andrew F Jarnuczak, Simon Jupp, John Marioni, Kerstin Meyer, Robert Petryszak, Cesar Augusto Prada Medina, Carlos Talavera-López, Sarah Teichmann, Juan Antonio Vizcaino, and Alvis Brazma. Expression atlas update: from tissues to single cells. Nucleic Acids Res., 48(D1):D77–D83, January 2020. URL: http://dx.doi.org/10.1093/nar/gkz947.

66

Ashraful Haque, Jessica Engel, Sarah A Teichmann, and Tapio Lönnberg. A practical guide to single-cell RNA-sequencing for biomedical research and clinical applications. Genome Medicine, August 2017. URL: https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-017-0467-4.

67

Illumina. Understanding Illumina quality scores. Technical Note: Informatics, 2014. URL: https://www.illumina.com/documents/products/technotes/technote_Q-Scores.pdf.

68

Günter P Wagner, Koryu Kin, and Vincent J Lynch. Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples. Theory Biosci., 131(4):281–285, December 2012. URL: http://dx.doi.org/10.1007/s12064-012-0162-3.

69

H Pimentel. What the FPKM? A review of RNA-Seq expression units. 2014. URL: https://haroldpimentel.wordpress.com/2014/05/08/what-the-fpkm-a-review-rna-seq-expression-units/.

70

M D Robinson, D J McCarthy, and G K Smyth. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 26(1):139–140, 2010. URL: http://dx.doi.org/10.1093/bioinformatics/btp616.

71

Michael I Love, Wolfgang Huber, and Simon Anders. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol., 15(12):550, 2014. URL: http://dx.doi.org/10.1186/s13059-014-0550-8.

72

Robert T Hersh. Atlas of protein sequence and structure, 1966. Syst. Biol., 16(3):262–263, September 1967.

73

Sangya Pundir, Maria J Martin, Claire O'Donovan, and The UniProt Consortium. UniProt tools. Curr. Protoc. Bioinformatics, pages 1.29.1–1.29.15, 2016. URL: http://dx.doi.org/10.1002/0471250953.bi0129s53.

74

M Wang, M Weiss, M Simonovic, G Haertinger, S P Schrimpf, M O Hengartner, and C von Mering. PaxDb, a database of protein abundance averages across all three domains of life. Mol. Cell. Proteomics, 11(8):492–500, August 2012. URL: http://dx.doi.org/10.1074/mcp.O111.014704.

75

Patroklos Samaras, Tobias Schmidt, Martin Frejno, Siegfried Gessulat, Maria Reinecke, Anna Jarzab, Jana Zecha, Julia Mergner, Piero Giansanti, Hans-Christian Ehrlich, Stephan Aiche, Johannes Rank, Harald Kienegger, Helmut Krcmar, Bernhard Kuster, and Mathias Wilhelm. ProteomicsDB: a multi-omics and multi-organism resource for life science research. Nucleic Acids Res., 48(D1):D1153–D1163, January 2020. URL: http://dx.doi.org/10.1093/nar/gkz974.

76

Björn Schwanhäusser, Dorothea Busse, Na Li, Gunnar Dittmar, Johannes Schuchhardt, Jana Wolf, Wei Chen, and Matthias Selbach. Global quantification of mammalian gene expression control. Nature, 473(7347):337–342, May 2011. URL: http://dx.doi.org/10.1038/nature10098.

77

S P Gygi, Y Rochon, B R Franza, and R Aebersold. Correlation between protein and mRNA abundance in yeast. Mol. Cell. Biol., 19(3):1720–1730, March 1999. URL: https://www.ncbi.nlm.nih.gov/pubmed/10022859.

78

Idit Kosti, Nishant Jain, Dvir Aran, Atul J Butte, and Marina Sirota. Cross-tissue analysis of gene and protein expression in normal and cancer tissues. Sci. Rep., 6:24799, May 2016. URL: http://dx.doi.org/10.1038/srep24799.

79

Protein data bank.

80

Adrian Furnham. Response bias, social desirability and dissimulation. Pers. Individ. Dif., 7(3):385–400, January 1986. URL: https://www.sciencedirect.com/science/article/pii/0191886986900140.

81

B Knäuper and H U Wittchen. Diagnosing major depression in the elderly: evidence for response bias in standardized diagnostic interviews? J. Psychiatr. Res., 28(2):147–164, March 1994. URL: http://dx.doi.org/10.1016/0022-3956(94)90026-4.

82

Golding, Pembrey, Jones, and The ALSPAC Study Team. ALSPAC: the Avon Longitudinal Study of Parents and Children. Paediatr. Perinat. Epidemiol., 15(1):74–87, 2001. URL: http://dx.doi.org/10.1046/j.1365-3016.2001.00325.x.

83

Clare Bycroft, Colin Freeman, Desislava Petkova, Gavin Band, Lloyd T Elliott, Kevin Sharp, Allan Motyer, Damjan Vukcevic, Olivier Delaneau, Jared O'Connell, Adrian Cortes, Samantha Welsh, Alan Young, Mark Effingham, Gil McVean, Stephen Leslie, Naomi Allen, Peter Donnelly, and Jonathan Marchini. The UK Biobank resource with deep phenotyping and genomic data. Nature, 562(7726):203–209, October 2018. URL: http://dx.doi.org/10.1038/s41586-018-0579-z.

84

Kendall Powell. The broken promise that undermines human genome research. Nature, 590(7845):198–201, February 2021. URL: https://www.nature.com/articles/d41586-021-00331-5.

85

Joanna S Amberger, Carol A Bocchini, Alan F Scott, and Ada Hamosh. OMIM.org: leveraging knowledge across phenotype–gene relationships. Nucleic Acids Res., 47(D1):D1038–D1043, 2019.

86

Peter D Stenson, Matthew Mort, Edward V Ball, Katy Evans, Matthew Hayden, Sally Heywood, Michelle Hussain, Andrew D Phillips, and David N Cooper. The Human Gene Mutation Database: towards a comprehensive repository of inherited mutation data for medical research, genetic diagnosis and next-generation sequencing studies. Hum. Genet., 136(6):665–677, 2017.

87

Annalisa Buniello, Jacqueline A L MacArthur, Maria Cerezo, Laura W Harris, James Hayhurst, Cinzia Malangone, Aoife McMahon, Joannella Morales, Edward Mountjoy, Elliot Sollis, Daniel Suveges, Olga Vrousgou, Patricia L Whetzel, Ridwan Amode, Jose A Guillen, Harpreet S Riat, Stephen J Trevanion, Peggy Hall, Heather Junkins, Paul Flicek, Tony Burdett, Lucia A Hindorff, Fiona Cunningham, and Helen Parkinson. The NHGRI-EBI GWAS catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res., 47(D1):D1005–D1012, January 2019. URL: http://dx.doi.org/10.1093/nar/gky1120.

88

J Macarthur L. Emery. GWAS catalog: exploring SNP-trait associations. December 2017. Accessed: 2020-9-3. URL: http://europepmc.org/article/CTX/C7914.

89

Joshua C Denny, Marylyn D Ritchie, Melissa A Basford, Jill M Pulley, Lisa Bastarache, Kristin Brown-Gentry, Deede Wang, Dan R Masys, Dan M Roden, and Dana C Crawford. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene–disease associations. Bioinformatics, 26(9):1205–1210, March 2010. URL: https://academic.oup.com/bioinformatics/article-abstract/26/9/1205/201211.

90

Ivana Barbaric, Gaynor Miller, and T Neil Dear. Appearances can be deceiving: phenotypes of knockout mice. Brief. Funct. Genomic. Proteomic., 6(2):91–103, June 2007.

91

Aihua Zhang, Hui Sun, Guangli Yan, Ping Wang, and Xijun Wang. Mass spectrometry-based metabolomics: applications to biomarker and metabolic pathway research. Biomed. Chromatogr., 30(1):7–12, January 2016. URL: http://dx.doi.org/10.1002/bmc.3453.

92

Ganesh A Viswanathan, Jeremy Seto, Sonali Patil, German Nudelman, and Stuart C Sealfon. Getting started in biological pathway construction and analysis. PLoS Comput. Biol., 4(2):e16, February 2008. URL: http://dx.doi.org/10.1371/journal.pcbi.0040016.

93

Antonio Fabregat, Steven Jupe, Lisa Matthews, Konstantinos Sidiropoulos, Marc Gillespie, Phani Garapati, Robin Haw, Bijay Jassal, Florian Korninger, Bruce May, Marija Milacic, Corina Duenas Roca, Karen Rothfels, Cristoffer Sevilla, Veronica Shamovsky, Solomon Shorser, Thawfeek Varusai, Guilherme Viteri, Joel Weiser, Guanming Wu, Lincoln Stein, Henning Hermjakob, and Peter D'Eustachio. The Reactome pathway knowledgebase. Nucleic Acids Res., 46(D1):D649–D655, January 2018. URL: http://dx.doi.org/10.1093/nar/gkx1132.

94

Minoru Kanehisa, Michihiro Araki, Susumu Goto, Masahiro Hattori, Mika Hirakawa, Masumi Itoh, Toshiaki Katayama, Shuichi Kawashima, Shujiro Okuda, Toshiaki Tokimatsu, and Yoshihiro Yamanishi. KEGG for linking genomes to life and the environment. Nucleic Acids Res., 36(Database issue):D480–4, January 2008. URL: http://dx.doi.org/10.1093/nar/gkm882.

95

Wilfrid Blunt. The Compleat Naturalist: A Life of Linnaeus. Frances Lincoln, 2001. URL: https://play.google.com/store/books/details?id=B3YOvgAACAAJ.

96

Ehret. Plantae et papiliones rariores. Volume 1748. [London: s.n.], 1748. URL: https://www.biodiversitylibrary.org/item/205762.

97

Isabelle Charmantier. Linnaeus and race. Accessed: 2020-11-30. URL: https://www.linnean.org/learning/who-was-linnaeus/linnaeus-and-race.

98

Lars J Jensen and Peer Bork. Ontologies in quantitative biology: a basis for comparison, integration, and discovery. PLoS Biol., 8(5):e1000374, May 2010. URL: http://dx.doi.org/10.1371/journal.pbio.1000374.

99

James A Overton, Heiko Dietze, Shahim Essaid, David Osumi-Sutherland, and Christopher J Mungall. ROBOT: a command-line tool for ontology development. In ICBO. ceur-ws.org, 2015. URL: http://ceur-ws.org/Vol-1515/demo6.pdf.

100

Eran Eden, Roy Navon, Israel Steinfeld, Doron Lipson, and Zohar Yakhini. GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics, 10:48, February 2009. URL: http://dx.doi.org/10.1186/1471-2105-10-48.

101

M Ashburner, C A Ball, J A Blake, D Botstein, H Butler, J M Cherry, A P Davis, K Dolinski, S S Dwight, J T Eppig, M A Harris, D P Hill, L Issel-Tarver, A Kasarskis, S Lewis, J C Matese, J E Richardson, M Ringwald, G M Rubin, and G Sherlock. Gene Ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 25(1):25–29, May 2000. URL: http://dx.doi.org/10.1038/75556.

102

Paul D Thomas. The Gene Ontology and the meaning of biological function. Methods Mol. Biol., 1446:15–24, 2017. URL: http://dx.doi.org/10.1007/978-1-4939-3743-1_2.

103

Evelyn Camon, Michele Magrane, Daniel Barrell, Vivian Lee, Emily Dimmer, John Maslen, David Binns, Nicola Harte, Rodrigo Lopez, and Rolf Apweiler. The Gene Ontology Annotation (GOA) database: sharing knowledge in UniProt with Gene Ontology. Nucleic Acids Res., 32(Database issue):D262–6, January 2004. URL: http://dx.doi.org/10.1093/nar/gkh021.

104

Christopher J Mungall, Carlo Torniai, Georgios V Gkoutos, Suzanna E Lewis, and Melissa A Haendel. Uberon, an integrative multi-species anatomy ontology. Genome Biol., 13(1):R5, January 2012. URL: http://dx.doi.org/10.1186/gb-2012-13-1-r5.

105

Venkat S Malladi, Drew T Erickson, Nikhil R Podduturi, Laurence D Rowe, Esther T Chan, Jean M Davidson, Benjamin C Hitz, Marcus Ho, Brian T Lee, Stuart Miyasato, Gregory R Roe, Matt Simison, Cricket A Sloan, J Seth Strattan, Forrest Tanaka, W James Kent, J Michael Cherry, and Eurie L Hong. Ontology application and use at the ENCODE DCC. Database, March 2015. URL: http://dx.doi.org/10.1093/database/bav010.

106

Lynn M Schriml, Elvira Mitraka, James Munro, Becky Tauber, Mike Schor, Lance Nickle, Victor Felix, Linda Jeng, Cynthia Bearer, Richard Lichenstein, Katharine Bisordi, Nicole Campion, Brooke Hyman, David Kurland, Connor Patrick Oates, Siobhan Kibbey, Poorna Sreekumar, Chris Le, Michelle Giglio, and Carol Greene. Human disease ontology 2018 update: classification, content and workflow expansion. Nucleic Acids Res., 47(D1):D955–D962, January 2019. URL: http://dx.doi.org/10.1093/nar/gky1032.

107

Sebastian Köhler, Michael Gargano, Nicolas Matentzoglu, Leigh C Carmody, David Lewis-Smith, Nicole A Vasilevsky, Daniel Danis, Ganna Balagura, Gareth Baynam, Amy M Brower, Tiffany J Callahan, Christopher G Chute, Johanna L Est, Peter D Galer, Shiva Ganesan, Matthias Griese, Matthias Haimel, Julia Pazmandi, Marc Hanauer, Nomi L Harris, Michael J Hartnett, Maximilian Hastreiter, Fabian Hauck, Yongqun He, Tim Jeske, Hugh Kearney, Gerhard Kindle, Christoph Klein, Katrin Knoflach, Roland Krause, David Lagorce, Julie A McMurry, Jillian A Miller, Monica C Munoz-Torres, Rebecca L Peters, Christina K Rapp, Ana M Rath, Shahmir A Rind, Avi Z Rosenberg, Michael M Segal, Markus G Seidel, Damian Smedley, Tomer Talmy, Yarlalu Thomas, Samuel A Wiafe, Julie Xian, Zafer Yüksel, Ingo Helbig, Christopher J Mungall, Melissa A Haendel, and Peter N Robinson. The human phenotype ontology in 2021. Nucleic Acids Res., 49(D1):D1207–D1217, January 2021. URL: http://dx.doi.org/10.1093/nar/gkaa1043.

108

James Malone, Ele Holloway, Tomasz Adamusiak, Misha Kapushesky, Jie Zheng, Nikolay Kolesnikov, Anna Zhukova, Alvis Brazma, and Helen Parkinson. Modeling sample variables with an experimental factor ontology. Bioinformatics, 26(8):1112–1118, April 2010. URL: http://dx.doi.org/10.1093/bioinformatics/btq099.

109

S F Altschul, W Gish, W Miller, E W Myers, and D J Lipman. Basic local alignment search tool. J. Mol. Biol., 215(3):403–410, October 1990. URL: http://dx.doi.org/10.1016/S0022-2836(05)80360-2.

110

Antonina Andreeva, Dave Howorth, Cyrus Chothia, Eugene Kulesha, and Alexey G Murzin. SCOP2 prototype: a new approach to protein structure mining. Nucleic Acids Res., 42(Database issue):D310–4, January 2014. URL: http://dx.doi.org/10.1093/nar/gkt1242.

111

C A Orengo, A D Michie, S Jones, D T Jones, M B Swindells, and J M Thornton. CATH – a hierarchic classification of protein domain structures. Structure, 5(8):1093–1109, 1997. URL: http://dx.doi.org/10.1016/s0969-2126(97)00260-8.

112

Gergely Csaba, Fabian Birzele, and Ralf Zimmer. Systematic comparison of SCOP and CATH: a new gold standard for protein structure analysis. BMC Struct. Biol., 9:23, April 2009. URL: http://dx.doi.org/10.1186/1472-6807-9-23.

113

J Gough, K Karplus, R Hughey, and C Chothia. Assignment of homology to genome sequences using a library of hidden markov models that represent all proteins of known structure. J. Mol. Biol., 313(4):903–919, November 2001. URL: http://dx.doi.org/10.1006/jmbi.2001.5080.

114

Hai Fang and Julian Gough. DcGO: database of domain-centric ontologies on functions, phenotypes, diseases and more. Nucleic Acids Res., 41(Database issue):D536–44, January 2013. URL: http://dx.doi.org/10.1093/nar/gks1080.

115

Hai Fang and Julian Gough. A domain-centric solution to functional genomics via dcGO predictor. BMC Bioinformatics, 14 Suppl 3:S9, February 2013. URL: http://dx.doi.org/10.1186/1471-2105-14-S3-S9.

116

Yuxiang Jiang, Tal Ronnen Oron, Wyatt T Clark, Asma R Bankapur, Daniel D'Andrea, Rosalba Lepore, Christopher S Funk, Indika Kahanda, Karin M Verspoor, Asa Ben-Hur, Da Chen Emily Koo, Duncan Penfold-Brown, Dennis Shasha, Noah Youngs, Richard Bonneau, Alexandra Lin, Sayed M E Sahraeian, Pier Luigi Martelli, Giuseppe Profiti, Rita Casadio, Renzhi Cao, Zhaolong Zhong, Jianlin Cheng, Adrian Altenhoff, Nives Skunca, Christophe Dessimoz, Tunca Dogan, Kai Hakala, Suwisa Kaewphan, Farrokh Mehryary, Tapio Salakoski, Filip Ginter, Hai Fang, Ben Smithers, Matt Oates, Julian Gough, Petri Törönen, Patrik Koskinen, Liisa Holm, Ching-Tai Chen, Wen-Lian Hsu, Kevin Bryson, Domenico Cozzetto, Federico Minneci, David T Jones, Samuel Chapman, Dukka Bkc, Ishita K Khan, Daisuke Kihara, Dan Ofer, Nadav Rappoport, Amos Stern, Elena Cibrian-Uhalte, Paul Denny, Rebecca E Foulger, Reija Hieta, Duncan Legge, Ruth C Lovering, Michele Magrane, Anna N Melidoni, Prudence Mutowo-Meullenet, Klemens Pichler, Aleksandra Shypitsyna, Biao Li, Pooya Zakeri, Sarah ElShal, Léon-Charles Tranchevent, Sayoni Das, Natalie L Dawson, David Lee, Jonathan G Lees, Ian Sillitoe, Prajwal Bhat, Tamás Nepusz, Alfonso E Romero, Rajkumar Sasidharan, Haixuan Yang, Alberto Paccanaro, Jesse Gillis, Adriana E Sedeño-Cortés, Paul Pavlidis, Shou Feng, Juan M Cejuela, Tatyana Goldberg, Tobias Hamp, Lothar Richter, Asaf Salamov, Toni Gabaldon, Marina Marcet-Houben, Fran Supek, Qingtian Gong, Wei Ning, Yuanpeng Zhou, Weidong Tian, Marco Falda, Paolo Fontana, Enrico Lavezzo, Stefano Toppo, Carlo Ferrari, Manuel Giollo, Damiano Piovesan, Silvio C E Tosatto, Angela Del Pozo, José M Fernández, Paolo Maietta, Alfonso Valencia, Michael L Tress, Alfredo Benso, Stefano Di Carlo, Gianfranco Politano, Alessandro Savino, Hafeez Ur Rehman, Matteo Re, Marco Mesiti, Giorgio Valentini, Joachim W Bargsten, Aalt D J van Dijk, Branislava Gemovic, Sanja Glisic, Vladmir Perovic, Veljko Veljkovic, Nevena Veljkovic, Danillo C Almeida-E-Silva, Ricardo 
Z N Vencio, Malvika Sharan, Jörg Vogel, Lakesh Kansakar, Shanshan Zhang, Slobodan Vucetic, Zheng Wang, Michael J E Sternberg, Mark N Wass, Rachael P Huntley, Maria J Martin, Claire O'Donovan, Peter N Robinson, Yves Moreau, Anna Tramontano, Patricia C Babbitt, Steven E Brenner, Michal Linial, Christine A Orengo, Burkhard Rost, Casey S Greene, Sean D Mooney, Iddo Friedberg, and Predrag Radivojac. An expanded evaluation of protein function prediction methods shows an improvement in accuracy. Genome Biol., 17(1):184, September 2016. URL: http://dx.doi.org/10.1186/s13059-016-1037-6.

117

Predrag Radivojac, Wyatt T Clark, Tal Ronnen Oron, Alexandra M Schnoes, Tobias Wittkop, Artem Sokolov, Kiley Graim, Christopher Funk, Karin Verspoor, Asa Ben-Hur, Gaurav Pandey, Jeffrey M Yunes, Ameet S Talwalkar, Susanna Repo, Michael L Souza, Damiano Piovesan, Rita Casadio, Zheng Wang, Jianlin Cheng, Hai Fang, Julian Gough, Patrik Koskinen, Petri Törönen, Jussi Nokso-Koivisto, Liisa Holm, Domenico Cozzetto, Daniel W A Buchan, Kevin Bryson, David T Jones, Bhakti Limaye, Harshal Inamdar, Avik Datta, Sunitha K Manjari, Rajendra Joshi, Meghana Chitale, Daisuke Kihara, Andreas M Lisewski, Serkan Erdin, Eric Venner, Olivier Lichtarge, Robert Rentzsch, Haixuan Yang, Alfonso E Romero, Prajwal Bhat, Alberto Paccanaro, Tobias Hamp, Rebecca Kaßner, Stefan Seemayer, Esmeralda Vicedo, Christian Schaefer, Dominik Achten, Florian Auer, Ariane Boehm, Tatjana Braun, Maximilian Hecht, Mark Heron, Peter Hönigschmid, Thomas A Hopf, Stefanie Kaufmann, Michael Kiening, Denis Krompass, Cedric Landerer, Yannick Mahlich, Manfred Roos, Jari Björne, Tapio Salakoski, Andrew Wong, Hagit Shatkay, Fanny Gatzmann, Ingolf Sommer, Mark N Wass, Michael J E Sternberg, Nives Škunca, Fran Supek, Matko Bošnjak, Panče Panov, Sašo Džeroski, Tomislav Šmuc, Yiannis A I Kourmpetis, Aalt D J van Dijk, Cajo J F ter Braak, Yuanpeng Zhou, Qingtian Gong, Xinran Dong, Weidong Tian, Marco Falda, Paolo Fontana, Enrico Lavezzo, Barbara Di Camillo, Stefano Toppo, Liang Lan, Nemanja Djuric, Yuhong Guo, Slobodan Vucetic, Amos Bairoch, Michal Linial, Patricia C Babbitt, Steven E Brenner, Christine Orengo, Burkhard Rost, Sean D Mooney, and Iddo Friedberg. A large-scale evaluation of computational protein function prediction. Nat. Methods, 10(3):221–227, March 2013. URL: http://dx.doi.org/10.1038/nmeth.2340.

118

Hashem A Shihab, Julian Gough, David N Cooper, Peter D Stenson, Gary L A Barker, Keith J Edwards, Ian N M Day, and Tom R Gaunt. Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden markov models. Hum. Mutat., 34(1):57–65, January 2013. URL: http://dx.doi.org/10.1002/humu.22225.

119

Peter D Stenson, Matthew Mort, Edward V Ball, Katy Howells, Andrew D Phillips, Nick St Thomas, and David N Cooper. The human gene mutation database: 2008 update. Genome Med., 1(1):13, January 2009. URL: http://dx.doi.org/10.1186/gm13.

120

William McLaren, Laurent Gil, Sarah E Hunt, Harpreet Singh Riat, Graham R S Ritchie, Anja Thormann, Paul Flicek, and Fiona Cunningham. The ensembl variant effect predictor. Genome Biol., 17(1):122, June 2016. URL: http://dx.doi.org/10.1186/s13059-016-0974-4.

121

Gregory McInnes, Roxana Daneshjou, Panagiotis Katsonis, Olivier Lichtarge, Rajgopal Srinivasan, Sadhna Rana, Predrag Radivojac, Sean D Mooney, Kymberleigh A Pagel, Moses Stamboulian, Yuxiang Jiang, Emidio Capriotti, Yanran Wang, Yana Bromberg, Samuele Bovo, Castrense Savojardo, Pier Luigi Martelli, Rita Casadio, Lipika R Pal, John Moult, Steven E Brenner, and Russ Altman. Predicting venous thromboembolism risk from exomes in the critical assessment of genome interpretation (CAGI) challenges. Hum. Mutat., 40(9):1314–1320, September 2019. URL: http://dx.doi.org/10.1002/humu.23825.

122

Laura Kasak, Jesse M Hunter, Rupa Udani, Constantina Bakolitsa, Zhiqiang Hu, Aashish N Adhikari, Giulia Babbi, Rita Casadio, Julian Gough, Rafael F Guerrero, Yuxiang Jiang, Thomas Joseph, Panagiotis Katsonis, Sujatha Kotte, Kunal Kundu, Olivier Lichtarge, Pier Luigi Martelli, Sean D Mooney, John Moult, Lipika R Pal, Jennifer Poitras, Predrag Radivojac, Aditya Rao, Naveen Sivadasan, Uma Sunderam, V G Saipradeep, Yizhou Yin, Jan Zaucha, Steven E Brenner, and M Stephen Meyn. CAGI SickKids challenges: assessment of phenotype and variant predictions derived from clinical and genomic data of children with undiagnosed diseases. Hum. Mutat., 40(9):1373–1391, September 2019. URL: http://dx.doi.org/10.1002/humu.23874.

123

The Turing Way Community, Becky Arnold, Louise Bowler, Sarah Gibson, Patricia Herterich, Rosie Higman, Anna Krystalli, Alexander Morley, Martin O'Reilly, and Kirstie Whitaker. The turing way: a handbook for reproducible data science. March 2019. URL: https://zenodo.org/record/3233986.

124

C Glenn Begley and Lee M Ellis. Drug development: raise standards for preclinical cancer research. Nature, 483(7391):531–533, March 2012. URL: http://dx.doi.org/10.1038/483531a.

125

Florian Prinz, Thomas Schlange, and Khusru Asadullah. Believe it or not: how much can we rely on published data on potential drug targets? Nat. Rev. Drug Discov., 10(9):712, 2011. URL: http://dx.doi.org/10.1038/nrd3439-c1.

126

Open Science Collaboration. Estimating the reproducibility of psychological science. Science, 349(6251):aac4716, August 2015. URL: http://dx.doi.org/10.1126/science.aac4716.

127

Monya Baker. 1,500 scientists lift the lid on reproducibility. Nature, 533(7604):452–454, May 2016. URL: http://dx.doi.org/10.1038/533452a.

128

Dorothy Bishop. Rein in the four horsemen of irreproducibility. Nature, 568(7753):435, April 2019. URL: http://dx.doi.org/10.1038/d41586-019-01307-2.

129

Daniele Fanelli. How many scientists fabricate and falsify research? a systematic review and meta-analysis of survey data. PLoS One, 4(5):e5738, May 2009. URL: http://dx.doi.org/10.1371/journal.pone.0005738.

130

John P A Ioannidis. Why most published research findings are false. PLoS Med., 2(8):e124, August 2005. URL: http://dx.doi.org/10.1371/journal.pmed.0020124.

131

Daniel Lakens, Federico G Adolfi, Casper J Albers, Farid Anvari, Matthew A J Apps, Shlomo E Argamon, Thom Baguley, Raymond B Becker, Stephen D Benning, Daniel E Bradford, Erin M Buchanan, Aaron R Caldwell, Ben Van Calster, Rickard Carlsson, Sau-Chin Chen, Bryan Chung, Lincoln J Colling, Gary S Collins, Zander Crook, Emily S Cross, Sameera Daniels, Henrik Danielsson, Lisa DeBruine, Daniel J Dunleavy, Brian D Earp, Michele I Feist, Jason D Ferrell, James G Field, Nicholas W Fox, Amanda Friesen, Caio Gomes, Monica Gonzalez-Marquez, James A Grange, Andrew P Grieve, Robert Guggenberger, James Grist, Anne-Laura van Harmelen, Fred Hasselman, Kevin D Hochard, Mark R Hoffarth, Nicholas P Holmes, Michael Ingre, Peder M Isager, Hanna K Isotalus, Christer Johansson, Konrad Juszczyk, David A Kenny, Ahmed A Khalil, Barbara Konat, Junpeng Lao, Erik Gahner Larsen, Gerine M A Lodder, Jiří Lukavský, Christopher R Madan, David Manheim, Stephen R Martin, Andrea E Martin, Deborah G Mayo, Randy J McCarthy, Kevin McConway, Colin McFarland, Amanda Q X Nio, Gustav Nilsonne, Cilene Lino de Oliveira, Jean-Jacques Orban de Xivry, Sam Parsons, Gerit Pfuhl, Kimberly A Quinn, John J Sakon, S Adil Saribay, Iris K Schneider, Manojkumar Selvaraju, Zsuzsika Sjoerds, Samuel G Smith, Tim Smits, Jeffrey R Spies, Vishnu Sreekumar, Crystal N Steltenpohl, Neil Stenhouse, Wojciech Świątkowski, Miguel A Vadillo, Marcel A L M Van Assen, Matt N Williams, Samantha E Williams, Donald R Williams, Tal Yarkoni, Ignazio Ziano, and Rolf A Zwaan. Justify your alpha. Nature Human Behaviour, 2(3):168–171, February 2018. URL: https://www.nature.com/articles/s41562-018-0311-x.

132

Daniel J Benjamin, James O Berger, Magnus Johannesson, Brian A Nosek, E-J Wagenmakers, Richard Berk, Kenneth A Bollen, Björn Brembs, Lawrence Brown, Colin Camerer, David Cesarini, Christopher D Chambers, Merlise Clyde, Thomas D Cook, Paul De Boeck, Zoltan Dienes, Anna Dreber, Kenny Easwaran, Charles Efferson, Ernst Fehr, Fiona Fidler, Andy P Field, Malcolm Forster, Edward I George, Richard Gonzalez, Steven Goodman, Edwin Green, Donald P Green, Anthony G Greenwald, Jarrod D Hadfield, Larry V Hedges, Leonhard Held, Teck Hua Ho, Herbert Hoijtink, Daniel J Hruschka, Kosuke Imai, Guido Imbens, John P A Ioannidis, Minjeong Jeon, James Holland Jones, Michael Kirchler, David Laibson, John List, Roderick Little, Arthur Lupia, Edouard Machery, Scott E Maxwell, Michael McCarthy, Don A Moore, Stephen L Morgan, Marcus Munafó, Shinichi Nakagawa, Brendan Nyhan, Timothy H Parker, Luis Pericchi, Marco Perugini, Jeff Rouder, Judith Rousseau, Victoria Savalei, Felix D Schönbrodt, Thomas Sellke, Betsy Sinclair, Dustin Tingley, Trisha Van Zandt, Simine Vazire, Duncan J Watts, Christopher Winship, Robert L Wolpert, Yu Xie, Cristobal Young, Jonathan Zinman, and Valen E Johnson. Redefine statistical significance. Nat Hum Behav, 2(1):6–10, January 2018. URL: http://dx.doi.org/10.1038/s41562-017-0189-z.

133

Norbert L Kerr. HARKing: hypothesizing after the results are known. 1998. URL: http://dx.doi.org/10.1207/s15327957pspr0203_4.

134

Megan L Head, Luke Holman, Rob Lanfear, Andrew T Kahn, and Michael D Jennions. The extent and consequences of p-hacking in science. PLoS Biol., 13(3):e1002106, March 2015. URL: http://dx.doi.org/10.1371/journal.pbio.1002106.

135

R A Fisher and F Yates. Statistical Methods, Experimental Design, and Scientific Inference: A Re-Issue of Statistical Methods for Research Workers, the Design of Experiments, and Statistical Methods and Scientific Inference. OUP Oxford, April 1990. URL: https://www.amazon.co.uk/Statistical-Methods-Experimental-Scientific-Inference/dp/0198522290.

136

Olive Jean Dunn. Estimation of the means of dependent variables. The Annals of Mathematical Statistics, 29(4):1095–1111, 1958. URL: http://dx.doi.org/10.1214/aoms/1177706443.

137

Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1):289–300, 1995. URL: http://dx.doi.org/10.1111/j.2517-6161.1995.tb02031.x.

138

R Silberzahn, E L Uhlmann, D P Martin, P Anselmi, F Aust, E Awtrey, Š Bahník, F Bai, C Bannard, E Bonnier, R Carlsson, F Cheung, G Christensen, R Clay, M A Craig, A Dalla Rosa, L Dam, M H Evans, I Flores Cervantes, N Fong, M Gamez-Djokic, A Glenz, S Gordon-McKeon, T J Heaton, K Hederos, M Heene, A J Hofelich Mohr, F Högden, K Hui, M Johannesson, J Kalodimos, E Kaszubowski, D M Kennedy, R Lei, T A Lindsay, S Liverani, C R Madan, D Molden, E Molleman, R D Morey, L B Mulder, B R Nijstad, N G Pope, B Pope, J M Prenoveau, F Rink, E Robusto, H Roderique, A Sandberg, E Schlüter, F D Schönbrodt, M F Sherman, S A Sommer, K Sotak, S Spain, C Spörlein, T Stafford, L Stefanutti, S Tauber, J Ullrich, M Vianello, E-J Wagenmakers, M Witkowiak, S Yoon, and B A Nosek. Many analysts, one data set: making transparent how variations in analytic choices affect results. Advances in Methods and Practices in Psychological Science, 1(3):337–356, September 2018. URL: https://doi.org/10.1177/2515245917747646.

139

Rotem Botvinik-Nezer, Felix Holzmeister, Colin F Camerer, Anna Dreber, Juergen Huber, Magnus Johannesson, Michael Kirchler, Roni Iwanir, Jeanette A Mumford, R Alison Adcock, Paolo Avesani, Blazej M Baczkowski, Aahana Bajracharya, Leah Bakst, Sheryl Ball, Marco Barilari, Nadège Bault, Derek Beaton, Julia Beitner, Roland G Benoit, Ruud M W J Berkers, Jamil P Bhanji, Bharat B Biswal, Sebastian Bobadilla-Suarez, Tiago Bortolini, Katherine L Bottenhorn, Alexander Bowring, Senne Braem, Hayley R Brooks, Emily G Brudner, Cristian B Calderon, Julia A Camilleri, Jaime J Castrellon, Luca Cecchetti, Edna C Cieslik, Zachary J Cole, Olivier Collignon, Robert W Cox, William A Cunningham, Stefan Czoschke, Kamalaker Dadi, Charles P Davis, Alberto De Luca, Mauricio R Delgado, Lysia Demetriou, Jeffrey B Dennison, Xin Di, Erin W Dickie, Ekaterina Dobryakova, Claire L Donnat, Juergen Dukart, Niall W Duncan, Joke Durnez, Amr Eed, Simon B Eickhoff, Andrew Erhart, Laura Fontanesi, G Matthew Fricke, Shiguang Fu, Adriana Galván, Remi Gau, Sarah Genon, Tristan Glatard, Enrico Glerean, Jelle J Goeman, Sergej A E Golowin, Carlos González-García, Krzysztof J Gorgolewski, Cheryl L Grady, Mikella A Green, João F Guassi Moreira, Olivia Guest, Shabnam Hakimi, J Paul Hamilton, Roeland Hancock, Giacomo Handjaras, Bronson B Harry, Colin Hawco, Peer Herholz, Gabrielle Herman, Stephan Heunis, Felix Hoffstaedter, Jeremy Hogeveen, Susan Holmes, Chuan-Peng Hu, Scott A Huettel, Matthew E Hughes, Vittorio Iacovella, Alexandru D Iordan, Peder M Isager, Ayse I Isik, Andrew Jahn, Matthew R Johnson, Tom Johnstone, Michael J E Joseph, Anthony C Juliano, Joseph W Kable, Michalis Kassinopoulos, Cemal Koba, Xiang-Zhen Kong, Timothy R Koscik, Nuri Erkut Kucukboyaci, Brice A Kuhl, Sebastian Kupek, Angela R Laird, Claus Lamm, Robert Langner, Nina Lauharatanahirun, Hongmi Lee, Sangil Lee, Alexander Leemans, Andrea Leo, Elise Lesage, Flora Li, Monica Y C Li, Phui Cheng Lim, Evan N Lintz, Schuyler W Liphardt, Annabel B 
Losecaat Vermeer, Bradley C Love, Michael L Mack, Norberto Malpica, Theo Marins, Camille Maumet, Kelsey McDonald, Joseph T McGuire, Helena Melero, Adriana S Méndez Leal, Benjamin Meyer, Kristin N Meyer, Glad Mihai, Georgios D Mitsis, Jorge Moll, Dylan M Nielson, Gustav Nilsonne, Michael P Notter, Emanuele Olivetti, Adrian I Onicas, Paolo Papale, Kaustubh R Patil, Jonathan E Peelle, Alexandre Pérez, Doris Pischedda, Jean-Baptiste Poline, Yanina Prystauka, Shruti Ray, Patricia A Reuter-Lorenz, Richard C Reynolds, Emiliano Ricciardi, Jenny R Rieck, Anais M Rodriguez-Thompson, Anthony Romyn, Taylor Salo, Gregory R Samanez-Larkin, Emilio Sanz-Morales, Margaret L Schlichting, Douglas H Schultz, Qiang Shen, Margaret A Sheridan, Jennifer A Silvers, Kenny Skagerlund, Alec Smith, David V Smith, Peter Sokol-Hessner, Simon R Steinkamp, Sarah M Tashjian, Bertrand Thirion, John N Thorp, Gustav Tinghög, Loreen Tisdall, Steven H Tompson, Claudio Toro-Serey, Juan Jesus Torre Tresols, Leonardo Tozzi, Vuong Truong, Luca Turella, Anna E van 't Veer, Tom Verguts, Jean M Vettel, Sagana Vijayarajah, Khoi Vo, Matthew B Wall, Wouter D Weeda, Susanne Weis, David J White, David Wisniewski, Alba Xifra-Porxas, Emily A Yearling, Sangsuk Yoon, Rui Yuan, Kenneth S L Yuen, Lei Zhang, Xu Zhang, Joshua E Zosky, Thomas E Nichols, Russell A Poldrack, and Tom Schonberg. Variability in the analysis of a single neuroimaging dataset by many teams. Nature, 582(7810):84–88, June 2020. URL: http://dx.doi.org/10.1038/s41586-020-2314-9.

140

Lauren Cadwallader, Jason A Papin, Feilim Mac Gabhann, and Rebecca Kirk. Collaborating with our community to increase code sharing. PLoS Comput. Biol., 17(3):e1008867, March 2021. URL: http://dx.doi.org/10.1371/journal.pcbi.1008867.

141

Mark D Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E Bourne, Jildau Bouwman, Anthony J Brookes, Tim Clark, Mercè Crosas, Ingrid Dillo, Olivier Dumon, Scott Edmunds, Chris T Evelo, Richard Finkers, Alejandra Gonzalez-Beltran, Alasdair J G Gray, Paul Groth, Carole Goble, Jeffrey S Grethe, Jaap Heringa, Peter A C 't Hoen, Rob Hooft, Tobias Kuhn, Ruben Kok, Joost Kok, Scott J Lusher, Maryann E Martone, Albert Mons, Abel L Packer, Bengt Persson, Philippe Rocca-Serra, Marco Roos, Rene van Schaik, Susanna-Assunta Sansone, Erik Schultes, Thierry Sengstag, Ted Slater, George Strawn, Morris A Swertz, Mark Thompson, Johan van der Lei, Erik van Mulligen, Jan Velterop, Andra Waagmeester, Peter Wittenburg, Katherine Wolstencroft, Jun Zhao, and Barend Mons. The FAIR guiding principles for scientific data management and stewardship. Sci Data, 3:160018, March 2016. URL: http://dx.doi.org/10.1038/sdata.2016.18.

142

Anna-Lena Lamprecht, Leyla Garcia, Mateusz Kuzak, Carlos Martinez, Ricardo Arcila, Eva Martin Del Pico, Victoria Dominguez Del Angel, Stephanie Van De Sandt, Jon Ison, Paula Andrea Martinez, and others. Towards FAIR principles for research software. Data Science, 3(1):37–59, 2020. URL: https://content.iospress.com/articles/data-science/ds190026.

143

SLOW-SCIENCE.org: bear with us, while we think. Accessed: 2021-02-14. URL: http://slow-science.org/.

144

Hai Fang, Matt E Oates, Ralph B Pethica, Jenny M Greenwood, Adam J Sardar, Owen J L Rackham, Philip C J Donoghue, Alexandros Stamatakis, David A de Lima Morais, and Julian Gough. A daily-updated tree of (sequenced) life as a reference for genome research. Sci. Rep., 3:2015, 2013. URL: http://dx.doi.org/10.1038/srep02015.

145

Elias Dohmen, Lukas P M Kremer, Erich Bornberg-Bauer, and Carsten Kemena. DOGMA: domain-based transcriptome and proteome quality assessment. Bioinformatics, 32(17):2577–2581, September 2016. URL: http://dx.doi.org/10.1093/bioinformatics/btw231.

146

Joel Cracraft and Michael J Donoghue. Assembling the Tree of Life. Oxford University Press, July 2004. URL: https://www.amazon.co.uk/Assembling-Tree-Life-Joel-Cracraft/dp/0195172345.

147

IUPAC-IUB Commission on Biochemical Nomenclature. A one-letter notation for amino acid sequences. tentative rules. 1968. URL: http://dx.doi.org/10.1021/bi00848a001.

148

NCBI Resource Coordinators. Database resources of the national center for biotechnology information. 2017. URL: http://dx.doi.org/10.1093/nar/gkw1071.

149

G Parra, K Bradnam, and I Korf. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. 2007. URL: http://dx.doi.org/10.1093/bioinformatics/btm071.

150

Eugene V Koonin, Natalie D Fedorova, John D Jackson, Aviva R Jacobs, Dmitri M Krylov, Kira S Makarova, Raja Mazumder, Sergei L Mekhedov, Anastasia N Nikolskaya, B Sridhar Rao, Igor B Rogozin, Sergei Smirnov, Alexander V Sorokin, Alexander V Sverdlov, Sona Vasudevan, Yuri I Wolf, Jodie J Yin, and Darren A Natale. A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes. Genome Biol., 5(2):R7, January 2004. URL: http://dx.doi.org/10.1186/gb-2004-5-2-r7.

151

Felipe A Simão, Robert M Waterhouse, Panagiotis Ioannidis, Evgenia V Kriventseva, and Evgeny M Zdobnov. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics, 31(19):3210–3212, October 2015. URL: http://dx.doi.org/10.1093/bioinformatics/btv351.

152

Roman L Tatusov, Natalie D Fedorova, John D Jackson, Aviva R Jacobs, Boris Kiryutin, Eugene V Koonin, Dmitri M Krylov, Raja Mazumder, Sergei L Mekhedov, Anastasia N Nikolskaya, B Sridhar Rao, Sergei Smirnov, Alexander V Sverdlov, Sona Vasudevan, Yuri I Wolf, Jodie J Yin, and Darren A Natale. The COG database: an updated version includes eukaryotes. BMC Bioinformatics, 4:41, September 2003. URL: http://dx.doi.org/10.1186/1471-2105-4-41.

153

Evgenia V Kriventseva, Dmitry Kuznetsov, Fredrik Tegenfeldt, Mosè Manni, Renata Dias, Felipe A Simão, and Evgeny M Zdobnov. OrthoDB v10: sampling the diversity of animal, plant, fungal, protist, bacterial and viral genomes for evolutionary and functional annotations of orthologs. Nucleic Acids Res., 47(D1):D807–D811, January 2019. URL: http://dx.doi.org/10.1093/nar/gky1053.

154

Paul A Kitts, Deanna M Church, Françoise Thibaud-Nissen, Jinna Choi, Vichet Hem, Victor Sapojnikov, Robert G Smith, Tatiana Tatusova, Charlie Xiang, Andrey Zherikov, Michael DiCuccio, Terence D Murphy, Kim D Pruitt, and Avi Kimchi. Assembly: a resource for assembled genomes at NCBI. Nucleic Acids Res., 44(D1):D73–80, January 2016. URL: http://dx.doi.org/10.1093/nar/gkv1226.

155

David Sims, Ian Sudbery, Nicholas E Ilott, Andreas Heger, and Chris P Ponting. Sequencing depth and coverage: key considerations in genomic analyses. Nat. Rev. Genet., 15(2):121–132, February 2014. URL: http://dx.doi.org/10.1038/nrg3642.

156

Mick Watson and Amanda Warr. Errors in long-read assemblies can critically affect protein prediction. Nat. Biotechnol., 37(2):124–126, February 2019. URL: http://dx.doi.org/10.1038/s41587-018-0004-z.

157

Yehudit Hasin, Marcus Seldin, and Aldons Lusis. Multi-omics approaches to disease. Genome Biol., 2017. URL: http://dx.doi.org/10.1186/s13059-017-1215-1.

158

Marylyn D Ritchie, Emily R Holzinger, Ruowang Li, Sarah A Pendergrass, and Dokyoon Kim. Methods of integrating data to uncover genotype-phenotype interactions. Nat. Rev. Genet., 16(2):85–97, February 2015. URL: http://dx.doi.org/10.1038/nrg3868.

159

Vessela N Kristensen, Ole Christian Lingjærde, Hege G Russnes, Hans Kristian M Vollan, Arnoldo Frigessi, and Anne-Lise Børresen-Dale. Principles and methods of integrative genomic analyses in cancer. Nat. Rev. Cancer, 14(5):299–313, May 2014. URL: http://dx.doi.org/10.1038/nrc3721.

160

Asif Javed, Saloni Agrawal, and Pauline C Ng. Phen-Gen: combining phenotype and genotype to analyze rare disorders. Nat. Methods, 11(9):935–937, September 2014. URL: http://dx.doi.org/10.1038/nmeth.3046.

161

Damian Smedley, Anika Oellrich, Sebastian Köhler, Barbara Ruef, Sanger Mouse Genetics Project, Monte Westerfield, Peter Robinson, Suzanna Lewis, and Christopher Mungall. PhenoDigm: analyzing curated annotations to associate animal models with human diseases. Database, 2013:bat025, May 2013. URL: http://dx.doi.org/10.1093/database/bat025.

162

Peter N Robinson, Sebastian Köhler, Anika Oellrich, Sanger Mouse Genetics Project, Kai Wang, Christopher J Mungall, Suzanna E Lewis, Nicole Washington, Sebastian Bauer, Dominik Seelow, Peter Krawitz, Christian Gilissen, Melissa Haendel, and Damian Smedley. Improved exome prioritization of disease genes through cross-species phenotype comparison. Genome Res., 24(2):340–348, February 2014. URL: http://dx.doi.org/10.1101/gr.160325.113.

163

Martin Kircher, Daniela M Witten, Preti Jain, Brian J O'Roak, Gregory M Cooper, and Jay Shendure. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet., 46(3):310–315, March 2014. URL: http://dx.doi.org/10.1038/ng.2892.

164

Damian Smedley, Julius O B Jacobsen, Marten Jäger, Sebastian Köhler, Manuel Holtgrewe, Max Schubach, Enrico Siragusa, Tomasz Zemojtel, Orion J Buske, Nicole L Washington, William P Bone, Melissa A Haendel, and Peter N Robinson. Next-generation diagnostics and disease-gene discovery with the exomiser. Nat. Protoc., 10(12):2004–2015, November 2015. URL: https://www.nature.com/articles/nprot.2015.124.

165

Hui Yang, Peter N Robinson, and Kai Wang. Phenolyzer: phenotype-based prioritization of candidate genes for human diseases. Nat. Methods, 12(9):841–843, September 2015. URL: http://dx.doi.org/10.1038/nmeth.3484.

166

Damian Smedley, Max Schubach, Julius O B Jacobsen, Sebastian Köhler, Tomasz Zemojtel, Malte Spielmann, Marten Jäger, Harry Hochheiser, Nicole L Washington, Julie A McMurry, Melissa A Haendel, Christopher J Mungall, Suzanna E Lewis, Tudor Groza, Giorgio Valentini, and Peter N Robinson. A Whole-Genome analysis framework for effective identification of pathogenic regulatory variants in mendelian disease. Am. J. Hum. Genet., 99(3):595–606, September 2016. URL: http://dx.doi.org/10.1016/j.ajhg.2016.07.005.

167

Tomasz Zemojtel, Sebastian Köhler, Luisa Mackenroth, Marten Jäger, Jochen Hecht, Peter Krawitz, Luitgard Graul-Neumann, Sandra Doelken, Nadja Ehmke, Malte Spielmann, Nancy Christine Oien, Michal R Schweiger, Ulrike Krüger, Götz Frommer, Björn Fischer, Uwe Kornak, Ricarda Flöttmann, Amin Ardeshirdavani, Yves Moreau, Suzanna E Lewis, Melissa Haendel, Damian Smedley, Denise Horn, Stefan Mundlos, and Peter N Robinson. Effective diagnosis of genetic disease by computational phenotype analysis of the disease-associated genome. Sci. Transl. Med., 6(252):252ra123, September 2014. URL: http://dx.doi.org/10.1126/scitranslmed.3009262.

168

Sebastian Köhler, Marcel H Schulz, Peter Krawitz, Sebastian Bauer, Sandra Dölken, Claus E Ott, Christine Mundlos, Denise Horn, Stefan Mundlos, and Peter N Robinson. Clinical diagnostics in human genetics with semantic similarity searches in ontologies. Am. J. Hum. Genet., 85(4):457–464, October 2009. URL: http://dx.doi.org/10.1016/j.ajhg.2009.09.003.

169

Peristera Paschou, Elad Ziv, Esteban G Burchard, Shweta Choudhry, William Rodriguez-Cintron, Michael W Mahoney, and Petros Drineas. PCA-correlated SNPs for structure identification in worldwide human populations. PLoS Genet., 3(9):1672–1686, September 2007. URL: http://dx.doi.org/10.1371/journal.pgen.0030160.

170

M B Eisen, P T Spellman, P O Brown, and D Botstein. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. U. S. A., 95(25):14863–14868, December 1998. URL: http://dx.doi.org/10.1073/pnas.95.25.14863.

171

Habtom W Ressom, Rency S Varghese, Zhen Zhang, Jianhua Xuan, and Robert Clarke. Classification algorithms for phenotype prediction in genomics and proteomics. Front. Biosci., 13:691–708, January 2008. URL: http://dx.doi.org/10.2741/2712.

172

Michael H Cho, George R Washko, Thomas J Hoffmann, Gerard J Criner, Eric A Hoffman, Fernando J Martinez, Nan Laird, John J Reilly, and Edwin K Silverman. Cluster analysis in severe emphysema subjects using phenotype and genotype data: an exploratory investigation. Respir. Res., 11:30, March 2010. URL: http://dx.doi.org/10.1186/1465-9921-11-30.

173

Richard Bellman. Dynamic Programming. Princeton University Press, 1957. URL: https://play.google.com/store/books/details?id=wdtoPwAACAAJ.

174

A Zimek, E Schubert, and H P Kriegel. A survey on unsupervised outlier detection in high‐dimensional numerical data. Stat. Anal. Data Min., 2012. URL: https://onlinelibrary.wiley.com/doi/abs/10.1002/sam.11161.

175

The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature, 526(7571):68–74, 2015. URL: http://dx.doi.org/10.1038/nature15393.

176

Cynthia L Smith, Carroll-Ann W Goldsmith, and Janan T Eppig. The mammalian phenotype ontology as a tool for annotating, analyzing and comparing phenotypic information. Genome Biol., 6(1):R7, 2005. URL: http://dx.doi.org/10.1186/gb-2004-6-1-r7.

177

P N Robinson and S Mundlos. The human phenotype ontology. Clin. Genet., 77(6):525–534, 2010. URL: http://dx.doi.org/10.1111/j.1399-0004.2010.01436.x.

178

Lynn Marie Schriml, Cesar Arze, Suvarna Nadendla, Yu-Wei Wayne Chang, Mark Mazaitis, Victor Felix, Gang Feng, and Warren Alden Kibbe. Disease ontology: a backbone for disease semantic integration. Nucleic Acids Res., 40(Database issue):D940–6, January 2012. URL: http://dx.doi.org/10.1093/nar/gkr972.

179

Alex Bateman, Ewan Birney, Lorenzo Cerruti, Richard Durbin, Laurence Etwiller, Sean R Eddy, Sam Griffiths-Jones, Kevin L Howe, Mhairi Marshall, and Erik L L Sonnhammer. The pfam protein families database. Nucleic Acids Res., 30(1):276–280, January 2002. URL: http://dx.doi.org/10.1093/nar/30.1.276.

180

Jan Poland and Thomas Zeugmann. Clustering pairwise distances with missing data: maximum cuts versus normalized cuts. Discovery Science, pages 197–208, 2006. URL: http://dx.doi.org/10.1007/11893318_21.

181

Barry Smith, Werner Ceusters, Bert Klagges, Jacob Köhler, Anand Kumar, Jane Lomax, Chris Mungall, Fabian Neuhaus, Alan L Rector, and Cornelius Rosse. Relations in biomedical ontologies. Genome Biol., 6(5):R46, April 2005. URL: http://dx.doi.org/10.1186/gb-2005-6-5-r46.

182

H J Lowe and G O Barnett. Understanding and using the medical subject headings (MeSH) vocabulary to perform literature searches. JAMA, 271(14):1103–1108, April 1994. URL: https://www.ncbi.nlm.nih.gov/pubmed/8151853.

183

1000 Genomes Project Consortium, Goncalo R Abecasis, Adam Auton, Lisa D Brooks, Mark A DePristo, Richard M Durbin, Robert E Handsaker, Hyun Min Kang, Gabor T Marth, and Gil A McVean. An integrated map of genetic variation from 1,092 human genomes. Nature, 491(7422):56–65, November 2012. URL: http://dx.doi.org/10.1038/nature11632.

184

Susan Fairley, Ernesto Lowy-Gallego, Emily Perry, and Paul Flicek. The international genome sample resource (IGSR) collection of open human genomic variation resources. Nucleic Acids Res., 48(D1):D941–D947, January 2020. URL: http://dx.doi.org/10.1093/nar/gkz836.

185

Heng Li. Tabix: fast retrieval of sequence features from generic TAB-delimited files. Bioinformatics, 27(5):718–719, March 2011. URL: http://dx.doi.org/10.1093/bioinformatics/btq671.

186

Cath Tyner. The UCSC genome browser coordinate counting systems. December 2016. Accessed: 2021-2-7. URL: http://genome.ucsc.edu/blog/the-ucsc-genome-browser-coordinate-counting-systems/.

187

G M Church. The personal genome project. Mol. Syst. Biol., 1:2005.0030, December 2005. URL: http://dx.doi.org/10.1038/msb4100040.

188

Philipp G Sand. A lesson not learned: allele misassignment. Behav. Brain Funct., 3(1):65, 2007. URL: http://dx.doi.org/10.1186/1744-9081-3-65.

189

2.3. clustering — scikit-learn 0.21.2 documentation. Accessed: 2019-6-9. URL: https://scikit-learn.org/stable/modules/clustering.html.

190

Lucien Marie Le Cam and Jerzy Neyman. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. University of California Press, 1967. URL: https://play.google.com/store/books/details?id=EN6isXsFdKgC.

191

Chenguang Zhao and Zheng Wang. GOGO: an improved algorithm to measure the semantic similarity between gene ontology terms. Sci. Rep., 2018. URL: http://dx.doi.org/10.1038/s41598-018-33219-y.

192

James Z Wang, Zhidian Du, Rapeeporn Payattakool, Philip S Yu, and Chin-Fu Chen. A new method to measure the semantic similarity of GO terms. Bioinformatics, 23(10):1274–1281, May 2007. URL: http://dx.doi.org/10.1093/bioinformatics/btm087.

193

Mark F Rogers, Hashem A Shihab, Matthew Mort, David N Cooper, Tom R Gaunt, and Colin Campbell. FATHMM-XF: accurate prediction of pathogenic point mutations via extended features. Bioinformatics, 34(3):511–513, February 2018. URL: http://dx.doi.org/10.1093/bioinformatics/btx536.

194

Alice B Popejoy and Stephanie M Fullerton. Genomics is failing on diversity. Nature, 538(7624):161–164, October 2016. URL: http://dx.doi.org/10.1038/538161a.

195

Momoko Horikoshi, Hanieh Yaghootkar, Dennis O Mook-Kanamori, Ulla Sovio, H Rob Taal, Branwen J Hennig, Jonathan P Bradfield, Beate St Pourcain, David M Evans, Pimphen Charoen, Marika Kaakinen, Diana L Cousminer, Terho Lehtimäki, Eskil Kreiner-Møller, Nicole M Warrington, Mariona Bustamante, Bjarke Feenstra, Diane J Berry, Elisabeth Thiering, Thiemo Pfab, Sheila J Barton, Beverley M Shields, Marjan Kerkhof, Elisabeth M van Leeuwen, Anthony J Fulford, Zoltán Kutalik, Jing Hua Zhao, Marcel den Hoed, Anubha Mahajan, Virpi Lindi, Liang-Kee Goh, Jouke-Jan Hottenga, Ying Wu, Olli T Raitakari, Marie N Harder, Aline Meirhaeghe, Ioanna Ntalla, Rany M Salem, Karen A Jameson, Kaixin Zhou, Dorota M Monies, Vasiliki Lagou, Mirna Kirin, Jani Heikkinen, Linda S Adair, Fowzan S Alkuraya, Ali Al-Odaib, Philippe Amouyel, Ehm Astrid Andersson, Amanda J Bennett, Alexandra I F Blakemore, Jessica L Buxton, Jean Dallongeville, Shikta Das, Eco J C de Geus, Xavier Estivill, Claudia Flexeder, Philippe Froguel, Frank Geller, Keith M Godfrey, Frédéric Gottrand, Christopher J Groves, Torben Hansen, Joel N Hirschhorn, Albert Hofman, Mads V Hollegaard, David M Hougaard, Elina Hyppönen, Hazel M Inskip, Aaron Isaacs, Torben Jørgensen, Christina Kanaka-Gantenbein, John P Kemp, Wieland Kiess, Tuomas O Kilpeläinen, Norman Klopp, Bridget A Knight, Christopher W Kuzawa, George McMahon, John P Newnham, Harri Niinikoski, Ben A Oostra, Louise Pedersen, Dirkje S Postma, Susan M Ring, Fernando Rivadeneira, Neil R Robertson, Sylvain Sebert, Olli Simell, Torsten Slowinski, Carla M T Tiesler, Anke Tönjes, Allan Vaag, Jorma S Viikari, Jacqueline M Vink, Nadja Hawwa Vissing, Nicholas J Wareham, Gonneke Willemsen, Daniel R Witte, Haitao Zhang, Jianhua Zhao, Meta-Analyses of Glucose- and Insulin-related traits Consortium (MAGIC), James F Wilson, Michael Stumvoll, Andrew M Prentice, Brian F Meyer, Ewan R Pearson, Colin A G Boreham, Cyrus Cooper, Matthew W Gillman, George V Dedoussis, Luis A Moreno, Oluf Pedersen, Maiju Saarinen, Karen L Mohlke, Dorret I Boomsma, Seang-Mei Saw, Timo A Lakka, Antje Körner, Ruth J F Loos, Ken K Ong, Peter Vollenweider, Cornelia M van Duijn, Gerard H Koppelman, Andrew T Hattersley, John W Holloway, Berthold Hocher, Joachim Heinrich, Chris Power, Mads Melbye, Mònica Guxens, Craig E Pennell, Klaus Bønnelykke, Hans Bisgaard, Johan G Eriksson, Elisabeth Widén, Hakon Hakonarson, André G Uitterlinden, Anneli Pouta, Debbie A Lawlor, George Davey Smith, Timothy M Frayling, Mark I McCarthy, Struan F A Grant, Vincent W V Jaddoe, Marjo-Riitta Jarvelin, Nicholas J Timpson, Inga Prokopenko, Rachel M Freathy, and Early Growth Genetics (EGG) Consortium. New loci associated with birth weight identify genetic links between intrauterine growth and adult height and metabolism. Nat. Genet., 45(1):76–82, January 2013. URL: http://dx.doi.org/10.1038/ng.2477.

196

David A Hinds, George McMahon, Amy K Kiefer, Chuong B Do, Nicholas Eriksson, David M Evans, Beate St Pourcain, Susan M Ring, Joanna L Mountain, Uta Francke, George Davey-Smith, Nicholas J Timpson, and Joyce Y Tung. A genome-wide association meta-analysis of self-reported allergy identifies shared and allergy-specific susceptibility loci. Nat. Genet., 45(8):907–911, August 2013. URL: http://dx.doi.org/10.1038/ng.2686.

197

Elise B Robinson, Beate St Pourcain, Verneri Anttila, Jack A Kosmicki, Brendan Bulik-Sullivan, Jakob Grove, Julian Maller, Kaitlin E Samocha, Stephan J Sanders, Stephan Ripke, Joanna Martin, Mads V Hollegaard, Thomas Werge, David M Hougaard, iPSYCH-SSI-Broad Autism Group, Benjamin M Neale, David M Evans, David Skuse, Preben Bo Mortensen, Anders D Børglum, Angelica Ronald, George Davey Smith, and Mark J Daly. Genetic risk for autism spectrum disorders and neuropsychiatric variation in the general population. Nat. Genet., 48(5):552–555, May 2016. URL: http://dx.doi.org/10.1038/ng.3529.

198

Beate St Pourcain, C M A Haworth, O S P Davis, Kai Wang, Nicholas J Timpson, David M Evans, John P Kemp, Angelica Ronald, Tom Price, Emma Meaburn, Susan M Ring, Jean Golding, Hakon Hakonarson, R Plomin, and George Davey Smith. Heritability and genome-wide analyses of problematic peer relationships during childhood and adolescence. Hum. Genet., 134(6):539–551, June 2015. URL: http://dx.doi.org/10.1007/s00439-014-1514-5.

199

Emmanouela Repapi, Ian Sayers, Louise V Wain, Paul R Burton, Toby Johnson, Ma'en Obeidat, Jing Hua Zhao, Adaikalavan Ramasamy, Guangju Zhai, Veronique Vitart, Jennifer E Huffman, Wilmar Igl, Eva Albrecht, Panos Deloukas, John Henderson, Raquel Granell, Wendy L McArdle, Alicja R Rudnicka, Wellcome Trust Case Control Consortium, Inês Barroso, Ruth J F Loos, Nicholas J Wareham, Linda Mustelin, Taina Rantanen, Ida Surakka, Medea Imboden, H Erich Wichmann, Ivica Grkovic, Stipan Jankovic, Lina Zgaga, Anna-Liisa Hartikainen, Leena Peltonen, Ulf Gyllensten, Asa Johansson, Ghazal Zaboli, Harry Campbell, Sarah H Wild, James F Wilson, Sven Gläser, Georg Homuth, Henry Völzke, Massimo Mangino, Nicole Soranzo, Tim D Spector, Ozren Polasek, Igor Rudan, Alan F Wright, Markku Heliövaara, Samuli Ripatti, Anneli Pouta, Asa Torinsson Naluai, Anna-Carin Olin, Kjell Torén, Matthew N Cooper, Alan L James, Lyle J Palmer, Aroon D Hingorani, S Goya Wannamethee, Peter H Whincup, George Davey Smith, Shah Ebrahim, Tricia M McKeever, Ian D Pavord, Andrew K MacLeod, Andrew D Morris, David J Porteous, Cyrus Cooper, Elaine Dennison, Seif Shaheen, Stefan Karrasch, Eva Schnabel, Holger Schulz, Harald Grallert, Nabila Bouatia-Naji, Jérôme Delplanque, Philippe Froguel, John D Blakey, NSHD Respiratory Study Team, John R Britton, Richard W Morris, John W Holloway, Debbie A Lawlor, Jennie Hui, Fredrik Nyberg, Marjo-Riitta Jarvelin, Cathy Jackson, Mika Kähönen, Jaakko Kaprio, Nicole M Probst-Hensch, Beate Koch, Caroline Hayward, David M Evans, Paul Elliott, David P Strachan, Ian P Hall, and Martin D Tobin. Genome-wide association study identifies five loci associated with lung function. Nat. Genet., 42(1):36–44, January 2010. URL: http://dx.doi.org/10.1038/ng.501.

200

Naomi R Wray, Jian Yang, Ben J Hayes, Alkes L Price, Michael E Goddard, and Peter M Visscher. Pitfalls of predicting complex traits from SNPs. Nat. Rev. Genet., 14(7):507–515, July 2013. URL: http://dx.doi.org/10.1038/nrg3457.

201

Kasper Lage, Niclas Tue Hansen, E Olof Karlberg, Aron C Eklund, Francisco S Roque, Patricia K Donahoe, Zoltan Szallasi, Thomas Skøt Jensen, and Søren Brunak. A large-scale analysis of tissue-specific pathology and gene expression of human disease genes and complexes. Proc. Natl. Acad. Sci. U. S. A., 105(52):20870–20875, December 2008. URL: http://dx.doi.org/10.1073/pnas.0810772105.

202

Eitan E Winter, Leo Goodstadt, and Chris P Ponting. Elevated rates of protein secretion, evolution, and disease among tissue-specific genes. Genome Res., 14(1):54–61, January 2004. URL: http://dx.doi.org/10.1101/gr.1924004.

203

Owen J L Rackham, Hashem A Shihab, Michael R Johnson, and Enrico Petretto. EvoTol: a protein-sequence based evolutionary intolerance framework for disease-gene prioritization. Nucleic Acids Res., 43(5):e33, March 2015. URL: http://dx.doi.org/10.1093/nar/gku1322.

204

Agne Antanaviciute, Catherine Daly, Laura A Crinnion, Alexander F Markham, Christopher M Watson, David T Bonthron, and Ian M Carr. GeneTIER: prioritization of candidate disease genes using tissue-specific gene expression profiles. Bioinformatics, 31(16):2728–2735, August 2015. URL: http://dx.doi.org/10.1093/bioinformatics/btv196.

205

Arjun Raj and Alexander van Oudenaarden. Nature, nurture, or chance: stochastic gene expression and its consequences. Cell, 135(2):216–226, October 2008. URL: http://dx.doi.org/10.1016/j.cell.2008.09.050.

206

Jong Kyoung Kim, Aleksandra A Kolodziejczyk, Tomislav Ilicic, Sarah A Teichmann, and John C Marioni. Characterizing noise structure in single-cell RNA-seq distinguishes genuine from technical stochastic allelic expression. Nat. Commun., 6:8687, October 2015. URL: http://dx.doi.org/10.1038/ncomms9687.

207

Nils Eling, Michael D Morgan, and John C Marioni. Challenges in measuring and understanding biological noise. Nat. Rev. Genet., 20(9):536–548, September 2019. URL: http://dx.doi.org/10.1038/s41576-019-0130-6.

208

Simon Anders, Davis J McCarthy, Yunshun Chen, Michal Okoniewski, Gordon K Smyth, Wolfgang Huber, and Mark D Robinson. Count-based differential expression analysis of RNA sequencing data using R and bioconductor. Nat. Protoc., 8(9):1765–1786, September 2013. URL: http://dx.doi.org/10.1038/nprot.2013.099.

209

Michael I Love, Simon Anders, and Wolfgang Huber. DESeq2 vignette: analyzing RNA-seq data with DESeq2. February 2021. Accessed: 2021-3-21. URL: http://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html.

210

Ana Conesa, Pedro Madrigal, Sonia Tarazona, David Gomez-Cabrero, Alejandra Cervera, Andrew McPherson, Michał Wojciech Szcześniak, Daniel J Gaffney, Laura L Elo, Xuegong Zhang, and Ali Mortazavi. A survey of best practices for RNA-seq data analysis. Genome Biol., 17:13, January 2016. URL: http://dx.doi.org/10.1186/s13059-016-0881-8.

211

Anton Pottegård, Maija Bruun Haastrup, Tore Bjerregaard Stage, Morten Rix Hansen, Kasper Søltoft Larsen, Peter Martin Meegaard, Line Haugaard Vrdlovec Meegaard, Henrik Horneberg, Charlotte Gils, Dorthe Dideriksen, Lise Aagaard, Anna Birna Almarsdottir, Jesper Hallas, and Per Damkier. SearCh for humouristic and extravagant acronyms and thoroughly inappropriate names for important clinical trials (SCIENTIFIC): qualitative and quantitative systematic study. BMJ, 349:g7092, December 2014. URL: http://dx.doi.org/10.1136/bmj.g7092.

212

Imad Abugessaisa, Shuhei Noguchi, Akira Hasegawa, Jayson Harshbarger, Atsushi Kondo, Marina Lizio, Jessica Severin, Piero Carninci, Hideya Kawaji, and Takeya Kasukawa. FANTOM5 CAGE profiles of human and mouse reprocessed for GRCh38 and GRCm38 genome assemblies. Sci Data, 4:170107, August 2017. URL: http://dx.doi.org/10.1038/sdata.2017.107.

213

Alexandra Witze. Wealthy funder pays reparations for use of HeLa cells. Nature, 587(7832):20–21, November 2020. URL: http://dx.doi.org/10.1038/d41586-020-03042-5.

214

Danielle M Pastor, Lisa S Poritz, Thomas L Olson, Christina L Kline, Leonard R Harris, Walter A Koltun, Vernon M Chinchilli, and Rosalyn B Irby. Primary cell lines: false representation or model system? a comparison of four human colorectal tumors and their coordinately established cell lines. Int. J. Clin. Exp. Med., 3(1):69–83, February 2010. URL: https://www.ncbi.nlm.nih.gov/pubmed/20369042.

215

Gurvinder Kaur and Jannette M Dufour. Cell lines: valuable tools or useless artifacts. Spermatogenesis, 2(1):1–5, January 2012. URL: http://dx.doi.org/10.4161/spmg.19885.

216

Graham Bell. Replicates and repeats. BMC Biol., 14:28, April 2016. URL: http://dx.doi.org/10.1186/s12915-016-0254-5.

217

Peter J A Cock, Tiago Antao, Jeffrey T Chang, Brad A Chapman, Cymon J Cox, Andrew Dalke, Iddo Friedberg, Thomas Hamelryck, Frank Kauff, Bartek Wilczynski, and Michiel J L de Hoon. Biopython: freely available python tools for computational molecular biology and bioinformatics. Bioinformatics, 25(11):1422–1423, June 2009. URL: http://dx.doi.org/10.1093/bioinformatics/btp163.

218

Sangya Pundir, Maria J Martin, Claire O'Donovan, and UniProt Consortium. UniProt tools. Curr. Protoc. Bioinformatics, 53:1.29.1–1.29.15, March 2016. URL: http://dx.doi.org/10.1002/0471250953.bi0129s53.

219

Hai Fang. dcGOR: an R package for analysing ontologies and protein domain annotations. PLoS Comput. Biol., 10(10):e1003929, October 2014. URL: http://dx.doi.org/10.1371/journal.pcbi.1003929.

220

Fran Supek, Matko Bošnjak, Nives Škunca, and Tomislav Šmuc. REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS One, 6(7):e21800, July 2011. URL: http://dx.doi.org/10.1371/journal.pone.0021800.

221

Christopher Buccitelli and Matthias Selbach. mRNAs, proteins and the emerging principles of gene expression control. Nat. Rev. Genet., 21(10):630–644, October 2020. URL: http://dx.doi.org/10.1038/s41576-020-0258-4.

222

Goro Terai and Kiyoshi Asai. Improving the prediction accuracy of protein abundance in escherichia coli using mRNA accessibility. Nucleic Acids Res., 48(14):e81, August 2020. URL: http://dx.doi.org/10.1093/nar/gkaa481.

223

D V Klopfenstein, Liangsheng Zhang, Brent S Pedersen, Fidel Ramírez, Alex Warwick Vesztrocy, Aurélien Naldi, Christopher J Mungall, Jeffrey M Yunes, Olga Botvinnik, Mark Weigel, Will Dampier, Christophe Dessimoz, Patrick Flick, and Haibao Tang. GOATOOLS: a python library for gene ontology analyses. Sci. Rep., 8(1):10872, July 2018. URL: http://dx.doi.org/10.1038/s41598-018-28948-z.

224

Edison Ong, Zuoshuang Xiang, Bin Zhao, Yue Liu, Yu Lin, Jie Zheng, Chris Mungall, Mélanie Courtot, Alan Ruttenberg, and Yongqun He. Ontobee: a linked ontology data server to support ontology term dereferencing, linkage, query and integration. Nucleic Acids Res., 45(D1):D347–D352, January 2017. URL: http://dx.doi.org/10.1093/nar/gkw918.

225

Martin Larralde, Alex Henrie, Philipp A., Spencer Mitchell, and Tatsuya Sakaguchi. Althonos/pronto: 2.4.1. February 2021. URL: https://zenodo.org/record/4552164.

226

Andrea C F Albuquerque, Jose L Campos dos Santos, and Alberto N de Castro. OntoBio: a biodiversity domain ontology for amazonian biological collected objects. In 2015 48th Hawaii International Conference on System Sciences, 3770–3779. January 2015. URL: http://dx.doi.org/10.1109/HICSS.2015.453.

227

Steven Bird. NLTK: the natural language toolkit. In Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions, 69–72. aclweb.org, 2006. URL: https://www.aclweb.org/anthology/P06-4018.pdf.

228

Charles R. Harris, K. Jarrod Millman, Stéfan J. van der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, Julian Taylor, Sebastian Berg, Nathaniel J. Smith, Robert Kern, Matti Picus, Stephan Hoyer, Marten H. van Kerkwijk, Matthew Brett, Allan Haldane, Jaime Fernández del Río, Mark Wiebe, Pearu Peterson, Pierre Gérard-Marchant, Kevin Sheppard, Tyler Reddy, Warren Weckesser, Hameer Abbasi, Christoph Gohlke, and Travis E. Oliphant. Array programming with NumPy. Nature, 585(7825):357–362, September 2020. URL: https://doi.org/10.1038/s41586-020-2649-2.

229

The pandas development team. Pandas-dev/pandas: pandas. February 2020. URL: https://doi.org/10.5281/zenodo.3509134.

230

Christopher Woods. Developer's guide — MetaWards documentation. Accessed: 2021-5-1. URL: https://metawards.org/development.html?highlight=style.

231

Donald Stufft. Welcome to twine's documentation! — twine 3.4.2.dev1+geff3a45 documentation. 2019. Accessed: 2021-5-1. URL: https://twine.readthedocs.io/en/latest/.

232

Holger Krekel, Bruno Oliveira, Ronny Pfannschmidt, Floris Bruynooghe, Brianna Laugher, and Florian Bruhin. Pytest x.y. 2004. URL: https://github.com/pytest-dev/pytest.

233

PyData Community. The PyData sphinx theme — PyData sphinx theme documentation. 2019. Accessed: 2021-5-1. URL: https://pydata-sphinx-theme.readthedocs.io/.

234

Marina Lizio, Jayson Harshbarger, Hisashi Shimoji, Jessica Severin, Takeya Kasukawa, Serkan Sahin, Imad Abugessaisa, Shiro Fukuda, Fumi Hori, Sachi Ishikawa-Kato, Christopher J Mungall, Erik Arner, J Kenneth Baillie, Nicolas Bertin, Hidemasa Bono, Michiel de Hoon, Alexander D Diehl, Emmanuel Dimont, Tom C Freeman, Kaori Fujieda, Winston Hide, Rajaram Kaliyaperumal, Toshiaki Katayama, Timo Lassmann, Terrence F Meehan, Koro Nishikata, Hiromasa Ono, Michael Rehli, Albin Sandelin, Erik A Schultes, Peter A C 't Hoen, Zuotian Tatum, Mark Thompson, Tetsuro Toyoda, Derek W Wright, Carsten O Daub, Masayoshi Itoh, Piero Carninci, Yoshihide Hayashizaki, Alistair R R Forrest, Hideya Kawaji, and FANTOM consortium. Gateways to the FANTOM5 promoter level mammalian expression atlas. Genome Biol., 16:22, January 2015. URL: http://dx.doi.org/10.1186/s13059-014-0560-6.

235

Jean-Baptiste Lamy. Owlready: ontology-oriented programming in python with automatic classification and high level constructs for biomedical ontologies. Artif. Intell. Med., 80:11–28, July 2017. URL: http://dx.doi.org/10.1016/j.artmed.2017.07.002.

236

David Gomez-Cabrero, Imad Abugessaisa, Dieter Maier, Andrew Teschendorff, Matthias Merkenschlager, Andreas Gisel, Esteban Ballestar, Erik Bongcam-Rudloff, Ana Conesa, and Jesper Tegnér. Data integration in the era of omics: current and future challenges. BMC Syst. Biol., 8 Suppl 2:I1, March 2014. URL: http://dx.doi.org/10.1186/1752-0509-8-S2-I1.

237

Gordon Bell, Tony Hey, and Alex Szalay. Computer science. beyond the data deluge. Science, 323(5919):1297–1298, March 2009. URL: http://dx.doi.org/10.1126/science.1170411.

238

S Cortijo, Z Aydin, S Ahnert, and others. Widespread inter-individual gene expression variability in arabidopsis thaliana. Mol. Syst. Biol., 2019. URL: http://msb.embopress.org/content/15/1/e8591.abstract.

239

Ana Viñuela, Andrew A Brown, Alfonso Buil, Pei-Chien Tsai, Matthew N Davies, Jordana T Bell, Emmanouil T Dermitzakis, Timothy D Spector, and Kerrin S Small. Age-dependent changes in mean and variance of gene expression across tissues in a twin cohort. Human molecular genetics, 27(4):732–741, 2018.

240

Jialiang Yang, Tao Huang, Francesca Petralia, Quan Long, Bin Zhang, Carmen Argmann, Yong Zhao, Charles V Mobbs, Eric E Schadt, Jun Zhu, Zhidong Tu, and GTEx Consortium. Synchronized age-related gene expression changes across multiple tissues in human and the link to complex diseases. Sci. Rep., 5:15145, October 2015. URL: http://dx.doi.org/10.1038/srep15145.

241

B Reinius and E Jazin. Prenatal sex differences in the human brain. Mol. Psychiatry, 14(11):987, 988–9, November 2009. URL: http://dx.doi.org/10.1038/mp.2009.79.

242

Gregory Stone, Ashley Choi, Meritxell Oliva, Joshua Gorham, Mahyar Heydarpour, Christine E Seidman, Jon G Seidman, Sary F Aranki, Simon C Body, Vincent J Carey, Benjamin A Raby, Barbara E Stranger, and Jochen D Muehlschlegel. Sex differences in gene expression in response to ischemia in the human left ventricular myocardium. Hum. Mol. Genet., January 2019. URL: http://dx.doi.org/10.1093/hmg/ddz014.

243

Ueli Schibler. The daily timing of gene expression and physiology in mammals. Dialogues Clin. Neurosci., 9(3):257–272, 2007. URL: https://www.ncbi.nlm.nih.gov/pubmed/17969863.

244

Valentine Svensson, Sarah A Teichmann, and Oliver Stegle. SpatialDE: identification of spatially variable genes. Nat. Methods, 15(5):343–346, May 2018. URL: http://dx.doi.org/10.1038/nmeth.4636.

245

Qingguo Wang, Joshua Armenia, Chao Zhang, Alexander V Penson, Ed Reznik, Liguo Zhang, Thais Minet, Angelica Ochoa, Benjamin E Gross, Christine A Iacobuzio-Donahue, Doron Betel, Barry S Taylor, Jianjiong Gao, and Nikolaus Schultz. Unifying cancer and normal RNA sequencing data from different sources. Sci Data, 5:180061, April 2018. URL: http://dx.doi.org/10.1038/sdata.2018.61.

246

Jeffrey T Leek, Robert B Scharpf, Héctor Corrada Bravo, David Simcha, Benjamin Langmead, W Evan Johnson, Donald Geman, Keith Baggerly, and Rafael A Irizarry. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat. Rev. Genet., 11(10):733–739, October 2010. URL: http://dx.doi.org/10.1038/nrg2825.

247

Rafael A Irizarry, Daniel Warren, Forrest Spencer, Irene F Kim, Shyam Biswal, Bryan C Frank, Edward Gabrielson, Joe G N Garcia, Joel Geoghegan, Gregory Germino, Constance Griffin, Sara C Hilmer, Eric Hoffman, Anne E Jedlicka, Ernest Kawasaki, Francisco Martínez-Murillo, Laura Morsberger, Hannah Lee, David Petersen, John Quackenbush, Alan Scott, Michael Wilson, Yanqin Yang, Shui Qing Ye, and Wayne Yu. Multiple-laboratory comparison of microarray platforms. Nat. Methods, 2(5):345–350, May 2005. URL: http://dx.doi.org/10.1038/nmeth756.

248

W Evan Johnson, Cheng Li, and Ariel Rabinovic. Adjusting batch effects in microarray expression data using empirical bayes methods. Biostatistics, 8(1):118–127, January 2007. URL: http://dx.doi.org/10.1093/biostatistics/kxj037.

249

Jeffrey T Leek and John D Storey. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet., 3(9):1724–1735, September 2007. URL: http://dx.doi.org/10.1371/journal.pgen.0030161.

250

Chao Chen, Kay Grennan, Judith Badner, Dandan Zhang, Elliot Gershon, Li Jin, and Chunyu Liu. Removing batch effects in analysis of expression microarray data: an evaluation of six batch adjustment methods. PLoS One, 6(2):e17238, February 2011. URL: http://dx.doi.org/10.1371/journal.pone.0017238.

251

Q Liu and M Markatou. Evaluation of methods in removing batch effects on RNA-seq data. Infectious Diseases and Translational Medicine, 2016. URL: http://www.tran-med.com/CN/article/downloadArticleFile.do?attachType=PDF&id=24.

252

Nuno A Fonseca, Robert Petryszak, John Marioni, and Alvis Brazma. iRAP - an integrated RNA-seq analysis pipeline. Preprint, June 2014. URL: http://dx.doi.org/10.1101/005991.

253

Tobias Maier, Marc Güell, and Luis Serrano. Correlation of mRNA and protein in complex biological samples. FEBS Lett., 583(24):3966–3973, December 2009. URL: http://dx.doi.org/10.1016/j.febslet.2009.10.036.

254

Andreas Beyer, Jens Hollunder, Heinz-Peter Nasheuer, and Thomas Wilhelm. Post-transcriptional expression regulation in the yeast saccharomyces cerevisiae on a genomic scale. Mol. Cell. Proteomics, 3(11):1083–1092, November 2004. URL: http://dx.doi.org/10.1074/mcp.M400099-MCP200.

255

Gang Wu, Lei Nie, and Weiwen Zhang. Integrative analyses of posttranscriptional regulation in the yeast saccharomyces cerevisiae using transcriptomic and proteomic data. Curr. Microbiol., 57(1):18–22, July 2008. URL: http://dx.doi.org/10.1007/s00284-008-9145-5.

256

Nancy Yiu-Lin Yu, Björn M Hallström, Linn Fagerberg, Fredrik Ponten, Hideya Kawaji, Piero Carninci, Alistair R R Forrest, Fantom Consortium, Yoshihide Hayashizaki, Mathias Uhlén, and Carsten O Daub. Complementing tissue characterization by integrating transcriptome profiling from the human protein atlas and from the FANTOM5 consortium. Nucleic Acids Res., 43(14):6787–6798, August 2015. URL: http://dx.doi.org/10.1093/nar/gkv608.

257

The human protein atlas. Accessed: 2021-1-29. URL: https://www.proteinatlas.org/.

258

Hideya Kawaji, Marina Lizio, Masayoshi Itoh, Mutsumi Kanamori-Katayama, Ai Kaiho, Hiromi Nishiyori-Sueki, Jay W Shin, Miki Kojima-Ishiyama, Mitsuoki Kawano, Mitsuyoshi Murata, Noriko Ninomiya-Fukuda, Sachi Ishikawa-Kato, Sayaka Nagao-Sato, Shohei Noma, Yoshihide Hayashizaki, Alistair R R Forrest, Piero Carninci, and FANTOM Consortium. Comparison of CAGE and RNA-seq transcriptome profiling using clonally amplified and single-molecule next-generation sequencing. Genome Res., 24(4):708–717, April 2014. URL: http://dx.doi.org/10.1101/gr.156232.113.

259

Mathias Uhlen, Per Oksvold, Linn Fagerberg, Emma Lundberg, Kalle Jonasson, Mattias Forsberg, Martin Zwahlen, Caroline Kampf, Kenneth Wester, Sophia Hober, Henrik Wernerus, Lisa Björling, and Fredrik Ponten. Towards a knowledge-based human protein atlas. Nature Biotechnology, 28(12):1248–1250, 2010. URL: http://dx.doi.org/10.1038/nbt1210-1248.

260

Mathias Uhlén, Linn Fagerberg, Björn M Hallström, Cecilia Lindskog, Per Oksvold, Adil Mardinoglu, Åsa Sivertsson, Caroline Kampf, Evelina Sjöstedt, Anna Asplund, Ingmarie Olsson, Karolina Edlund, Emma Lundberg, Sanjay Navani, Cristina Al-Khalili Szigyarto, Jacob Odeberg, Dijana Djureinovic, Jenny Ottosson Takanen, Sophia Hober, Tove Alm, Per-Henrik Edqvist, Holger Berling, Hanna Tegel, Jan Mulder, Johan Rockberg, Peter Nilsson, Jochen M Schwenk, Marica Hamsten, Kalle von Feilitzen, Mattias Forsberg, Lukas Persson, Fredric Johansson, Martin Zwahlen, Gunnar von Heijne, Jens Nielsen, and Fredrik Pontén. Proteomics. tissue-based map of the human proteome. Science, 347(6220):1260419, January 2015. URL: http://dx.doi.org/10.1126/science.1260419.

261

GTEx Consortium. The Genotype-Tissue expression (GTEx) project. Nat. Genet., 45(6):580–585, June 2013. URL: http://dx.doi.org/10.1038/ng.2653.

262

Susan J Lindsay, Yaobo Xu, Steven N Lisgo, Lauren F Harkin, Andrew J Copp, Dianne Gerrelli, Gavin J Clowry, Aysha Talbot, Michael J Keogh, Jonathan Coxhead, Mauro Santibanez-Koref, and Patrick F Chinnery. HDBR expression: a unique resource for global and individual gene expression studies during early human brain development. Front. Neuroanat., 10:86, October 2016. URL: http://dx.doi.org/10.3389/fnana.2016.00086.

263

M Keays. ExpressionAtlas: download datasets from EMBL-EBI expression atlas. R package version 1.10.0. 2018. URL: https://bioconductor.org/packages/release/bioc/html/ExpressionAtlas.html.

264

Damian Smedley, Syed Haider, Benoit Ballester, Richard Holland, Darin London, Gudmundur Thorisson, and Arek Kasprzyk. BioMart–biological queries made easy. BMC Genomics, 10:22, January 2009. URL: http://dx.doi.org/10.1186/1471-2164-10-22.

265

Clarissa M Koch, Stephen F Chiu, Mahzad Akbarpour, Ankit Bharat, Karen M Ridge, Elizabeth T Bartom, and Deborah R Winter. A beginner's guide to analysis of RNA sequencing data. Am. J. Respir. Cell Mol. Biol., 59(2):145–157, August 2018. URL: http://dx.doi.org/10.1165/rcmb.2017-0430TR.

266

Aaron M Newman, Chih Long Liu, Michael R Green, Andrew J Gentles, Weiguo Feng, Yue Xu, Chuong D Hoang, Maximilian Diehn, and Ash A Alizadeh. Robust enumeration of cell subsets from tissue expression profiles. Nat. Methods, 12(5):453–457, May 2015. URL: http://dx.doi.org/10.1038/nmeth.3337.

267

Maayan Baron, Adrian Veres, Samuel L Wolock, Aubrey L Faust, Renaud Gaujoux, Amedeo Vetere, Jennifer Hyoje Ryu, Bridget K Wagner, Shai S Shen-Orr, Allon M Klein, Douglas A Melton, and Itai Yanai. A Single-Cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure. Cell Syst, 3(4):346–360.e4, October 2016. URL: http://dx.doi.org/10.1016/j.cels.2016.08.011.

268

Xuran Wang, Jihwan Park, Katalin Susztak, Nancy R Zhang, and Mingyao Li. Bulk tissue cell type deconvolution with multi-subject single-cell expression reference. Nat. Commun., 10(1):380, January 2019. URL: http://dx.doi.org/10.1038/s41467-018-08023-x.

269

Yuqing Zhang, Giovanni Parmigiani, and W Evan Johnson. ComBat-seq: batch effect adjustment for RNA-seq count data. NAR Genom Bioinform, 2(3):lqaa078, September 2020. URL: http://dx.doi.org/10.1093/nargab/lqaa078.

270

Laleh Haghverdi, Aaron T L Lun, Michael D Morgan, and John C Marioni. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol., 36(5):421–427, June 2018. URL: http://dx.doi.org/10.1038/nbt.4091.

271

Alyssa C Frazee, Andrew E Jaffe, Ben Langmead, and Jeffrey T Leek. Polyester: simulating RNA-seq datasets with differential transcript expression. Bioinformatics, 31(17):2778–2784, September 2015. URL: http://dx.doi.org/10.1093/bioinformatics/btv272.

272

Jeff Alstott, Ed Bullmore, and Dietmar Plenz. Powerlaw: a python package for analysis of heavy-tailed distributions. PLoS One, 9(1):e85777, January 2014. URL: http://dx.doi.org/10.1371/journal.pone.0085777.