7.4. Example uses: mapping samples to diseases or phenotypes

Ontolopy has a number of potential uses. In this section, I demonstrate two of them with simple examples, showing how Ontolopy can be used to:

  1. Find disease-related samples

  2. Find samples of pluripotent stem cells (cells that can turn into different tissue types)

Then, in the next section, I give a more detailed and complex example: creating a mapping between samples and tissues (the task Ontolopy was specifically created for), and how this mapping was used to find inconsistencies in the FANTOM5 data.

7.4.1. Inputs

The examples of using Ontolopy in this chapter use input files from FANTOM5[234] (for samples) and Uberon[104] (the cross-species anatomy ontology).

import ontolopy as opy
import pandas as pd 
from myst_nb import glue
import time

# Read in files:
# -------------
fantom_obo_file = '../c08-combining/data/experiments/fantom/ff-phase2-170801.obo.txt'
fantom_samples_info_file = '../c08-combining/data/experiments/fantom/fantom_humanSamples2.0.csv'
uberon_obo_file = '../c08-combining/data/uberon_ext_210321.obo' 

# Uberon OBO:
uberon_obo = opy.load_obo(
    file_loc=uberon_obo_file, 
    ont_ids=['GO', 'UBERON','CL'], 
)

# FANTOM OBO:
fantom_obo = opy.load_obo(
    file_loc=fantom_obo_file, 
    ont_ids=['CL', 'FF', 'GO', 'UBERON', 'DOID'],
)

# FANTOM Samples Info file:
fantom_samples_info = pd.read_csv(fantom_samples_info_file, index_col=1)


# Glue Samples Info excerpt:
# --------------------------
indices = [1,9,11]  # choose rows for variety
display(fantom_samples_info.iloc[indices])
glue("fantom-samples-info-excerpt", fantom_samples_info.iloc[indices], display=False)

7.4.1.1. FANTOM5

Large experiments sometimes include an ontology of samples instead of, or (more frequently) in addition to, a samples information file. The data from the FANTOM5 experiment[234] is one such example. I have already described the FANTOM5 data in more detail, but for now the only things we need to keep in mind are that:

  1. The FANTOM5 experiment measures transcript expression in a wide variety of samples, across many tissue and cell types.

  2. FANTOM5 provides an ontology of samples as well as a sample information file (containing short text descriptions of samples).

Charateristics [ff_ontology] | Source Name | Charateristics [description] | Characteristics [catalog_id] | Characteristics [Category] | Chracteristics [Species] | Characteristics [Sex] | Characteristics [Age] | Characteristics [Developmental stage] | Characteristics[Tissue] | Characteristics [Cell lot] | Characteristics [Cell type] | Characteristics [Catalogue ID] | Characteristics [Collaboration] | Characteristics [Provider] | Protocol REF | Extract Name | Material Type
FF:10002-101A5 | 10002-101A5 | SABiosciences XpressRef Human Universal Total ... | B208251 | tissues | Human (Homo sapiens) | mixed | NaN | UNDEFINED | unclassifiable | NaN | CELL MIXTURE - tissue sample | NaN | FANTOM5 OSC CORE (contact: Al Forrest) | SABiosciences | OP-RNA-extraction-totalRNA-TRIzol-isopropanol-... | 10002-101A5 | Total RNA
FF:10016-101C7 | 10016-101C7 | heart, adult, pool1 | 0910061 -7 | tissues | Human (Homo sapiens) | mixed | NaN | 70,73,74 years old adult | heart | NaN | CELL MIXTURE - tissue sample | NaN | FANTOM5 OSC CORE (contact: Al Forrest) | Ambion | OP-RNA-extraction-totalRNA-ToTALLY-RNA-v1.0 | 10016-101C7 | Total RNA
FF:10018-101C9 | 10018-101C9 | liver, adult, pool1 | 0910061 -9 | tissues | Human (Homo sapiens) | mixed | NaN | 64,69,70 years old adult | liver | NaN | CELL MIXTURE - tissue sample | NaN | FANTOM5 OSC CORE (contact: Al Forrest) | Ambion | OP-RNA-extraction-totalRNA-ToTALLY-RNA-v1.0 | 10018-101C9 | Total RNA

Fig. 7.2 An excerpt of the FANTOM sample info file, showing sources of text-based information (e.g. “heart, adult, pool1” in the Charateristics [description] field) and the mapping to an ontology term in the index (e.g. FF:10016-101C7).

Fig. 7.2 shows an excerpt of the FANTOM Samples Information file. This kind of file is typical of transcription experiments: a CSV file containing hand-entered, text-based information that uses non-specific lay terms for samples, e.g. “heart”.

The FANTOM ontology file links specific FANTOM samples to more general types of FANTOM samples and to Uberon tissues and CL cell types.

For example, an excerpt of the FANTOM ontology OBO file is:

[Term]
id: FF:0000076
name: hepatic sinusoidal endothelial cell sample
namespace: FANTOM5
synonym: "hepatic sinusoidal endothelial cell" EXACT []
is_a: FF:0000002 ! in vivo cell sample
intersection_of: FF:0000002 ! in vivo cell sample
intersection_of: derives_from CL:1000398 ! endothelial cell of hepatic sinusoid
intersection_of: derives_from UBERON:0001281 ! hepatic sinusoid
relationship: derives_from CL:1000398 ! endothelial cell of hepatic sinusoid
relationship: derives_from UBERON:0001281 ! hepatic sinusoid
created_by: tmeehan
creation_date: 2011-03-01T04:51:50Z
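
To make the structure of such a stanza concrete, the following is a minimal sketch of how the links in it could be pulled out with plain Python. It is not how Ontolopy parses OBO files: the function name parse_term is hypothetical, the stanza is a shortened copy of the excerpt above, and only the id, is_a and relationship tags are handled.

```python
from collections import defaultdict

# Shortened copy of the FF:0000076 stanza shown above.
STANZA = """\
[Term]
id: FF:0000076
name: hepatic sinusoidal endothelial cell sample
is_a: FF:0000002 ! in vivo cell sample
relationship: derives_from CL:1000398 ! endothelial cell of hepatic sinusoid
relationship: derives_from UBERON:0001281 ! hepatic sinusoid
"""


def parse_term(stanza):
    """Return (term_id, {relation: [target ids]}) for one [Term] stanza.

    Hypothetical helper for illustration only; it ignores every tag
    except id, is_a and relationship.
    """
    term_id = None
    relations = defaultdict(list)
    for line in stanza.splitlines():
        if ": " not in line:
            continue  # skips the [Term] header and blank lines
        tag, value = line.split(": ", 1)
        value = value.split(" ! ")[0].strip()  # drop the trailing "! name" comment
        if tag == "id":
            term_id = value
        elif tag == "is_a":
            relations["is_a"].append(value)
        elif tag == "relationship":
            # "derives_from CL:1000398" -> relation name, target id
            relation, target = value.split()[:2]
            relations[relation].append(target)
    return term_id, dict(relations)


term_id, relations = parse_term(STANZA)
print(term_id, relations)
```

Each (relation, target) pair extracted this way is one edge in the graph that links a FANTOM sample type to CL and Uberon terms.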

7.4.1.2. Uberon

As I mentioned in Section 3.3.3.2, Uberon is a cross-species anatomy ontology with excellent linkage to other ontologies. As we can see above, the FANTOM5 ontology links FANTOM samples to Uberon. This means that the Uberon[104] extended ontology OBO file can then be used to further link the samples to human disease or Gene Ontology (GO) terms.

For example, here is an excerpt of the Uberon extended OBO file (non-consecutive lines for brevity), showing how the Uberon extended ontology could be used to link a FANTOM sample to a GO term:

[Term]
id: UBERON:0001281
name: hepatic sinusoid
alt_id: UBERON:0003275
def: "Wide thin-walled blood vessels in the liver. In mammals they have neither veinous or arterial markers." [http://en.wikipedia.org/wiki/Hepatic_sinusoid, ZFIN:curator]
synonym: "hepatic sinusoids" RELATED []
synonym: "liver hepatic sinusoids" EXACT [EHDAA2:0000999]
synonym: "liver sinusoid" EXACT []
intersection_of: part_of UBERON:0002107 ! liver
relationship: part_of UBERON:0004647 ! liver lobule
relationship: part_of UBERON:0006877 {source="https://github.com/obophenotype/uberon/wiki/Inferring-part-of-relationships"} ! vasculature of liver
property_value: homology_notes "(...) the amphibian liver has characteristics in common with both fish and terrestrial vertebrates. (...) The histological structure of the liver is similar to that in other vertebrates, with hepatocytes arranged in clusters and cords separated by a meshwork of sinusoids and the presence of the traditional triad of portal venule, hepatic arteriole, and bile duct.[well established][VHOG]" xsd:string {date_retrieved="2012-09-17", external_class="VHOG:0000708", ontology="VHOG", source="http://bgee.unil.ch/", source="DOI:10.1053/ax.2000.7133 Crawshaw GJ, Weinkle TK, Clinical and pathological aspects of the amphibian liver. Seminars in Avian and Exotic Pet Medicine (2000)"}

[Term]
id: UBERON:0002107
name: liver
def: "An exocrine gland which secretes bile and functions in metabolism of protein and carbohydrate and fat, synthesizes substances involved in the clotting of the blood, synthesizes vitamin A, detoxifies poisonous substances, stores glycogen, and breaks down worn-out erythrocytes[GO]." [BTO:0000759, http://en.wikipedia.org/wiki/Liver]
synonym: "iecur" RELATED LATIN [http://en.wikipedia.org/wiki/Liver]
is_a: UBERON:0002365 {source="BTO", source="EHDAA2", source="GO-def"} ! exocrine gland
is_a: UBERON:0004119 ! endoderm-derived structure
is_a: UBERON:0005172 ! abdomen element
is_a: UBERON:0006925 ! digestive system gland
disjoint_from: UBERON:0010264 ! hepatopancreas
relationship: contributes_to_morphology_of UBERON:0002423 ! hepatobiliary system
relationship: produces UBERON:0001970 ! bile
relationship: site_of GO:0002384 ! hepatic immune response
relationship: site_of GO:0005978 ! glycogen biosynthetic process
relationship: site_of GO:0005980 ! glycogen catabolic process
property_value: external_definition "Organ which secretes bile and participates in formation of certain blood proteins.[AAO]" xsd:string {date_retrieved="2012-06-20", external_class="AAO:0010111", ontology="AAO", source="AAO:BJB"}
property_value: function_notes "secretes bile and functions in metabolism of protein and carbohydrate and fat, synthesizes substances involved in the clotting of the blood, synthesizes vitamin A, detoxifies poisonous substances, stores glycogen, and breaks down worn-out erythrocytes[GO]." xsd:string

These excerpts show that FF:0000076 (hepatic sinusoidal endothelial cell sample) derives_from the hepatic sinusoid, which is part_of the liver, which in turn is the site_of the hepatic immune response, the glycogen biosynthetic process and the glycogen catabolic process. There are many such relationships in these files: Ontolopy provides an easy way of extracting them.

Warning: not all relationships are easy to interpret

In this case, we do not have enough information to infer that hepatic sinusoidal endothelial cell samples are a site_of (for example) the hepatic immune response, because it could be another, disjoint part of the liver that is the site of this process. Nor can we rule it out: a more specific annotation in the future might enable us to determine this from these files.

However, this information could still be useful in Computational Biology. If we don’t know exactly where a process takes place, we may want to cast a wider net and look at all samples which are part of a larger tissue we know exhibits the process we are interested in.

This is something to be aware of in general when using Ontolopy: if you are only interested in straightforward relationships, then you need to think carefully about the types of relationships that you ask for. part_of relationships need particular care.
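
The effect of the choice of relations can be illustrated with a toy graph search. This is a sketch under stated assumptions, not Ontolopy's implementation: the function find_paths is hypothetical, the edges are hand-copied from the two OBO excerpts above, and the part_of edge from the hepatic sinusoid to the liver is read from the intersection_of line of the Uberon excerpt.

```python
from collections import deque

# Edges hand-copied from the OBO excerpts: source -> [(relation, target)].
# Assumption: the part_of liver edge comes from the intersection_of line.
EDGES = {
    "FF:0000076": [
        ("is_a", "FF:0000002"),
        ("derives_from", "CL:1000398"),
        ("derives_from", "UBERON:0001281"),
    ],
    "UBERON:0001281": [
        ("part_of", "UBERON:0002107"),
        ("part_of", "UBERON:0004647"),
        ("part_of", "UBERON:0006877"),
    ],
    "UBERON:0002107": [
        ("site_of", "GO:0002384"),
        ("site_of", "GO:0005978"),
        ("site_of", "GO:0005980"),
    ],
}


def find_paths(source, targets, allowed_relations, edges):
    """Breadth-first search: one shortest relation path per reachable target."""
    found = {}
    seen = {source}
    queue = deque([(source, [])])
    while queue:
        node, path = queue.popleft()
        if node in targets:
            found[node] = path
        for relation, nxt in edges.get(node, []):
            # Only follow edges whose relation type was asked for.
            if relation in allowed_relations and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(relation, nxt)]))
    return found


immune = {"GO:0002384"}  # hepatic immune response
with_part_of = find_paths(
    "FF:0000076", immune, {"derives_from", "part_of", "site_of"}, EDGES
)
without_part_of = find_paths(
    "FF:0000076", immune, {"derives_from", "site_of"}, EDGES
)
print(with_part_of)
print(without_part_of)
```

Allowing part_of finds the path through the liver described above; leaving it out finds nothing. Which of the two behaviours is wanted depends on whether the wider net or the stricter interpretation suits the question being asked.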

7.4.3. Example 2: Find tissues that are capable of cell differentiation

This second example is slightly more complex than the first:

  1. We want to look for relations to a specific term rather than a general one: in this case GO:0030154 cell differentiation.

  2. We need to use an external ontology (Uberon), so we use the merge function.

  3. We need to chain two queries together. The derives_from relation in the context of the FANTOM5 ontology can mean “extracted from” or “extracted from and then do lots of things to it”. To rule out the latter type of sample, we only ask for in vivo samples (is_a in vivo cell sample, FF:0000002) that derives_from cell types that are capable_of cell differentiation (GO:0030154).

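def get_differentiable_samples(samples, ont):
    """Find in vivo samples that derive from cells capable of cell differentiation."""
    # Query 1: keep only the samples that are (is_a) in vivo cell samples.
    in_vivo = 'FF:0000002'
    in_vivo_samples = opy.Relations(
        allowed_relations=['is_a'],
        sources=list(samples),
        targets=[in_vivo],
        ont=ont,
    ).dropna(subset=['to'])  # drop samples with no path to the target

    # Query 2: of those, find samples linked to GO:0030154 (cell differentiation).
    differentiable = 'GO:0030154'
    differentiable_samples = opy.Relations(
        allowed_relations=['is_a', 'derives_from', 'capable_of'],
        sources=list(in_vivo_samples.index),
        targets=[differentiable],
        ont=ont,
    )
    return differentiable_samples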

# merge ontology:
merged = uberon_obo.merge(fantom_obo)

# get differentiable cell samples:
start = time.time()
differentiable_samples = get_differentiable_samples(fantom_samples_info.index, ont=merged).dropna(subset=['to'])
time_taken = time.time()-start
print(f"Finds {differentiable_samples.shape[0]} relations to cell differentiation in {time_taken:.3f} seconds")
Finds 254 relations to cell differentiation in 0.108 seconds
from | relation_path | relation_text | to
FF:11214-116A8 | FF:11214-116A8.is_a~FF:0000094.derives_from~CL:0002569.is_a~CL:0000134.is_a~CL:0000048.is_a~CL:0000034.is_a~CL:0011115.capable_of~GO:0030154 | Mesenchymal stem cell - umbilical, donor0 is a human mesenchymal stem cell of umbilical cord- Sciencell sample derives from mesenchymal stem cell of umbilical cord is a mesenchymal stem cell is a multi fate stem cell is a stem cell is a precursor cell capable of cell differentiation | GO:0030154
FF:11224-116B9 | FF:11224-116B9.is_a~FF:0000024.derives_from~CL:0000576.is_a~CL:0011026.is_a~CL:0011115.capable_of~GO:0030154 | CD14-positive Monocytes, donor1 is a human CD14-positive monocyte sample derives from monocyte is a progenitor cell is a precursor cell capable of cell differentiation | GO:0030154
FF:11227-116C3 | FF:11227-116C3.is_a~FF:0000044.derives_from~CL:0000576.is_a~CL:0011026.is_a~CL:0011115.capable_of~GO:0030154 | Dendritic Cells - monocyte immature derived, donor1, rep1 is a human monocyte immature derived dendritic cell sample derives from monocyte is a progenitor cell is a precursor cell capable of cell differentiation | GO:0030154
FF:11229-116C5 | FF:11229-116C5.derives_from~CL:0000576.is_a~CL:0011026.is_a~CL:0011115.capable_of~GO:0030154 | CD14+ monocyte derived endothelial progenitor cells, donor1 derives from monocyte is a progenitor cell is a precursor cell capable of cell differentiation | GO:0030154
FF:11240-116D7 | FF:11240-116D7.is_a~FF:0000165.derives_from~CL:0000594.is_a~CL:0000680.is_a~CL:0000055.is_a~CL:0011115.capable_of~GO:0030154 | Skeletal Muscle Satellite Cells, donor1 is a human skeletal muscle satellite cell sample derives from skeletal muscle satellite cell is a muscle precursor cell is a non-terminally differentiated cell is a precursor cell capable of cell differentiation | GO:0030154

Fig. 7.4 An excerpt of Ontolopy’s output: FANTOM samples that are, or derive from, cells that are capable_of cell differentiation (GO:0030154).

Again, Ontolopy retrieves this information compactly (two Relations queries) and quickly (about 0.1 seconds in the run above). An excerpt of the output is shown in Fig. 7.4. This would be useful if, for example, we wanted to look at expression in tissues that are capable of cell differentiation.
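
The relation_path strings in Fig. 7.4 can also be processed further, for example to see which cell type each sample derives from. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, and the path format (steps separated by ".", with "~" between relation and target) is an assumption inferred from the excerpt rather than a documented Ontolopy format. The example paths are copied from Fig. 7.4 with the line-wrap spaces removed.

```python
from collections import Counter

# Three relation_path values copied from Fig. 7.4.
PATHS = [
    "FF:11214-116A8.is_a~FF:0000094.derives_from~CL:0002569.is_a~CL:0000134"
    ".is_a~CL:0000048.is_a~CL:0000034.is_a~CL:0011115.capable_of~GO:0030154",
    "FF:11224-116B9.is_a~FF:0000024.derives_from~CL:0000576.is_a~CL:0011026"
    ".is_a~CL:0011115.capable_of~GO:0030154",
    "FF:11229-116C5.derives_from~CL:0000576.is_a~CL:0011026"
    ".is_a~CL:0011115.capable_of~GO:0030154",
]


def parse_relation_path(path):
    """Split a path into (source, [(relation, target), ...]).

    Assumes no term ID contains "." or "~" (true for the IDs shown here).
    """
    source, *steps = path.split(".")
    return source, [tuple(step.split("~", 1)) for step in steps]


def derived_cell_type(path):
    """Return the first term reached via derives_from, or None if there is none."""
    _, steps = parse_relation_path(path)
    for relation, target in steps:
        if relation == "derives_from":
            return target
    return None


# Count how many of the example samples derive from each cell type.
cell_type_counts = Counter(derived_cell_type(path) for path in PATHS)
print(cell_type_counts)
```

In these three paths, two samples derive from CL:0000576 (monocyte) and one from CL:0002569 (mesenchymal stem cell of umbilical cord), which gives a quick view of which cell types account for the matches.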