7.2. Functionality

This section describes the low-level functionality of Ontolopy: what it can do. For examples of how this functionality is of practical use, please see the examples sections. You can find a full and up-to-date API Reference in the documentation.

The functionality of the package can be summarised as follows: Ontolopy takes OBO files and:

  1. makes them into an intuitive Python object, which subclasses a Python dict, meaning that you can do everything with it that you can do with this familiar and useful data type.

  2. provides a set of tools for manipulating and querying these objects in ways that are particular to ontologies, for example propagating relationships between terms, finding leaf/root terms, and merging ontologies.

  3. provides an extra class for manipulating and querying the Uberon anatomy ontology specifically.

7.2.1. Structure

Ontolopy is organised into three submodules, each centred around a class of the same name: opy.Obo() for OBO ontology objects, opy.Relations() for finding relationships between terms in an ontology object, and opy.Uberon() for finding tissue mappings. These three submodules are automatically loaded with import ontolopy.

7.2.2. Working with OBO ontologies

The opy.obo module contains the following callables that make it easier to work with OBO ontologies:

load_obo(file_loc[, ont_ids, discard_obsolete])

Loads ontology from .obo file at file_loc.

download_obo(data_name[, out_dir])

Download obo from a list of known locations.

Obo([source_dict])

Creates Obo ontology object from dict with ontology terms for keys, mapping to term attributes and relations.

Obo.merge(new[, prefer])

Recursively merges new into self and returns a merged Obo ontology.

7.2.2.1. The Obo class

The Obo class is an OBO ontology object, which subclasses dict. New Obo objects can be created from nested dictionaries. At the top level of the dictionary, keys are terms and values are dictionaries. This dictionary structure also allows you to add new terms.

Obo reference

class ontolopy.obo.Obo(source_dict={})

Creates Obo ontology object from dict with ontology terms for keys, mapping to term attributes and relations.

Each key/term is a dictionary with key: value pairs mapping either:

  1. Attribute (str) to value (str), e.g. ‘name’: ‘scapula’

  2. Type of relationship (str) to term identifiers (list), e.g. ‘is_a’: [‘UBERON:0002513’]

Info: Obo stands for Open Biological Ontology: a popular file format for building biological ontologies.

__init__(source_dict={})

Initialise self from a source dictionary.

Parameters

  • source_dict – dict mapping terms to their attributes and relationships.

Methods

merge(new[, prefer])

Recursively merges new into self and returns a merged Obo ontology.

Attributes

leaves

Leaf terms are the most specific terms in the ontology; they have no children, only parents (a set object).

terms

The ontology terms (a dict_keys object).

Obo usage example

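import ontolopy as opy

# Create an ontology from a nested dictionary: term identifier -> attributes and relations.
new_ontology = opy.Obo({'TERM:000001': {'name': 'Example term'}})
# Add a second term with ordinary dict assignment; 'is_a' maps to a list of parent identifiers.
new_ontology['TERM:000002'] = {'name': 'Second example term', 'is_a': ['TERM:000001']}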
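
To make the dictionary behaviour concrete without relying on Ontolopy being installed, the following is a minimal stand-in written in plain Python. It is not Ontolopy's own code: the class name MiniObo is hypothetical, and it assumes that parent-child structure is expressed only through 'is_a' lists. It sketches how the terms and leaves attributes described above could be derived from the nested dictionary.

```python
class MiniObo(dict):
    """Hypothetical stand-in for an Obo-like object (not Ontolopy's implementation)."""

    @property
    def terms(self):
        # The ontology terms are simply the top-level dictionary keys.
        return self.keys()

    @property
    def leaves(self):
        # Assumption: a term is a parent if any other term lists it under 'is_a'.
        parents = {p for attrs in self.values() for p in attrs.get('is_a', [])}
        return set(self) - parents


ont = MiniObo({'TERM:000001': {'name': 'Example term'}})
ont['TERM:000002'] = {'name': 'Second example term', 'is_a': ['TERM:000001']}
print(ont.leaves)  # {'TERM:000002'}
```

Because the object is still a dict, ordinary operations such as membership tests, iteration and item assignment work unchanged.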

7.2.2.2. Merging ontologies

It’s also possible to merge one ontology, or a list of ontologies, into the base ontology. This can be useful for investigating relationships between ontologies. For example, to find relationships between samples and tissues that might go via cells, you may want to merge sample, cell, and tissue ontologies so that all possible relationships can be found.

Obo.merge reference

Obo.merge(new, prefer='self')

Recursively merges new into self and returns a merged Obo ontology.

Parameters
  • new – Obo object (or list of objects) to add.

  • prefer – prefer ‘self’ (base Obo) or ‘new’ (new Obo)

Returns

merged – a merged Obo.
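
The idea of a recursive merge with a preference can be sketched on plain nested dictionaries. This is an illustration only, not Ontolopy's implementation: the function name merge_terms is hypothetical, and the conflict rules (relation lists are combined, other conflicting values follow prefer) are assumptions rather than documented behaviour. It also handles a single new dictionary, not a list.

```python
def merge_terms(base, new, prefer='self'):
    """Hypothetical recursive merge of two nested term dictionaries."""
    if prefer not in ('self', 'new'):
        raise ValueError("prefer must be 'self' or 'new'")
    merged = dict(base)
    for key, new_value in new.items():
        if key not in merged:
            merged[key] = new_value
        elif isinstance(merged[key], dict) and isinstance(new_value, dict):
            # Recurse into term dictionaries present on both sides.
            merged[key] = merge_terms(merged[key], new_value, prefer)
        elif isinstance(merged[key], list) and isinstance(new_value, list):
            # Assumption: relation lists are combined without duplicates.
            merged[key] = merged[key] + [v for v in new_value if v not in merged[key]]
        elif prefer == 'new':
            merged[key] = new_value
        # Otherwise (prefer == 'self') the base value is kept.
    return merged


base = {'TERM:000001': {'name': 'Example term', 'is_a': ['TERM:000000']}}
new = {
    'TERM:000001': {'name': 'Renamed term', 'is_a': ['TERM:000009']},
    'TERM:000002': {'name': 'Second example term'},
}
merged = merge_terms(base, new)
merged_prefer_new = merge_terms(base, new, prefer='new')
```

With the default prefer='self' the base name is kept, while prefer='new' takes the name from the incoming dictionary; in both cases the term that exists only in new is added.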

7.2.2.3. Loading ontologies from file

While creating ontologies from dictionaries is useful for adding bespoke terms, most of the time we want to load an official and curated OBO from a file.

load_obo reference

ontolopy.obo.load_obo(file_loc, ont_ids=None, discard_obsolete=True)

Loads ontology from .obo file at file_loc.

Parameters
  • file_loc – file location - path to stored obo file.

  • ont_ids – list of ontology ids, e.g. [‘UBERON’, ‘CL’]

  • discard_obsolete – if True discard obsolete terms.

Returns

Obo ontology object.
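
For readers unfamiliar with the file format, the sketch below parses the [Term] stanzas of a small piece of OBO text into the nested dictionary structure described earlier. It is a simplified illustration and not Ontolopy's parser: the function name parse_obo_text and the example identifiers are hypothetical, and storing relationship lines under the relation's name (e.g. 'part_of') is an assumption. The ont_ids and discard_obsolete arguments mimic the parameters documented above.

```python
OBO_TEXT = """\
format-version: 1.2

[Term]
id: TERM:000001
name: Example term

[Term]
id: TERM:000002
name: Second example term
is_a: TERM:000001 ! Example term
relationship: part_of OTHER:000003 ! Some structure

[Term]
id: TERM:000004
name: Retired term
is_obsolete: true

[Typedef]
id: part_of
name: part of
"""


def parse_obo_text(text, ont_ids=None, discard_obsolete=True):
    """Hypothetical, simplified parser for the [Term] stanzas of OBO text."""
    terms, current, in_term = {}, None, False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith('['):
            # A new stanza starts; only [Term] stanzas are read.
            in_term, current = (line == '[Term]'), None
            continue
        if not in_term or ': ' not in line:
            continue
        tag, value = line.split(': ', 1)
        if tag == 'id':
            current = terms.setdefault(value, {})
        elif current is None:
            continue
        elif tag == 'is_a':
            # Keep the identifier only, dropping the trailing '! comment'.
            current.setdefault('is_a', []).append(value.split()[0])
        elif tag == 'relationship':
            relation, target = value.split()[:2]
            current.setdefault(relation, []).append(target)
        else:
            current[tag] = value

    def keep(term_id, attrs):
        if discard_obsolete and attrs.get('is_obsolete') == 'true':
            return False
        return ont_ids is None or term_id.split(':')[0] in ont_ids

    return {t: a for t, a in terms.items() if keep(t, a)}


parsed = parse_obo_text(OBO_TEXT)
```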

7.2.2.4. Downloading OBO files

It’s also possible to download OBO files, either from a list of popular OBO files by name, or via a URL.

download_obo reference

ontolopy.obo.download_obo(data_name, out_dir='../data/')

Download obo from a list of known locations.

Parameters
  • data_name – Name of OBO you wish to download.

  • out_dir – Directory in which to save OBO file.

Returns

out_file – path to saved file.

7.2.3. Finding relationships

The key functionality in Ontolopy is the ability to infer relationships between terms, across ontologies (be it between tissue terms and phenotype terms, or something else). This functionality is inside the opy.relations module and is handled by the Relations class.

Relations reference

relation_path_to_text(relation_path, ont)

Converts from a relation string, e.g. “UBERON:123913.is_a~UBERON:1381239”, to a text version, e.g. “heart is a circulatory organ”.

Relations(allowed_relations, ont[, sources, …])

7.2.3.1. The Relations class

The Relations class finds relationships of certain types between sources and targets. It subclasses a Pandas DataFrame since that is a convenient and familiar format for the relationship information to be returned.

class ontolopy.relations.Relations(allowed_relations: list, ont, sources=None, targets=None, source_targets=None, excluded=None, col_names=None, mode='any')
__init__(allowed_relations: list, ont, sources=None, targets=None, source_targets=None, excluded=None, col_names=None, mode='any')

Pandas DataFrame containing relationships between source and target terms according to ont. Finds relationships that do not pass through excluded terms and that use only allowed_relations. We keep looking until we find a relation to a target (if mode == ‘any’) or we run out of leads.

Parameters
  • allowed_relations – a list of allowed relations, e.g. [‘is_a’, ‘part_of’]

  • sources – list of sources. For mode ‘all’, must be a list of source-target tuple pairs.

  • mode – ‘any’ or ‘all’. ‘all’ looks for specific term1-term2 pairs, while ‘any’ looks for any relationship between a term in sources and any term in targets.

  • targets – list of targets.

  • source_targets – list of tuples of source-target pairs. Do not provide sources or targets if using this parameter. Only runs in “all” mode.

  • ont – Obo ontology object.

  • excluded – a list/set of terms which are explicitly not being searched for (which may otherwise match the targets). Useful e.g. if we want to look for any tissue targets with prefix ‘UBERON’, except for very general ones. Relationships that pass through these terms are not allowed.

  • col_names – alternative column names for the output Relations DataFrame; the default is [‘from’, ‘relation_path’, ‘relation_text’, ‘to’].

To find relationships, the code loops through sources, and for each source it will look at the allowed_relations to find relationships with other terms, then for each of these terms it will look for relationships with other terms in the same manner, etc.

Internally, Ontolopy stores these relationships as a list of strings, where each string details the relations between the source term and other terms, e.g. UBERON:123913.is_a~UBERON:1381239.is_a~UBERON:987890. Let’s call these strings relation paths.

Cyclic relationships are not permitted (a term can only be present in a relation path once). Relationships continue to be searched for until either the ontology provided can no longer add any new relation paths OR we found what we were looking for.

In “any” mode, finding what we’re looking for means finding any target term as the last term in the relation path, while in “all” mode, we must find all target terms for the source term.
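
The search described above can be sketched as a breadth-first traversal over plain dictionaries. This is an illustration under stated assumptions, not Ontolopy's code: the function name find_relation_paths and the example identifiers are hypothetical, and only the "any" behaviour (stop as soon as a target is reached) is shown.

```python
def find_relation_paths(ont, source, targets, allowed_relations, excluded=()):
    """Hypothetical 'any'-mode search returning the first relation paths that reach a target."""
    targets, excluded = set(targets), set(excluded)
    # Each lead is (relation path so far, terms already on the path, last term).
    leads = [(source, {source}, source)]
    while leads:
        found, next_leads = [], []
        for path, seen, last in leads:
            for relation in allowed_relations:
                for nxt in ont.get(last, {}).get(relation, []):
                    if nxt in seen or nxt in excluded:
                        continue  # no cycles, and nothing through excluded terms
                    new_path = f'{path}.{relation}~{nxt}'
                    if nxt in targets:
                        found.append(new_path)
                    else:
                        next_leads.append((new_path, seen | {nxt}, nxt))
        if found:
            return found  # stop at the first depth that reaches a target
        leads = next_leads
    return []  # ran out of leads


example_ont = {
    'SAMPLE:1': {'name': 'macrophage sample', 'derives_from': ['CELL:1']},
    'CELL:1': {'name': 'macrophage', 'is_a': ['CELL:2']},
    'CELL:2': {'name': 'leukocyte', 'part_of': ['TISSUE:1']},
    'TISSUE:1': {'name': 'immune system'},
}
relations = ['is_a', 'part_of', 'derives_from']
paths = find_relation_paths(example_ont, 'SAMPLE:1', ['TISSUE:1'], relations)
blocked = find_relation_paths(example_ont, 'SAMPLE:1', ['TISSUE:1'], relations,
                              excluded=['CELL:2'])
```

Here paths holds a single relation path from the sample to the tissue, and blocked is empty because the only route passes through an excluded term.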

The mode parameter can be either ‘any’ or ‘all’, and this represents whether we are looking for relations from our source terms to any one target term, or to all target terms for which we can find a relationship. It is much quicker to run in “any” mode, so this mode is the default. It is preferable when we simply need the most direct mapping between our source and target terms, for example when we want to know which (one) tissue a sample maps to best.

The “all” mode tends to be more useful when we are as interested in the target terms as in the source terms. For example, when looking at mappings between tissues and phenotypes, there are likely to be many different phenotypes that a tissue can exhibit, and we are equally interested in all of them.

Provide either sources and targets OR source_targets. It’s possible to provide a list of sources and a list of targets, OR a list of source-target tuples (source_targets). It does not make sense to provide both. The latter option only works in “all” mode, i.e. when we are interested in all source-target pairs. Essentially, the source_targets option provides a quicker way of running Ontolopy in “all” mode when we know in advance which specific pairs of sources and targets we are interested in. If sources and targets are provided and mode is ‘all’, then Ontolopy will generate all possible combinations of sources and targets (removing excluded target terms if provided).
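
The pair generation described in the last sentence can be illustrated with itertools.product. The helper name make_source_targets and the identifiers are hypothetical; this is a sketch of the idea rather than the library's own code.

```python
from itertools import product


def make_source_targets(sources, targets, excluded=None):
    """Hypothetical helper: every (source, target) pair, skipping excluded targets."""
    excluded = set(excluded or [])
    return [(s, t) for s, t in product(sources, targets) if t not in excluded]


pairs = make_source_targets(
    sources=['SAMPLE:1', 'SAMPLE:2'],
    targets=['TISSUE:1', 'TISSUE:2'],
    excluded=['TISSUE:2'],
)
print(pairs)  # [('SAMPLE:1', 'TISSUE:1'), ('SAMPLE:2', 'TISSUE:1')]
```

The number of pairs grows as the product of the two list lengths, which is one reason supplying known pairs directly can be quicker.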

7.2.3.2. Converting “relation paths” to text

Since relationships are internally stored as relation paths as explained above, it is useful to turn these strings into more readable text, which is what the relation_path_to_text function does.
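
As an illustration of the conversion, the sketch below splits a relation path on "~" and "." and looks up term names in a plain dictionary. The function name path_to_text is hypothetical and this is not the library's implementation; the identifiers and names are the placeholder ones used in this section.

```python
def path_to_text(relation_path, ont):
    """Hypothetical re-implementation of the relation-path-to-text idea."""
    *steps, last_term = relation_path.split('~')
    words = []
    for step in steps:
        # Each step looks like 'TERM_ID.relation'; split on the final dot only.
        term, relation = step.rsplit('.', 1)
        words.append(ont[term]['name'])
        words.append(relation.replace('_', ' '))
    words.append(ont[last_term]['name'])
    return ' '.join(words)


names = {
    'UBERON:123913': {'name': 'heart'},
    'UBERON:1381239': {'name': 'circulatory organ'},
}
text = path_to_text('UBERON:123913.is_a~UBERON:1381239', names)
print(text)  # heart is a circulatory organ
```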

relation_path_to_text reference

ontolopy.relations.relation_path_to_text(relation_path, ont)
Converts from a relation string, e.g. “UBERON:123913.is_a~UBERON:1381239”, to a text version, e.g. “heart is a circulatory organ”.

Parameters
  • ont – opy.Obo() ontology object.

  • relation_path – path describing the relationship between two terms, e.g. “UBERON:123913.is_a~UBERON:1381239”

Returns

7.2.4. Creating Uberon Mappings

The opy.uberon submodule contains the specific tools for working with the Uberon ontology: finding mappings between tissues and phenotypes via ontology terms by making use of the Relations class, as well as doing this mapping using text, and comparing these two mappings. The vast majority of this functionality sits in the Uberon class.

uberon_from_obo(obo)

Creates an Uberon object from an Obo object.

Uberon()

An UBERON-specific ontology object.

7.2.4.1. The Uberon class

Calling the Uberon class itself simply checks if there are any Uberon terms in the merged ontology, and then allows the ontology to be used to create Uberon sample-to-tissue mappings, through class methods (which should be called separately).

There are three parts to the process in creating Uberon mappings, the functionality for which lives in three different Uberon class methods:

  1. Mapping via name: Map from sample-to-tissue via informal tissue names given in experimental design information (e.g. “eye stalk”) to an Uberon term (UBERON:0010326, Optic Pedicel).

  2. Mapping via ontology term: Map from CL cell types (e.g. CL:0000235, Macrophage) to Uberon tissues (e.g. UBERON:0002405, Immune system), or from sample ontology terms (like FANTOM terms, such as FF:10048-101G3, Smooth Muscle, Adult, Pool1) to Uberon terms (UBERON:0001135, Smooth Muscle Tissue). Returns relationships between the source term and the Uberon term.

  3. Create sample-to-tissue mappings and disagreements between mappings based on (1) and (2).

class ontolopy.uberon.Uberon

An UBERON-specific ontology object.

__init__()

Initialise self from a source dictionary.

Parameters

  • source_dict – dict mapping terms to their attributes and relationships.

Methods

__init__()

Initialise self from a source dictionary.

sample_map_by_ont(sample_ids[, exclude, …])

Map tissues from sample names to uberon identifiers.

sample_map_by_name(sample_names[, to, …])

Map tissues from sample identifiers to uberon identifiers.

get_overall_tissue_mappings(map_by_name, …)

Combines the two mappings map_by_name and map_by_ont to create an overall mapping and disagreements.

7.2.4.2. Mapping from sample to tissue via name using Uberon.sample_map_by_name

Informal tissue names are mapped to Uberon term identifiers by checking for exact name matches to Uberon term names and their synonyms in the extended Uberon ontology.

If an exact match does not exist, individual words from the phenotype term name or synonyms are then searched for exactly. First, stop words are removed, using the base list in the Natural Language Toolkit (nltk) Python package[227] (e.g. “and”, “or”) and a small number of manually curated phenotypic stop words (e.g. “phenotype”, “abnormality”). This means that the HP term “abnormality of the head and neck” would search for the words “head” and “neck” in the Uberon terms, and would be mapped to the terms of the same name (but never to “neck of radius”, which is related to bone). In cases where multiple terms are found, a common parent is searched for; in this case the result is “craniocervical region”.
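
The two-stage matching can be sketched in plain Python as follows. This is an illustration only: the function names, the 'synonyms' key, the example identifiers and the tiny stop-word set are all hypothetical (the real stop-word list comes from nltk plus curated additions), and the final common-parent step is not covered.

```python
# Assumed, very small stop-word set for illustration; not nltk's list.
STOP_WORDS = {'and', 'or', 'of', 'the', 'phenotype', 'abnormality'}


def build_name_index(ont):
    """Map each lower-cased name or synonym to the set of terms that carry it."""
    index = {}
    for term_id, attrs in ont.items():
        for label in [attrs.get('name', '')] + attrs.get('synonyms', []):
            if label:
                index.setdefault(label.lower(), set()).add(term_id)
    return index


def map_by_name(text, index):
    """Try an exact match first, then exact matches on the remaining single words."""
    key = text.lower()
    if key in index:
        return set(index[key])
    matches = set()
    for word in key.split():
        if word not in STOP_WORDS:
            matches |= index.get(word, set())
    return matches


tissue_ont = {
    'EX:0000001': {'name': 'head'},
    'EX:0000002': {'name': 'neck'},
    'EX:0000003': {'name': 'neck of radius'},
    'EX:0000004': {'name': 'optic pedicel', 'synonyms': ['eye stalk']},
}
index = build_name_index(tissue_ont)
head_and_neck = map_by_name('abnormality of the head and neck', index)
eye_stalk = map_by_name('eye stalk', index)
```

Because single words are compared against whole names, "neck" matches the term named "neck" but not "neck of radius".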

Uberon.sample_map_by_name reference

Uberon.sample_map_by_name(sample_names, to=None, col_names=None, xref=None, synonym_types=None)

Map tissues from sample identifiers to uberon identifiers.

Parameters
  • sample_names – mapping from sample identifiers (keys) to tissue/sample descriptors or names (values). May be a dict or pd.Series.

  • to – list of ontology prefixes that you want to map to.

  • xref – An ontology identifier (e.g. FMA) the presence of which denotes a preferred term.

  • col_names – Column names of returned relationships

Returns

7.2.4.3. Mapping from sample to tissue via ontology term using Uberon.sample_map_by_ont

The sample_map_by_ont function uses the Relations class in “any” mode to find relationships via ontologies in much the same way described above. This is essentially a wrapper that provides convenient default settings for allowed relations and targets.

Mappings can be made via any term in the merged ontology, which allows mappings that cannot be made through Uberon alone, for example: Macrophage - monocyte derived, donor3 is_a Human macrophage sample derives_from Macrophage is_a Monocyte is_a Leukocyte part_of Immune System, which means this sample is derived from part of the immune system.

Uberon.sample_map_by_ont reference

Uberon.sample_map_by_ont(sample_ids: list, exclude=None, relation_types=None, to=None, child_mapping=False)

Map tissues from sample names to uberon identifiers. Will only work if ontology contains Uberon + Sample terms.

Parameters
  • sample_ids – list of sample identifiers

  • exclude – list of tissues to exclude, e.g. because they are too general.

  • relation_types – list of relation types in ontology that relate to position in body.

  • to – list of ontology prefixes that you want to map to.

  • child_mapping – If True, searches children instead of parents.

Returns

Mapping by term: child mapping

Some samples may be pools of cell types that may come from more than one anatomical location. In this case, there will be no regular mapping, since no parent terms will have a mapping to a tissue. Instead, we can look at tissue mappings (in the usual way, described above) for all of the children of our parent term of interest. I call this mode “child mapping” and it is off by default.

So, for example, melanocytes are melanin-producing cells found in many different places in the body (skin, hair, heart), and therefore neither they nor any of their parents map to a specific Uberon term. If we set child_mapping=True, then for this term we will get a list of all Uberon terms that cells of this type can come from. This mode isn’t currently used in the context of the rest of this thesis.
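
A much-simplified picture of child mapping is sketched below on a plain dictionary. It is not how Ontolopy implements the option: the function names and identifiers are hypothetical, children are found only through 'is_a', and a child's tissue is read from a direct 'part_of' relation rather than from a full relation search.

```python
def children_of(ont, parent):
    """Terms that list `parent` under 'is_a' (direct children only)."""
    return [t for t, attrs in ont.items() if parent in attrs.get('is_a', [])]


def child_tissue_mappings(ont, term, tissue_prefix='UBERON'):
    """Hypothetical child mapping: collect tissues reachable from any descendant."""
    tissues, seen = set(), set()
    stack = children_of(ont, term)
    while stack:
        child = stack.pop()
        if child in seen:
            continue
        seen.add(child)
        for target in ont[child].get('part_of', []):
            if target.split(':')[0] == tissue_prefix:
                tissues.add(target)
        stack.extend(children_of(ont, child))
    return tissues


cell_ont = {
    'CELL:melanocyte': {'name': 'melanocyte'},
    'CELL:skin_melanocyte': {'is_a': ['CELL:melanocyte'], 'part_of': ['UBERON:EXAMPLE_SKIN']},
    'CELL:hair_melanocyte': {'is_a': ['CELL:melanocyte'], 'part_of': ['UBERON:EXAMPLE_HAIR']},
}
melanocyte_tissues = child_tissue_mappings(cell_ont, 'CELL:melanocyte')
```

The parent term itself has no tissue relation, so the result comes entirely from its children.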

7.2.4.4. Getting overall mappings and finding disagreements using Uberon.get_overall_tissue_mappings

As described, Ontolopy has two methods of mapping to tissues, and it also provides a method of harmonising these two mappings, and for finding any disagreements between them. This can be very useful for revealing logical inconsistencies in either the mappings or the ontologies (as was the case in the FANTOM5 example).

Uberon.get_overall_tissue_mappings reference

Uberon.get_overall_tissue_mappings(map_by_name, map_by_ont, rel=None)

Combines the two mappings map_by_name and map_by_ont to create an overall mapping and disagreements.

Parameters
  • map_by_name (class: pd.DataFrame) – mapping from sample to tissue via sample name, from Uberon.sample_map_by_name.

  • map_by_ont (class: pd.DataFrame) – mapping from sample to tissue via sample ontology ID, from Uberon.sample_map_by_ont.

  • rel – list of relation strings allowed between name and ontology mappings to count as not a disagreement.

Returns

(overall_mapping: mapping from sample to tissue combining both sources, disagreements: disagreements between “by name” and “by ontology” mappings.)

Return type

(class: pd.DataFrame, class: pd.DataFrame)
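
One plausible way of reconciling two mappings is sketched below using plain dictionaries rather than DataFrames. The rules here are assumptions made for illustration and are not taken from Ontolopy: the function name combine_mappings is hypothetical, a pair of tissues listed in related_pairs counts as agreement, and when both mappings exist and agree the ontology-based tissue is kept.

```python
def combine_mappings(by_name, by_ont, related_pairs=frozenset()):
    """Hypothetical reconciliation of two sample-to-tissue mappings."""
    overall, disagreements = {}, {}
    for sample in sorted(set(by_name) | set(by_ont)):
        name_tissue, ont_tissue = by_name.get(sample), by_ont.get(sample)
        if name_tissue is None or ont_tissue is None:
            # Only one mapping exists, so use whichever is available.
            overall[sample] = name_tissue if ont_tissue is None else ont_tissue
        elif (name_tissue == ont_tissue
              or (name_tissue, ont_tissue) in related_pairs
              or (ont_tissue, name_tissue) in related_pairs):
            # Assumption: identical or related tissues count as agreement.
            overall[sample] = ont_tissue
        else:
            disagreements[sample] = (name_tissue, ont_tissue)
    return overall, disagreements


by_name = {'S1': 'T:heart', 'S2': 'T:eye', 'S3': 'T:ventricle'}
by_ont = {'S1': 'T:heart', 'S2': 'T:liver', 'S3': 'T:heart', 'S4': 'T:skin'}
overall, disagreements = combine_mappings(
    by_name, by_ont, related_pairs={('T:ventricle', 'T:heart')}
)
```

Samples that end up in disagreements are the ones worth inspecting by hand, since they may point to an error in a mapping or in an ontology.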