7.2. Functionality
This section describes the low-level functionality of Ontolopy: what it can do. For examples of how this functionality is of practical use, please see the examples sections. You can find a full and up-to-date API Reference in the documentation.
The functionality of the package can be summarised as follows. Ontolopy takes OBO files and:
- makes them into an intuitive Python object (which subclasses a Python dict, meaning that you can do everything with it that you can do with this familiar and useful data type);
- provides a set of tools for useful manipulations and queries on these objects that are particular to ontologies, for example propagating relationships between terms, finding leaf/root terms, and merging ontologies.
Further to this, it provides an extra class for manipulating and querying the Uberon anatomy specifically.
7.2.1. Structure
Ontolopy is organised into three submodules, each centred around a class of the same name: opy.Obo() for OBO ontology objects, opy.Relations() for finding relationships between terms in an ontology object, and opy.Uberon() for finding tissue mappings.
These three submodules are automatically loaded with import ontolopy.
7.2.2. Working with OBO ontologies
The opy.obo module contains the following callables that make it easier to work with OBO ontologies:
| load_obo | Loads ontology from .obo file at file_loc. |
| download_obo | Download obo from a list of known locations. |
| Obo | Creates Obo ontology object from dict with ontology terms for keys, mapping to term attributes and relations. |
| Obo.merge | Recursively merges new into self and returns a merged Obo ontology. |
7.2.2.1. The Obo class
The Obo class is an OBO ontology object, which subclasses dict.
New Obo objects can be created from nested dictionaries.
At the top level of the dictionary, keys are terms and values are dictionaries.
This dictionary structure also allows you to add new terms.
Obo reference
class ontolopy.obo.Obo(source_dict={})
Creates Obo ontology object from dict with ontology terms for keys, mapping to term attributes and relations.
Each key/term is a dictionary with key: value pairs mapping either:
- Attribute (str) to value (str), e.g. ‘name’: ‘scapula’
- Type of relationship (str) to term identifiers (list), e.g. ‘is_a’: [‘UBERON:0002513’]
Info: Obo stands for Open Biological Ontology: a popular file format for building biological ontologies.
__init__(source_dict={})
Initialise self from a source dictionary.
- Parameters
source_dict – dict mapping terms to their attributes and relationships.
Methods
| merge(new[, prefer]) | Recursively merges new into self and returns a merged Obo ontology. |
Attributes
| leaves | Leaf terms are the most specific terms in the ontology; they have no children, only parents (a set object). |
| terms | The ontology terms (a dict_keys object). |
Obo usage example
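import ontolopy as opy
# Create an ontology from a nested dictionary: term identifier -> attributes
new_ontology = opy.Obo({'TERM:000001': {'name': 'Example term'}})
# Add a second term by ordinary dict assignment, listing its parent under 'is_a'
new_ontology['TERM:000002'] = {'name': 'Second example term', 'is_a': ['TERM:000001']}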
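To make the dictionary structure more concrete, the following is a minimal sketch in plain Python (it does not import Ontolopy) of how the terms and leaves attributes could be derived from a nested dictionary like the one above. The class name MiniObo, and the choice to treat only is_a as the parent relation, are assumptions made for illustration and are not Ontolopy's implementation.

```python
class MiniObo(dict):
    """Hypothetical stand-in for an ontology object that subclasses dict."""

    @property
    def terms(self):
        # The ontology terms are simply the top-level keys.
        return self.keys()

    @property
    def leaves(self):
        # A term is a parent if any other term lists it under 'is_a'.
        parents = {p for attrs in self.values() for p in attrs.get('is_a', [])}
        # Leaves are the terms that are nobody's parent.
        return set(self) - parents


ont = MiniObo({'TERM:000001': {'name': 'Example term'}})
ont['TERM:000002'] = {'name': 'Second example term', 'is_a': ['TERM:000001']}
print(ont.leaves)  # {'TERM:000002'}
```

Because the object is a dict, adding a term is ordinary key assignment, and the leaf set is recomputed from whatever the dictionary currently holds.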
7.2.2.2. Merging ontologies
It’s also possible to merge one ontology, or a list of ontologies, into the base ontology. This can be useful for investigating relationships that span ontologies. For example, to find relationships between samples and tissues that might go via cells, you may want to merge sample, cell, and tissue ontologies so that all possible relationships can be found.
Obo.merge reference
Obo.merge(new, prefer='self')
Recursively merges new into self and returns a merged Obo ontology.
- Parameters
new – Obo object (or list of objects) to add.
prefer – prefer ‘self’ (base Obo) or ‘new’ (new Obo)
- Returns
merged – A merged Obo
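As an illustration of what a recursive merge with a preference can look like, here is a sketch on plain nested dictionaries. The function name merge_terms, the made-up EX: identifiers, and the decision to combine relation lists rather than replace them are assumptions; Obo.merge may resolve conflicts differently.

```python
def merge_terms(base, new, prefer='self'):
    """Hypothetical recursive merge of two nested term dictionaries."""
    merged = dict(base)
    for key, new_value in new.items():
        if key not in merged:
            # Terms or attributes only present in `new` are simply added.
            merged[key] = new_value
        elif isinstance(merged[key], dict) and isinstance(new_value, dict):
            # Both sides describe the same term: merge attribute by attribute.
            merged[key] = merge_terms(merged[key], new_value, prefer)
        elif isinstance(merged[key], list) and isinstance(new_value, list):
            # Assumption: relation lists are combined, dropping duplicates.
            merged[key] = list(dict.fromkeys(merged[key] + new_value))
        elif prefer == 'new':
            # Conflicting single values: take the new one only if asked to.
            merged[key] = new_value
    return merged


tissues = {'EX:1': {'name': 'tissue', 'is_a': ['EX:0']}}
cells = {'EX:1': {'name': 'tissue (alternative label)', 'is_a': ['EX:9']},
         'EX:2': {'name': 'cell', 'part_of': ['EX:1']}}
merged = merge_terms(tissues, cells)
print(merged['EX:1'])  # {'name': 'tissue', 'is_a': ['EX:0', 'EX:9']}
```

With prefer='new', the conflicting name would instead be taken from the second dictionary.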
7.2.2.3. Loading ontologies from file
While creating ontologies from dictionaries is useful for adding bespoke terms, most of the time we want to load an official and curated OBO from a file.
load_obo reference
ontolopy.obo.load_obo(file_loc, ont_ids=None, discard_obsolete=True)
Loads ontology from .obo file at file_loc.
- Parameters
file_loc – file location - path to stored obo file.
ont_ids – list of ontology ids, e.g. [‘UBERON’, ‘CL’]
discard_obsolete – if True discard obsolete terms.
- Returns
Obo ontology object.
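To show how the ont_ids and discard_obsolete options relate to the contents of an OBO file, here is a simplified sketch that parses OBO text into the nested dictionary structure described earlier. The function parse_obo_text and the EX:/OTHER: terms are hypothetical; this is not how load_obo is implemented, and it ignores many features of the OBO format.

```python
def parse_obo_text(text, ont_ids=None, discard_obsolete=True):
    """Hypothetical parser: turns [Term] stanzas into a nested dict."""
    terms = {}
    # Everything before the first [Term] marker is the file header.
    for stanza in text.split('[Term]')[1:]:
        attrs = {}
        for line in stanza.strip().splitlines():
            line = line.strip()
            if line.startswith('['):
                break  # a different stanza type, e.g. [Typedef]
            if ': ' not in line:
                continue
            tag, value = line.split(': ', 1)
            value = value.split(' ! ')[0].strip()  # drop trailing comments
            if tag in ('id', 'name', 'is_obsolete'):
                attrs[tag] = value
            elif tag == 'relationship':
                # e.g. "relationship: part_of EX:0000001"
                relation, target = value.split(' ', 1)
                attrs.setdefault(relation, []).append(target)
            else:
                attrs.setdefault(tag, []).append(value)
        term_id = attrs.pop('id', None)
        if term_id is None:
            continue
        if discard_obsolete and attrs.get('is_obsolete') == 'true':
            continue
        if ont_ids and term_id.split(':')[0] not in ont_ids:
            continue
        terms[term_id] = attrs
    return terms


sample = """format-version: 1.2

[Term]
id: EX:0000001
name: example organ

[Term]
id: EX:0000002
name: example tissue
is_a: EX:0000001 ! example organ
relationship: part_of EX:0000001 ! example organ

[Term]
id: EX:0000003
name: retired term
is_obsolete: true

[Term]
id: OTHER:0000001
name: term from another ontology
"""
ont = parse_obo_text(sample, ont_ids=['EX'])
print(sorted(ont))  # ['EX:0000001', 'EX:0000002']
```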
7.2.2.4. Downloading OBO files
It’s also possible to download OBO files, either from a list of popular OBO files by name, or via a URL.
download_obo reference
ontolopy.obo.download_obo(data_name, out_dir='../data/')
Download obo from a list of known locations.
- Parameters
data_name – Name of OBO you wish to download.
out_dir – Directory in which to save OBO file.
- Returns
out_file – path to saved file.
7.2.3. Finding relationships
The key functionality in Ontolopy is the ability to infer relationships between terms across ontologies (for example between tissue terms and phenotype terms).
This functionality lives in the opy.relations module and is handled by the Relations class.
Relations reference
| relation_path_to_text | Converts from a relation string e.g. “UBERON:123913.is_a~UBERON:1381239” to a text version, e.g. “heart is a circulatory organ”. |
7.2.3.1. The Relations class
The Relations class finds relationships of certain types between sources and targets.
It subclasses a Pandas DataFrame since that is a convenient and familiar format for the relationship information to be returned.
class ontolopy.relations.Relations(allowed_relations: list, ont, sources=None, targets=None, source_targets=None, excluded=None, col_names=None, mode='any')
__init__(allowed_relations: list, ont, sources=None, targets=None, source_targets=None, excluded=None, col_names=None, mode='any')
Pandas DataFrame containing relationships between source and target terms according to ont. Finds relationships that do not pass through excluded terms and uses only allowed_relations. We keep looking until we find a relation to a target (if mode == ‘any’) or we run out of leads.
- Parameters
allowed_relations – a list of allowed relations, e.g. [‘is_a’, ‘part_of’]
sources – list of sources. For mode ‘all’, must be a list of source-target tuple pairs.
mode – ‘any’ or ‘all’: ‘all’ looks for specific term1-term2 pairs, while ‘any’ looks for any relationship between a given source and anything in targets.
targets – list of targets.
source_targets – list of tuples of source-target pairs. Do not provide sources or targets if using this parameter. Only runs in “all” mode.
ont – Obo ontology object.
excluded – a list/set of terms which are explicitly not being searched for (which may otherwise match the targets). Useful e.g. if we want to look for any tissue targets with prefix ‘UBERON’, except for very general ones. Relationships that pass through these terms are not allowed.
col_names – Alternative column names for the output Relations DataFrame; the default is [‘from’, ‘relation_path’, ‘relation_text’, ‘to’]
To find relationships, the code loops through sources, and for each source it looks at the allowed_relations to find relationships with other terms; for each of these terms it then looks for relationships with further terms in the same manner, and so on.
Internally, Ontolopy stores these relationships as a list of strings, where each string details the relations between the source term and other terms, e.g. UBERON:123913.is_a~UBERON:1381239.is_a~UBERON:987890.
Let’s call these strings relation paths.
Cyclic relationships are not permitted (a term can only be present in a relation path once). Relationships continue to be searched for until either the ontology provided can no longer add any new relation paths OR we found what we were looking for.
In “any” mode, finding what we’re looking for means finding any target term as the last term in the relation string, while in “all” mode, we must find all target terms for the source term.
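The search described above can be sketched in plain Python as a breadth-first expansion of relation paths over a nested dictionary. The function find_relation_paths, the made-up S:/C:/T: identifiers, and the choice to keep only the first path found per target are assumptions for illustration; the Relations class itself returns a DataFrame and may differ in detail.

```python
def find_relation_paths(ont, source, targets, allowed_relations,
                        excluded=(), mode='any'):
    """Hypothetical breadth-first search returning relation path strings."""
    targets, excluded = set(targets), set(excluded)
    found = {}  # target term -> first relation path that reached it
    frontier = [(source, [source])]  # (path so far, terms visited on it)
    while frontier:
        next_frontier = []
        for path, visited in frontier:
            for relation in allowed_relations:
                for term in ont.get(visited[-1], {}).get(relation, []):
                    # No cycles, and nothing may pass through excluded terms.
                    if term in visited or term in excluded:
                        continue
                    new_path = f'{path}.{relation}~{term}'
                    if term in targets and term not in found:
                        found[term] = new_path
                        if mode == 'any':
                            return [new_path]  # first target is enough
                    next_frontier.append((new_path, visited + [term]))
        frontier = next_frontier  # stops when there are no new leads
    return list(found.values())


ont = {
    'S:1': {'derives_from': ['C:1']},
    'C:1': {'is_a': ['C:2']},
    'C:2': {'part_of': ['T:1']},
    'T:1': {'is_a': ['T:2']},
    'T:2': {},
}
relations = ['is_a', 'part_of', 'derives_from']
print(find_relation_paths(ont, 'S:1', ['T:1', 'T:2'], relations))
# ['S:1.derives_from~C:1.is_a~C:2.part_of~T:1']
```

Calling the same function with mode='all' keeps searching after the first hit and also returns the longer path ending in T:2.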
The mode parameter can be either any or all, and this determines whether we are looking for relations from our source terms to any one target term, or to all target terms for which we can find a relationship.
It is much quicker to run in “any” mode, so this mode is the default. It is preferable when we simply need the most direct mapping between our source and target terms, for example when we want to know which (one) tissue a sample maps to best.
The “all” mode tends to be more useful when we are as interested in the targets as in the source terms. For example, when looking at mappings between tissues and phenotypes, there are likely to be many different phenotypes that a tissue can exhibit, and we are equally interested in all of them.
Provide either sources and targets OR source_targets.
It’s possible to provide a list of sources and a list of targets, OR a list of source_targets tuples.
It does not make sense to provide both.
The latter option only works in all mode: i.e. we are interested in all source-target pairs.
Essentially, the source_targets option provides a quicker way of running Ontolopy in “all” mode when we know in advance which specific pairs of sources and targets we are interested in.
If sources and targets are provided and mode == 'all', then Ontolopy will generate a combination of all possible sources and targets (removing excluded target terms if provided).
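The combination step in the last sentence can be pictured as a Cartesian product. The sketch below uses a hypothetical helper, build_source_targets, and made-up identifiers; it only illustrates the idea and is not taken from Ontolopy.

```python
from itertools import product


def build_source_targets(sources, targets, excluded=None):
    """Hypothetical helper: every source paired with every kept target."""
    excluded = set(excluded or [])
    kept_targets = [t for t in targets if t not in excluded]
    return list(product(sources, kept_targets))


pairs = build_source_targets(['S:1', 'S:2'], ['T:1', 'T:2', 'T:3'],
                             excluded=['T:3'])
print(pairs)
# [('S:1', 'T:1'), ('S:1', 'T:2'), ('S:2', 'T:1'), ('S:2', 'T:2')]
```

The number of pairs grows as the product of the two list lengths, which is why supplying only the pairs of interest is quicker when they are known in advance.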
7.2.3.2. Converting “relation paths” to text
Since relationships are internally stored as relation paths as explained above, it is useful to turn these strings into more readable text, which is what the relation_path_to_text function does.
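As a rough picture of this conversion, the sketch below splits a relation path on its separators and substitutes term names. The function path_to_text and the names lookup are hypothetical, and the identifiers are the placeholder ones used in this section; relation_path_to_text itself takes an Obo object and may format its output differently.

```python
def path_to_text(relation_path, names):
    """Hypothetical converter from a relation path to readable text."""
    segments = relation_path.split('~')
    words = []
    # Every segment except the last has the form "TERM.relation".
    for segment in segments[:-1]:
        term, relation = segment.split('.', 1)
        words.append(names.get(term, term))
        words.append(relation.replace('_', ' '))
    # The last segment is the final term on its own.
    words.append(names.get(segments[-1], segments[-1]))
    return ' '.join(words)


names = {'UBERON:123913': 'heart', 'UBERON:1381239': 'circulatory organ'}
print(path_to_text('UBERON:123913.is_a~UBERON:1381239', names))
# heart is a circulatory organ
```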
relation_path_to_text reference
ontolopy.relations.relation_path_to_text(relation_path, ont)
Converts from a relation string, e.g. “UBERON:123913.is_a~UBERON:1381239”, to a text version, e.g. “heart is a circulatory organ”.
- Parameters
ont – opy.Obo() ontology object.
relation_path – path describing the relationship between two terms, e.g. “UBERON:123913.is_a~UBERON:1381239”
- Returns
7.2.4. Creating Uberon Mappings
The opy.uberon submodule contains the specific tools for working with the Uberon ontology: finding mappings between tissues and phenotypes via ontology terms by making use of the Relations class, as well as doing this mapping using text, and comparing these two mappings.
The vast majority of this functionality sits in the Uberon class.
|
Creates an Uberon object from an Obo object. |
|
An UBERON-specific ontology object. |
7.2.4.1. The Uberon class
Calling the Uberon class itself simply checks whether there are any Uberon terms in the merged ontology, and then allows the ontology to be used to create Uberon sample-to-tissue mappings through class methods (which should be called separately).
There are three parts to the process of creating Uberon mappings, the functionality for which lives in three different Uberon class methods:
1. Mapping via name: Map from sample to tissue via informal tissue names given in experimental design information (e.g. “eye stalk”) to an Uberon term (UBERON:0010326, Optic Pedicel).
2. Mapping via ontology term: Map from CL cell types (e.g. CL:0000235, Macrophage) to Uberon tissues (e.g. UBERON:0002405, Immune system), or from sample ontology terms (like FANTOM terms, such as FF:10048-101G3, Smooth Muscle, Adult, Pool1) to Uberon terms (UBERON:0001135, Smooth Muscle Tissue). Returns relationships between source term and Uberon term.
3. Create sample-to-tissue mappings and disagreements between mappings based on (1) and (2).
class ontolopy.uberon.Uberon
An UBERON-specific ontology object.
__init__()
Initialise self from a source dictionary.
- Parameters
source_dict – dict mapping terms to their attributes and relationships.
Methods
| __init__() | Initialise self from a source dictionary. |
| sample_map_by_ont(sample_ids[, exclude, …]) | Map tissues from sample names to uberon identifiers. |
| sample_map_by_name(sample_names[, to, …]) | Map tissues from sample identifiers to uberon identifiers. |
| get_overall_tissue_mappings(map_by_name, …) | Combines the two mappings map_by_name and map_by_ont to create an overall mapping and disagreements. |
7.2.4.2. Mapping from sample to tissue via name using Uberon.sample_map_by_name
Informal tissue names are mapped to Uberon term identifiers by checking for exact name matches to Uberon term names and their synonyms in the extended Uberon ontology.
If an exact match does not exist, individual words from the phenotype term name or synonyms are then searched for exactly.
First, stop words are removed, using the base list in the Natural Language Toolkit (nltk) Python package[227] (e.g. and, or) and a small number of manually curated phenotypic stop words (e.g. “phenotype”, “abnormality”).
This would mean that the HP term “abnormality of the head and neck” would search for the words “head” and “neck” in the UBERON terms, and would be mapped to the terms of the same name (but never to “neck of radius”, which is related to bone).
In cases where multiple terms are found, a common parent is searched for; in this case the result is “craniocervical region”.
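The two matching stages described above (exact match first, then word-by-word matching after stop-word removal) can be sketched as follows. The function map_name, the 'synonym' key, the made-up EX: identifiers, and the short hard-coded stop-word set (a stand-in for the nltk list) are all assumptions; the common-parent step is left out.

```python
# Small stand-in for the nltk stop words plus the curated phenotypic ones.
STOP_WORDS = {'and', 'or', 'of', 'the', 'phenotype', 'abnormality'}


def map_name(text, ont):
    """Hypothetical name matcher returning a set of matching term ids."""
    # Build a lookup from every lower-cased name or synonym to term ids.
    lookup = {}
    for term_id, attrs in ont.items():
        for label in [attrs.get('name', '')] + attrs.get('synonym', []):
            if label:
                lookup.setdefault(label.lower(), set()).add(term_id)
    text = text.lower()
    # Stage 1: the whole text matches a name or synonym exactly.
    if text in lookup:
        return lookup[text]
    # Stage 2: each remaining word must itself match a label exactly.
    matches = set()
    for word in text.split():
        if word not in STOP_WORDS:
            matches |= lookup.get(word, set())
    return matches


ont = {
    'EX:1': {'name': 'head'},
    'EX:2': {'name': 'neck'},
    'EX:3': {'name': 'neck of radius'},
    'EX:4': {'name': 'optic pedicel', 'synonym': ['eye stalk']},
}
print(map_name('eye stalk', ont))  # {'EX:4'}
print(sorted(map_name('abnormality of the head and neck', ont)))
# ['EX:1', 'EX:2']
```

Because individual words are compared against whole labels, “neck” matches the term named “neck” but not “neck of radius”.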
Uberon.sample_map_by_name reference
Uberon.sample_map_by_name(sample_names, to=None, col_names=None, xref=None, synonym_types=None)
Map tissues from sample identifiers to uberon identifiers.
- Parameters
sample_names – map from sample identifiers to tissue/sample descriptors/names for values. May be dict or pd.Series
to – list of ontology prefixes that you want to map to.
xref – An ontology identifier (e.g. FMA) the presence of which denotes a preferred term.
col_names – Column names of returned relationships
- Returns
7.2.4.3. Mapping from sample to tissue via ontology term using Uberon.sample_map_by_ont
The sample_map_by_ont function uses the Relations class in “any” mode to find relationships via ontologies in much the same way as described above.
This is essentially a wrapper that provides convenient default settings for allowed relations and targets.
Mappings can be made via any term in the merged ontology, which allows mappings that cannot be made through Uberon alone, for example: Macrophage - monocyte derived, donor3 is_a Human macrophage sample derives_from Macrophage is_a Monocyte is_a Leukocyte part_of Immune System, which means this sample is derived from part of the immune system.
Uberon.sample_map_by_ont reference
Uberon.sample_map_by_ont(sample_ids: list, exclude=None, relation_types=None, to=None, child_mapping=False)
Map tissues from sample names to uberon identifiers. Will only work if ontology contains Uberon + Sample terms.
- Parameters
sample_ids – list of sample identifiers
exclude – list of tissues to exclude, i.e. because they are too general.
relation_types – list of relation types in ontology that relate to position in body.
to – list of ontology prefixes that you want to map to.
child_mapping – If True, searches children instead of parents.
- Returns
Mapping by term: child mapping
Some samples may be pools of cell types that may come from more than one anatomical location. In this case, there will be no regular mapping, since no parent terms will have a mapping to a tissue. Instead, we can look at tissue mappings (in the usual way, described above) for all of the children of our parent term of interest. I call this mode “child mapping” and it is off by default.
So, for example, melanocytes are melanin-producing cells found in many different places in the body (skin, hair, heart), and therefore neither they nor any of their parents map to a specific Uberon term.
If we choose child_mapping=True, then for this term we will get a list of all Uberon terms that cells of this type can come from.
This mode isn’t currently used in the context of the rest of this thesis.
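The idea behind child mapping can be sketched on a toy dictionary: collect the descendants of a term, then gather the tissues that each descendant maps to. The helpers children_of and child_tissues, the CELL:/TISSUE: identifiers, and the restriction to direct part_of links are assumptions for illustration rather than a description of how sample_map_by_ont does it.

```python
def children_of(ont, term):
    """Hypothetical helper: all descendants of `term` via is_a."""
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        for term_id, attrs in ont.items():
            if current in attrs.get('is_a', []) and term_id not in found:
                found.add(term_id)
                stack.append(term_id)
    return found


def child_tissues(ont, term, tissue_prefix='TISSUE'):
    """Tissues reachable by a direct part_of link from any descendant."""
    tissues = set()
    for child in children_of(ont, term):
        for target in ont[child].get('part_of', []):
            if target.startswith(tissue_prefix + ':'):
                tissues.add(target)
    return tissues


ont = {
    'CELL:1': {'name': 'melanocyte'},
    'CELL:2': {'name': 'skin melanocyte', 'is_a': ['CELL:1'],
               'part_of': ['TISSUE:1']},
    'CELL:3': {'name': 'hair melanocyte', 'is_a': ['CELL:1'],
               'part_of': ['TISSUE:2']},
    'TISSUE:1': {'name': 'skin'},
    'TISSUE:2': {'name': 'hair'},
}
print(sorted(child_tissues(ont, 'CELL:1')))  # ['TISSUE:1', 'TISSUE:2']
```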
7.2.4.4. Getting overall mappings and finding disagreements using Uberon.get_overall_tissue_mappings
As described, Ontolopy has two methods of mapping to tissues, and it also provides a method of harmonising these two mappings, and for finding any disagreements between them. This can be very useful for revealing logical inconsistencies in either the mappings or the ontologies (as was the case in the FANTOM5 example).
Uberon.get_overall_tissue_mappings reference
Uberon.get_overall_tissue_mappings(map_by_name, map_by_ont, rel=None)
Combines the two mappings map_by_name and map_by_ont to create an overall mapping and disagreements.
- Parameters
map_by_name (class: pd.DataFrame) – mapping from sample to tissue via sample name, from Uberon.sample_map_by_name.
map_by_ont (class: pd.DataFrame) – mapping from sample to tissue via sample ontology ID, from Uberon.sample_map_by_ont.
rel – list of relation strings allowed between name and ontology mappings to count as not a disagreement.
- Returns
(overall_mapping: mapping from sample to tissue combining both sources, disagreements: disagreements between “by name” and “by ontology” mappings.)
- Return type
(class: pd.DataFrame, class: pd.DataFrame)
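A simplified picture of harmonising the two mappings is given below, using plain dictionaries from sample to tissue in place of DataFrames. The function combine_mappings and the sample and tissue identifiers are hypothetical, and the rel parameter (which lets related but non-identical tissues count as agreement) is deliberately left out of this sketch.

```python
def combine_mappings(by_name, by_ont):
    """Hypothetical harmoniser returning (overall, disagreements)."""
    overall, disagreements = {}, {}
    for sample in sorted(set(by_name) | set(by_ont)):
        name_tissue = by_name.get(sample)
        ont_tissue = by_ont.get(sample)
        if name_tissue and ont_tissue and name_tissue != ont_tissue:
            # Both methods gave an answer and the answers differ.
            disagreements[sample] = (name_tissue, ont_tissue)
        else:
            # Either they agree, or only one method produced a mapping.
            overall[sample] = name_tissue or ont_tissue
    return overall, disagreements


by_name = {'sample1': 'T:1', 'sample2': 'T:2'}
by_ont = {'sample1': 'T:1', 'sample2': 'T:3', 'sample3': 'T:4'}
overall, disagreements = combine_mappings(by_name, by_ont)
print(overall)        # {'sample1': 'T:1', 'sample3': 'T:4'}
print(disagreements)  # {'sample2': ('T:2', 'T:3')}
```

Samples that end up in the disagreements output are the ones worth inspecting by hand, since they may point to an inconsistency in the sample names or in the ontology.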