Protein

lamindb provides access to the following public Protein ontologies through bionty:

  1. UniProt

Here we show how to access and search Protein ontologies to standardize new data.

import bionty as bt
import pandas as pd

PublicOntology objects

Let us create a public ontology accessor with the .public method, which chooses a default public ontology source from Source. It returns a PublicOntology object, which you can think of as a public registry:

proteins = bt.Protein.public(organism="human")
proteins
 connected lamindb: testuser1/test-public-ontologies

PublicOntology
Entity: Protein
Organism: human
Source: uniprot, 2024-03
#terms: 204088

As for registries, you can export the ontology as a DataFrame:

df = proteins.df()
df.head()
uniprotkb_id name description length synonyms gene_symbol ensembl_gene_ids
0 A0A023HJ61 Ras-related protein Rab-4A 121 RAB4A None
1 A0A023HN28 SRSF3 USP6 fusion protein 16 None None
2 A0A023I7F4 Cytochrome b 380 CYTB None
3 A0A023I7H2 NADH-ubiquinone oxidoreductase chain 5 603 EC 7.1.1.2 ND5 None
4 A0A023I7H5 ATP synthase subunit a 226 ATP6 None

Unlike registries, you can also export it as a Pronto object via public.ontology.

Look up terms

As for registries, terms can be looked up with auto-complete:

lookup = proteins.lookup()

The . accessor provides normalized terms (lowercase, containing only alphanumeric characters and underscores):

lookup.ac3
Protein(uniprotkb_id='Q8IX81', name='AC3', description='', length=502, synonyms='', gene_symbol=None, ensembl_gene_ids=None)
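
The normalization rule can be sketched in plain Python. The helper to_lookup_key below is hypothetical and only follows the rule stated above (lowercase, alphanumerics and underscores); bionty's own implementation may treat edge cases, such as names starting with a digit, differently.

```python
import re


def to_lookup_key(term: str) -> str:
    """Hypothetical helper: normalize an ontology term into a lookup key.

    Runs of characters other than letters and digits become a single
    underscore, leading/trailing underscores are dropped, and the result
    is lowercased.
    """
    return re.sub(r"[^0-9A-Za-z]+", "_", term).strip("_").lower()


names = ["AC3", "Ras-related protein Rab-4A", "Cytochrome b"]
keys = {to_lookup_key(name): name for name in names}
print(keys)
# {'ac3': 'AC3', 'ras_related_protein_rab_4a': 'Ras-related protein Rab-4A', 'cytochrome_b': 'Cytochrome b'}
```

Because different names can collapse to the same key (for example, "Rab-4A" and "Rab 4A" both become "rab_4a"), looking up the original strings is the unambiguous option.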

To look up the exact original strings, convert the lookup object to a dict and use the [] accessor:

lookup_dict = lookup.dict()
lookup_dict["AC3"]
Protein(uniprotkb_id='Q8IX81', name='AC3', description='', length=502, synonyms='', gene_symbol=None, ensembl_gene_ids=None)

By default, the name field is used to generate lookup keys. You can specify another field to look up:

lookup = proteins.lookup(proteins.gene_symbol)
lookup.rab4a
[Protein(uniprotkb_id='A0A023HJ61', name='Ras-related protein Rab-4A', description='', length=121, synonyms='', gene_symbol='RAB4A', ensembl_gene_ids=None),
 Protein(uniprotkb_id='A0A087WYT5', name='RAB4A', description='member RAS oncogene family', length=113, synonyms='', gene_symbol='RAB4A', ensembl_gene_ids='ENST00000618010.4;'),
 Protein(uniprotkb_id='P20338', name='Ras-related protein Rab-4A', description='', length=218, synonyms='EC 3.6.5.2', gene_symbol='RAB4A', ensembl_gene_ids='ENST00000366690.5;')]
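
When the lookup field is not unique, as gene_symbol is here, several records share one key and the accessor returns a list. A minimal stdlib sketch of that grouping, using records shown above; build_lookup and ProteinRecord are hypothetical, and keys are only lowercased here rather than fully normalized:

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Union


class ProteinRecord(NamedTuple):
    uniprotkb_id: str
    name: str
    gene_symbol: Optional[str]


def build_lookup(
    records: List[ProteinRecord], field: str
) -> Dict[str, Union[ProteinRecord, List[ProteinRecord]]]:
    """Hypothetical sketch: map each lowercased field value to its record,
    or to a list of records when the value is shared."""
    groups: Dict[str, List[ProteinRecord]] = defaultdict(list)
    for record in records:
        value = getattr(record, field)
        if value is not None:  # records without a value get no key
            groups[value.lower()].append(record)
    return {key: recs[0] if len(recs) == 1 else recs for key, recs in groups.items()}


records = [
    ProteinRecord("A0A023HJ61", "Ras-related protein Rab-4A", "RAB4A"),
    ProteinRecord("A0A087WYT5", "RAB4A", "RAB4A"),
    ProteinRecord("P20338", "Ras-related protein Rab-4A", "RAB4A"),
    ProteinRecord("Q8IX81", "AC3", None),
]
by_gene = build_lookup(records, "gene_symbol")
print([r.uniprotkb_id for r in by_gene["rab4a"]])  # ['A0A023HJ61', 'A0A087WYT5', 'P20338']
print(build_lookup(records, "name")["ac3"].uniprotkb_id)  # Q8IX81
```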

Search terms

Search behaves in the same way as it does for registries:

proteins.search("RAS").head(3)
uniprotkb_id name description length synonyms gene_symbol ensembl_gene_ids
189295 Q96PV0 Ras Rap GTPase-activating protein SynGAP 1343 Neuronal RasGAP|Synaptic Ras GTPase-activating... SYNGAP1 ENST00000395071.6 [Q96PV0-4];ENST00000414753.6...
16769 A0A140T8W4 Ras Rap GTPase-activating protein SynGAP 487 SYNGAP1 ENST00000355818.3;
158749 P20936 Ras GTPase-activating protein 1 1047 GAP|GTPase-activating protein|RasGAP|Ras p21 p... RASA1 ENST00000274376.11 [P20936-1];ENST00000456692....

By default, search also covers synonyms and all other fields containing strings:

proteins.search("member of RAS oncogene family like 2B").head(3)
uniprotkb_id name description length synonyms gene_symbol ensembl_gene_ids
71092 A0A8I5KRY9 RAB member of RAS oncogene family like 2B 58 RABL2B ENST00000685352.1;
71734 A0A8I5KX29 RAB member of RAS oncogene family like 2B 51 RABL2B ENST00000690024.1;
81780 A8MXF6 RAB member of RAS oncogene family like 2B 165 RABL2B ENST00000395591.5;

To restrict the search to a specific field, pass it via the field argument:

proteins.search(
    "RABL2B",
    field=proteins.gene_symbol,
).head()
uniprotkb_id name description length synonyms gene_symbol ensembl_gene_ids
71092 A0A8I5KRY9 RAB member of RAS oncogene family like 2B 58 RABL2B ENST00000685352.1;
71734 A0A8I5KX29 RAB member of RAS oncogene family like 2B 51 RABL2B ENST00000690024.1;
81780 A8MXF6 RAB member of RAS oncogene family like 2B 165 RABL2B ENST00000395591.5;
101750 C9JFZ0 RAB member of RAS oncogene family like 2B 20 RABL2B ENST00000413505.1;
119224 F2Z2T3 RAB member of RAS oncogene family like 2B 99 RABL2B ENST00000395590.5;
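
To make searching across all string fields, or a single field, concrete, here is a small sketch that scores each row by its best fuzzy match. It uses difflib.SequenceMatcher from the standard library as a stand-in similarity measure; bionty's actual ranking algorithm may differ. Non-string fields such as length are skipped, and "|"-separated synonyms are matched one by one. The search function is hypothetical; the rows reuse values from the outputs above.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional

Row = Dict[str, object]


def search(rows: List[Row], query: str, field: Optional[str] = None, limit: int = 3) -> List[Row]:
    """Return up to `limit` rows ranked by their best fuzzy match to `query`.

    All string-valued fields are scored unless `field` names a single one;
    '|'-separated values (e.g. synonyms) are compared piece by piece.
    """
    q = query.lower()

    def best_score(row: Row) -> float:
        best = 0.0
        for key, value in row.items():
            if not isinstance(value, str) or (field is not None and key != field):
                continue
            for candidate in value.split("|"):
                best = max(best, SequenceMatcher(None, q, candidate.lower()).ratio())
        return best

    scored = sorted(((best_score(row), row) for row in rows), key=lambda pair: pair[0], reverse=True)
    return [row for score, row in scored if score > 0][:limit]


rows: List[Row] = [
    {"uniprotkb_id": "P20936", "name": "Ras GTPase-activating protein 1", "length": 1047,
     "synonyms": "GAP|GTPase-activating protein|RasGAP", "gene_symbol": "RASA1"},
    {"uniprotkb_id": "A8MXF6", "name": "RAB member of RAS oncogene family like 2B", "length": 165,
     "synonyms": None, "gene_symbol": "RABL2B"},
    {"uniprotkb_id": "Q8IX81", "name": "AC3", "length": 502, "synonyms": None, "gene_symbol": None},
]

print([r["uniprotkb_id"] for r in search(rows, "GTPase-activating protein")])
print([r["uniprotkb_id"] for r in search(rows, "RABL2B", field="gene_symbol")])
```

The synonym match in the first query only works because synonyms are scored alongside names, which mirrors the default behavior described above.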

Standardize Protein identifiers

Let us generate a DataFrame whose index stores a number of protein identifiers, some of which are corrupted:

df_orig = pd.DataFrame(
    index=[
        "A0A024QZ08",
        "X6RLV5",
        "X6RM24",
        "A0A024QZQ1",
        "This protein does not exist",
    ]
)
df_orig
A0A024QZ08
X6RLV5
X6RM24
A0A024QZQ1
This protein does not exist

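We can check which of our values validate against the ontology reference. Note that the call below validates against the name field, so UniProtKB accessions such as 'A0A024QZ08' are not recognized even if they exist in UniProt; pass proteins.uniprotkb_id as the field to validate accessions instead: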

validated = proteins.validate(df_orig.index, proteins.name)
df_orig.index[~validated]
! 5 unique terms (100.00%) are not validated: 'A0A024QZ08', 'X6RLV5', 'X6RM24', 'A0A024QZQ1', 'This protein does not exist'
Index(['A0A024QZ08', 'X6RLV5', 'X6RM24', 'A0A024QZQ1',
       'This protein does not exist'],
      dtype='object')
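
validate returns one boolean per input value, which is why ~validated selects the failures. A minimal sketch of that behavior as a set membership test, using hypothetical short stand-ins for the name and uniprotkb_id columns, also shows why the chosen field matters:

```python
from typing import Iterable, List


def validate(values: Iterable[str], reference: Iterable[str]) -> List[bool]:
    """One boolean per value: True if the value occurs in the reference field."""
    allowed = set(reference)
    return [value in allowed for value in values]


# Hypothetical stand-ins for two columns of the ontology DataFrame
names = ["Ras-related protein Rab-4A", "AC3"]
uniprotkb_ids = ["A0A023HJ61", "Q8IX81", "P20338"]

query = ["Q8IX81", "P20338", "This protein does not exist"]
print(validate(query, names))          # [False, False, False]
validated = validate(query, uniprotkb_ids)
print(validated)                       # [True, True, False]
print([v for v, ok in zip(query, validated) if not ok])  # ['This protein does not exist']
```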

Ontology source versions

For any given entity, we can choose from a number of versions:

bt.Source.filter(entity="bionty.Protein").df()
uid entity organism name in_db currently_used description url md5 source_website space_id dataframe_artifact_id version run_id created_at created_by_id _aux _branch_code
id
22 3EYyGRYN bionty.Protein human uniprot False True Uniprot s3://bionty-assets/df_human__uniprot__2024-03_... None https://www.uniprot.org 1 None 2024-03 None 2025-03-10 13:25:14.948000+00:00 1 None 1
23 1hTf3Lz8 bionty.Protein human uniprot False False Uniprot s3://bionty-assets/df_human__uniprot__2023-03_... None https://www.uniprot.org 1 None 2023-03 None 2025-03-10 13:25:14.948000+00:00 1 None 1
24 qxHZM6Mt bionty.Protein human uniprot False False Uniprot s3://bionty-assets/human_uniprot_2023-02_Prote... None https://www.uniprot.org 1 None 2023-02 None 2025-03-10 13:25:14.948000+00:00 1 None 1
25 01RWXN2V bionty.Protein mouse uniprot False True Uniprot s3://bionty-assets/df_mouse__uniprot__2024-03_... None https://www.uniprot.org 1 None 2024-03 None 2025-03-10 13:25:14.948000+00:00 1 None 1
26 7VsGP1mM bionty.Protein mouse uniprot False False Uniprot s3://bionty-assets/df_mouse__uniprot__2023-03_... None https://www.uniprot.org 1 None 2023-03 None 2025-03-10 13:25:14.948000+00:00 1 None 1
27 3HPinROJ bionty.Protein mouse uniprot False False Uniprot s3://bionty-assets/mouse_uniprot_2023-02_Prote... None https://www.uniprot.org 1 None 2023-02 None 2025-03-10 13:25:14.948000+00:00 1 None 1
# only lists the sources that are currently used
bt.Source.filter(entity="bionty.Protein", currently_used=True).df()
uid entity organism name in_db currently_used description url md5 source_website space_id dataframe_artifact_id version run_id created_at created_by_id _aux _branch_code
id
22 3EYyGRYN bionty.Protein human uniprot False True Uniprot s3://bionty-assets/df_human__uniprot__2024-03_... None https://www.uniprot.org 1 None 2024-03 None 2025-03-10 13:25:14.948000+00:00 1 None 1
25 01RWXN2V bionty.Protein mouse uniprot False True Uniprot s3://bionty-assets/df_mouse__uniprot__2024-03_... None https://www.uniprot.org 1 None 2024-03 None 2025-03-10 13:25:14.948000+00:00 1 None 1
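
The earlier .public() output (Source: uniprot, 2024-03) matches the human row flagged currently_used=True, which suggests the default source is the currently used one. Assuming that, picking the default version per organism amounts to the sketch below; current_versions is hypothetical and works on rows copied from the table above.

```python
from typing import Dict, List

# Protein rows from the Source table above (subset of columns)
SOURCES: List[Dict[str, object]] = [
    {"uid": "3EYyGRYN", "organism": "human", "version": "2024-03", "currently_used": True},
    {"uid": "1hTf3Lz8", "organism": "human", "version": "2023-03", "currently_used": False},
    {"uid": "qxHZM6Mt", "organism": "human", "version": "2023-02", "currently_used": False},
    {"uid": "01RWXN2V", "organism": "mouse", "version": "2024-03", "currently_used": True},
    {"uid": "7VsGP1mM", "organism": "mouse", "version": "2023-03", "currently_used": False},
    {"uid": "3HPinROJ", "organism": "mouse", "version": "2023-02", "currently_used": False},
]


def current_versions(sources: List[Dict[str, object]]) -> Dict[str, str]:
    """Map each organism to the version of its currently used source."""
    current: Dict[str, str] = {}
    for row in sources:
        if row["currently_used"]:
            organism = str(row["organism"])
            # The table above has one current source per organism; flag violations
            if organism in current:
                raise ValueError(f"several current sources for {organism}")
            current[organism] = str(row["version"])
    return current


print(current_versions(SOURCES))  # {'human': '2024-03', 'mouse': '2024-03'}
```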

When instantiating a PublicOntology object, we can choose a specific source or version:

source = bt.Source.filter(
    name="uniprot", version="2023-03", organism="human"
).one()
proteins = bt.Protein.public(source=source)
proteins

PublicOntology
Entity: Protein
Organism: human
Source: uniprot, 2023-03
#terms: 207892
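
The .one() call at the end of the filter expects exactly one matching Source record. The sketch below mimics that contract with a hypothetical list subclass; the real lamindb method may raise different exception types, and LookupError here is a placeholder.

```python
from typing import Dict, List

Row = Dict[str, str]


class QueryResult(list):
    """Hypothetical list of matching rows with a one() method."""

    def one(self) -> Row:
        # Zero matches and several matches are both treated as errors
        if len(self) != 1:
            raise LookupError(f"expected exactly one record, found {len(self)}")
        return self[0]


def filter_rows(rows: List[Row], **criteria: str) -> QueryResult:
    """Keep rows whose columns equal every keyword criterion."""
    return QueryResult(row for row in rows if all(row.get(k) == v for k, v in criteria.items()))


# Subset of the Protein rows of the Source table above
rows: List[Row] = [
    {"uid": "3EYyGRYN", "name": "uniprot", "organism": "human", "version": "2024-03"},
    {"uid": "1hTf3Lz8", "name": "uniprot", "organism": "human", "version": "2023-03"},
    {"uid": "7VsGP1mM", "name": "uniprot", "organism": "mouse", "version": "2023-03"},
]

source = filter_rows(rows, name="uniprot", version="2023-03", organism="human").one()
print(source["uid"])  # 1hTf3Lz8
```

Dropping organism="human" from the filter would match two rows here, so one() would raise instead of silently picking one of them.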

The currently used ontology sources for all entities can be displayed using:

bt.Source.filter(currently_used=True).df()
uid entity organism name in_db currently_used description url md5 source_website space_id dataframe_artifact_id version run_id created_at created_by_id _aux _branch_code
id
1 33TUF039 bionty.Organism vertebrates ensembl False True Ensembl https://ftp.ensembl.org/pub/release-112/specie... None https://www.ensembl.org 1 None release-112 None 2025-03-10 13:25:14.948000+00:00 1 None 1
6 6bbVUTCS bionty.Organism bacteria ensembl False True Ensembl https://ftp.ensemblgenomes.ebi.ac.uk/pub/bacte... None https://www.ensembl.org 1 None release-57 None 2025-03-10 13:25:14.948000+00:00 1 None 1
7 6s9nV6xh bionty.Organism fungi ensembl False True Ensembl http://ftp.ensemblgenomes.org/pub/fungi/releas... None https://www.ensembl.org 1 None release-57 None 2025-03-10 13:25:14.948000+00:00 1 None 1
8 2PmTrc8x bionty.Organism metazoa ensembl False True Ensembl http://ftp.ensemblgenomes.org/pub/metazoa/rele... None https://www.ensembl.org 1 None release-57 None 2025-03-10 13:25:14.948000+00:00 1 None 1
9 7GPHh16S bionty.Organism plants ensembl False True Ensembl https://ftp.ensemblgenomes.ebi.ac.uk/pub/plant... None https://www.ensembl.org 1 None release-57 None 2025-03-10 13:25:14.948000+00:00 1 None 1
10 4tsksCMX bionty.Organism all ncbitaxon False True NCBItaxon Ontology s3://bionty-assets/df_all__ncbitaxon__2023-06-... None https://github.com/obophenotype/ncbitaxon 1 None 2023-06-20 None 2025-03-10 13:25:14.948000+00:00 1 None 1
11 4UGNz3fr bionty.Gene human ensembl False True Ensembl s3://bionty-assets/df_human__ensembl__release-... None https://www.ensembl.org 1 None release-112 None 2025-03-10 13:25:14.948000+00:00 1 None 1
15 4r4fvV0S bionty.Gene mouse ensembl False True Ensembl s3://bionty-assets/df_mouse__ensembl__release-... None https://www.ensembl.org 1 None release-112 None 2025-03-10 13:25:14.948000+00:00 1 None 1
19 4RPA3Re0 bionty.Gene saccharomyces cerevisiae ensembl False True Ensembl s3://bionty-assets/df_saccharomyces cerevisiae... None https://www.ensembl.org 1 None release-112 None 2025-03-10 13:25:14.948000+00:00 1 None 1
22 3EYyGRYN bionty.Protein human uniprot False True Uniprot s3://bionty-assets/df_human__uniprot__2024-03_... None https://www.uniprot.org 1 None 2024-03 None 2025-03-10 13:25:14.948000+00:00 1 None 1
25 01RWXN2V bionty.Protein mouse uniprot False True Uniprot s3://bionty-assets/df_mouse__uniprot__2024-03_... None https://www.uniprot.org 1 None 2024-03 None 2025-03-10 13:25:14.948000+00:00 1 None 1
28 3kDh8qAX bionty.CellMarker human cellmarker False True CellMarker s3://bionty-assets/human_cellmarker_2.0_CellMa... None http://bio-bigdata.hrbmu.edu.cn/CellMarker 1 None 2.0 None 2025-03-10 13:25:14.948000+00:00 1 None 1
29 7bV5uJo3 bionty.CellMarker mouse cellmarker False True CellMarker s3://bionty-assets/mouse_cellmarker_2.0_CellMa... None http://bio-bigdata.hrbmu.edu.cn/CellMarker 1 None 2.0 None 2025-03-10 13:25:14.948000+00:00 1 None 1
30 6LyRtvz8 bionty.CellLine all clo False True Cell Line Ontology https://data.bioontology.org/ontologies/CLO/su... None https://bioportal.bioontology.org/ontologies/CLO 1 None 2022-03-21 None 2025-03-10 13:25:14.948000+00:00 1 None 1
32 3Uw2Va7a bionty.CellType all cl False True Cell Ontology http://purl.obolibrary.org/obo/cl/releases/202... None https://obophenotype.github.io/cell-ontology 1 None 2024-08-16 None 2025-03-10 13:25:14.948000+00:00 1 None 1
41 MUtAGdL4 bionty.Tissue all uberon False True Uberon multi-species anatomy ontology http://purl.obolibrary.org/obo/uberon/releases... None http://obophenotype.github.io/uberon 1 None 2024-08-07 None 2025-03-10 13:25:14.948000+00:00 1 None 1
50 4a3ejKuf bionty.Disease all mondo False True Mondo Disease Ontology http://purl.obolibrary.org/obo/mondo/releases/... None https://mondo.monarchinitiative.org 1 None 2024-08-06 None 2025-03-10 13:25:14.948000+00:00 1 None 1
59 4kswnHVF bionty.Disease human doid False True Human Disease Ontology http://purl.obolibrary.org/obo/doid/releases/2... None https://disease-ontology.org 1 None 2024-05-29 None 2025-03-10 13:25:14.952000+00:00 1 None 1
67 2a1HvjdB bionty.ExperimentalFactor all efo False True The Experimental Factor Ontology http://www.ebi.ac.uk/efo/releases/v3.70.0/efo.owl None https://bioportal.bioontology.org/ontologies/EFO 1 None 3.70.0 None 2025-03-10 13:25:14.952000+00:00 1 None 1
75 48fBFLmn bionty.Phenotype human hp False True Human Phenotype Ontology https://github.com/obophenotype/human-phenotyp... None https://hpo.jax.org 1 None 2024-04-26 None 2025-03-10 13:25:14.952000+00:00 1 None 1
80 4t7QibxO bionty.Phenotype mammalian mp False True Mammalian Phenotype Ontology https://github.com/mgijax/mammalian-phenotype-... None https://github.com/mgijax/mammalian-phenotype-... 1 None 2024-06-18 None 2025-03-10 13:25:14.952000+00:00 1 None 1
83 sqPX2b7b bionty.Phenotype zebrafish zp False True Zebrafish Phenotype Ontology https://github.com/obophenotype/zebrafish-phen... None https://github.com/obophenotype/zebrafish-phen... 1 None 2024-04-18 None 2025-03-10 13:25:14.952000+00:00 1 None 1
87 6S4qkDx1 bionty.Phenotype all pato False True Phenotype And Trait Ontology http://purl.obolibrary.org/obo/pato/releases/2... None https://github.com/pato-ontology/pato 1 None 2024-03-28 None 2025-03-10 13:25:14.952000+00:00 1 None 1
89 7Ent3V2y bionty.Pathway all go False True Gene Ontology https://data.bioontology.org/ontologies/GO/sub... None http://geneontology.org 1 None 2024-06-17 None 2025-03-10 13:25:14.952000+00:00 1 None 1
94 3rm9aOzL BFXPipeline all lamin False True Bioinformatics Pipeline s3://bionty-assets/df_all__lamin__1.0.0__BFXpi... None https://lamin.ai 1 None 1.0.0 None 2025-03-10 13:25:14.952000+00:00 1 None 1
95 ugaIoIlj Drug all dron False True Drug Ontology https://data.bioontology.org/ontologies/DRON/s... None https://bioportal.bioontology.org/ontologies/DRON 1 None 2024-08-05 None 2025-03-10 13:25:14.952000+00:00 1 None 1
99 1GbFkOdz bionty.DevelopmentalStage human hsapdv False True Human Developmental Stages https://github.com/obophenotype/developmental-... None https://github.com/obophenotype/developmental-... 1 None 2024-05-28 None 2025-03-10 13:25:14.952000+00:00 1 None 1
101 10va5JSt bionty.DevelopmentalStage mouse mmusdv False True Mouse Developmental Stages https://github.com/obophenotype/developmental-... None https://github.com/obophenotype/developmental-... 1 None 2024-05-28 None 2025-03-10 13:25:14.952000+00:00 1 None 1
103 MJRqduf9 bionty.Ethnicity human hancestro False True Human Ancestry Ontology https://github.com/EBISPOT/hancestro/raw/3.0/h... None https://github.com/EBISPOT/hancestro 1 None 3.0 None 2025-03-10 13:25:14.952000+00:00 1 None 1
104 5JnVODh4 BioSample all ncbi False True NCBI BioSample attributes s3://bionty-assets/df_all__ncbi__2023-09__BioS... None https://www.ncbi.nlm.nih.gov/biosample/docs/at... 1 None 2023-09 None 2025-03-10 13:25:14.952000+00:00 1 None 1