lamindb.Transform¶

class lamindb.Transform(name: str, key: str | None = None, type: TransformType | None = None, revises: Transform | None = None)¶

Bases: Record, IsVersioned

Data transformations.

A “transform” can refer to a Python function, a script, a notebook, or a pipeline. If you execute a transform, you generate a run (Run). A run has inputs and outputs.

A pipeline is typically created with a workflow tool (Nextflow, Snakemake, Prefect, Flyte, MetaFlow, redun, Airflow, …) and stored in a versioned repository.

Transforms are versioned so that a given transform version maps to a given source code version.

The definition of transforms and runs is consistent with the OpenLineage specification, where a Transform record would be called a “job” and a Run record a “run”.
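To make the job/run correspondence concrete, the sketch below builds an OpenLineage-style run event as a plain dictionary. The helper `to_openlineage_event`, the namespace value, and the example key are hypothetical. The field names are meant to follow the OpenLineage run event layout and should be checked against the specification. This is not LaminDB's own export code.

```python
from datetime import datetime, timezone
import uuid


def to_openlineage_event(transform_key: str, run_id: str,
                         namespace: str = "my-lamindb-instance") -> dict:
    """Map a transform ("job") and one of its runs ("run") to an event dict.

    Hypothetical helper for illustration only.
    """
    return {
        "eventType": "COMPLETE",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},                                # the Run record
        "job": {"namespace": namespace, "name": transform_key},  # the Transform record
        "inputs": [],   # input datasets would be listed here
        "outputs": [],  # output datasets would be listed here
    }


event = to_openlineage_event("pipelines/cell-ranger", str(uuid.uuid4()))
print(event["job"]["name"])
```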

Parameters:

name – str A name or title.
key – str | None = None A short name or path-like semantic key.
type – TransformType | None = "pipeline" See TransformType.
revises – Transform | None = None An old version of the transform.

See also

track(): Globally track a script, notebook or pipeline run.
Run: Executions of transforms.

Examples

Create a transform for a pipeline:

>>> transform = ln.Transform(key="Cell Ranger", version="7.2.0", type="pipeline").save()

Create a transform from a notebook:

>>> ln.track()

View predecessors of a transform:

>>> transform.view_lineage()

Attributes¶

property latest_run: Run¶: The latest run of this transform.

property name: str¶

Name of the transform.

Splits key on / and returns the last element.
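The rule above can be sketched in plain Python. The function `name_from_key` is a hypothetical stand-in for the property and only mirrors the documented behavior.

```python
def name_from_key(key: str) -> str:
    """Return the last "/"-separated element of a key."""
    return key.split("/")[-1]


print(name_from_key("pipelines/rnaseq/cell-ranger"))  # cell-ranger
print(name_from_key("Cell Ranger"))                   # Cell Ranger
```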

property stem_uid: str¶

Universal id characterizing the version family.

The full uid of a record is obtained via concatenating the stem uid and version information:

stem_uid = random_base62(n_char)  # a random base62 sequence of length 12 (transform) or 16 (artifact, collection)
version_uid = "0000"  # an auto-incrementing 4-digit base62 number
uid = f"{stem_uid}{version_uid}"  # concatenate the stem_uid & version_uid
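A runnable version of the pseudocode above might look as follows. It is a minimal sketch: the alphabet ordering (digits, then lowercase, then uppercase) and the helper `encode_base62` are assumptions made for illustration, not LaminDB's actual implementation.

```python
import secrets
import string

# Assumed alphabet ordering: 0-9, a-z, A-Z.
BASE62 = string.digits + string.ascii_letters


def random_base62(n_char: int) -> str:
    """Return a random base62 string of length n_char."""
    return "".join(secrets.choice(BASE62) for _ in range(n_char))


def encode_base62(number: int, width: int = 4) -> str:
    """Encode a non-negative integer in base62, left-padded to `width` characters."""
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(BASE62[remainder])
    return "".join(reversed(digits)).rjust(width, BASE62[0])


stem_uid = random_base62(12)  # 12 characters for a transform
# Three consecutive versions share the stem and differ only in the suffix.
uids = [f"{stem_uid}{encode_base62(i)}" for i in range(3)]
print(uids)
```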

property versions: QuerySet¶

Lists all records of the same version family.

>>> new_artifact = ln.Artifact(df2, revises=artifact).save()
>>> new_artifact.versions()

Simple fields¶

uid: str¶: Universal id.

key: str | None¶

A name or “/”-separated path-like string.

All transforms with the same key are part of the same version family.

description: str | None¶: A description.

type: TransformType¶: TransformType (default "pipeline").

source_code: str | None¶: Source code of the transform.

Changed in version 0.75: The source_code field is no longer an artifact, but a text field.

hash: str | None¶: Hash of the source code.

reference: str | None¶: Reference for the transform, e.g., a URL.

reference_type: str | None¶: Reference type of the transform, e.g., ‘url’.

created_at: datetime¶: Time of creation of record.

updated_at: datetime¶: Time of last update to record.

version: str | None¶

Version (default None).

Defines version of a family of records characterized by the same stem_uid.

Consider using semantic versioning with Python versioning.

is_latest: bool¶: Boolean flag that indicates whether a record is the latest in its version family.
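The interplay of stem_uid, version, and is_latest can be illustrated with a small in-memory model. The `Rec` class and the `revise` helper are hypothetical. In particular, demoting the old record when a new version is created is an assumption about how the flag is maintained.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rec:
    stem_uid: str
    version: Optional[str] = None
    is_latest: bool = True


def revise(old: Rec, version: Optional[str] = None) -> Rec:
    """Create a new record in the same version family (assumed behavior)."""
    old.is_latest = False  # assumption: the revised record loses the flag
    return Rec(stem_uid=old.stem_uid, version=version)


v1 = Rec(stem_uid="abc123abc123", version="1.0.0")
v2 = revise(v1, version="1.1.0")

family = [v1, v2]  # same stem_uid, hence the same version family
latest = [r for r in family if r.is_latest]
print([r.version for r in latest])
```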

Relational fields¶

space: Space¶: The space in which the record lives.

created_by: User¶: Creator of record.

ulabels: ULabel¶: ULabel annotations of this transform.

predecessors: Transform¶

Preceding transforms.

These are auto-populated whenever an artifact or collection serves as a run input, e.g., artifact.run and artifact.transform get populated & saved.

The table provides a more convenient method to query for the predecessors that bypasses querying the Run.

It also allows manually adding predecessors whose outputs are not tracked in a run.
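Conceptually, following predecessor links is a bounded graph traversal. The sketch below walks a hypothetical predecessor table breadth-first. The `distance` argument is borrowed by name from view_lineage(), and interpreting it as a maximum number of hops is an assumption.

```python
from collections import deque

# Hypothetical predecessor table: transform key -> keys of its direct predecessors.
PREDECESSORS = {
    "analysis.ipynb": ["qc.py"],
    "qc.py": ["cell-ranger"],
    "cell-ranger": [],
}


def upstream(key: str, predecessors: dict, distance: int = 5) -> list:
    """Collect transforms reachable via predecessor links within `distance` hops."""
    seen, order = {key}, []
    queue = deque([(key, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == distance:
            continue  # do not expand beyond the hop limit
        for parent in predecessors.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append((parent, depth + 1))
    return order


print(upstream("analysis.ipynb", PREDECESSORS))              # ['qc.py', 'cell-ranger']
print(upstream("analysis.ipynb", PREDECESSORS, distance=1))  # ['qc.py']
```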

runs: Run¶: Runs of this transform.

successors: Transform¶

Subsequent transforms.

See predecessors.

references: Reference¶: Linked references.

projects: Project¶: Linked projects.

Class methods¶

classmethod df(include=None, features=False, limit=100)¶

Convert to pd.DataFrame.

By default, shows all direct fields, except updated_at.

Use arguments include or features to include other data.

Parameters:

include (str | list[str] | None, default: None) – Related fields to include as columns. Takes strings of form "ulabels__name", "cell_types__name", etc. or a list of such strings.
features (bool | list[str], default: False) – If True, map all features of the Feature registry onto the resulting DataFrame. Only available for Artifact.
limit (int, default: 100) – Maximum number of rows to display from a Pandas DataFrame. Defaults to 100 to reduce database load.

Return type:

DataFrame

Examples

Include the name of the creator in the DataFrame:

>>> ln.ULabel.df(include="created_by__name")

Include display of features for Artifact:

>>> df = ln.Artifact.df(features=True)
>>> ln.view(df)  # visualize with type annotations

Only include select features:

>>> df = ln.Artifact.df(features=["cell_type_by_expert", "cell_type_by_model"])

classmethod filter(*queries, **expressions)¶

Query records.

Parameters:

queries – One or multiple Q objects.
expressions – Fields and values passed as Django query expressions.

Return type:

QuerySet

Returns:

A QuerySet.

See also

Guide: Query & search registries
Django documentation: Queries

Examples

>>> ln.ULabel(name="my label").save()
>>> ln.ULabel.filter(name__startswith="my").df()
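For readers unfamiliar with Django query expressions, the keyword `name__startswith="my"` combines a field name and a lookup separated by a double underscore. The toy matcher below mimics that convention on plain dictionaries. It is an illustration only, supports just three lookups, and does not handle relation traversal such as `created_by__name`.

```python
def matches(record: dict, **expressions) -> bool:
    """Check a record against Django-style `field__lookup=value` expressions."""
    for expr, value in expressions.items():
        field, _, lookup = expr.partition("__")
        actual = record.get(field)
        if lookup in ("", "exact"):
            ok = actual == value
        elif lookup == "startswith":
            ok = isinstance(actual, str) and actual.startswith(value)
        elif lookup == "icontains":
            ok = isinstance(actual, str) and value.lower() in actual.lower()
        else:
            raise ValueError(f"unsupported lookup: {lookup}")
        if not ok:
            return False
    return True


records = [{"name": "my label"}, {"name": "other label"}]
hits = [r for r in records if matches(r, name__startswith="my")]
print(hits)  # [{'name': 'my label'}]
```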

classmethod get(idlike=None, **expressions)¶

Get a single record.

Parameters:

idlike (int | str | None, default: None) – Either a uid stub, uid or an integer id.
expressions – Fields and values passed as Django query expressions.

Return type:

Record

Returns:

A record.

Raises:

lamindb.errors.DoesNotExist – In case no matching record is found.

See also

Guide: Query & search registries
Django documentation: Queries

Examples

>>> ulabel = ln.ULabel.get("FvtpPJLJ")
>>> ulabel = ln.ULabel.get(name="my-label")
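The three forms of idlike can be pictured with an in-memory stand-in. Everything below is hypothetical: the records, the local `DoesNotExist` class (standing in for lamindb.errors.DoesNotExist), and the choice to raise ValueError for an ambiguous stub, which is an assumption of this sketch.

```python
class DoesNotExist(Exception):
    """Local stand-in for lamindb.errors.DoesNotExist."""


# Hypothetical records for illustration.
RECORDS = [
    {"id": 1, "uid": "FvtpPJLJ", "name": "my-label"},
    {"id": 2, "uid": "Qx81mZpA", "name": "other-label"},
]


def get(idlike=None, **expressions):
    """Resolve a single record from an integer id, a uid (stub), or field values."""
    if isinstance(idlike, int):
        hits = [r for r in RECORDS if r["id"] == idlike]
    elif isinstance(idlike, str):
        hits = [r for r in RECORDS if r["uid"].startswith(idlike)]  # full uid or stub
    else:
        hits = [r for r in RECORDS
                if all(r.get(k) == v for k, v in expressions.items())]
    if not hits:
        raise DoesNotExist("no matching record")
    if len(hits) > 1:
        raise ValueError("ambiguous query")  # assumption of this sketch
    return hits[0]


print(get("Fvtp")["name"])           # my-label
print(get(2)["name"])                # other-label
print(get(name="my-label")["uid"])   # FvtpPJLJ
```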

classmethod lookup(field=None, return_field=None)¶

Return an auto-complete object for a field.

Parameters:

field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.
return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.

Return type:

NamedTuple

Returns:

A NamedTuple of lookup information of the field values with a dictionary converter.

See also

search()

Examples

>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> bt.Gene.from_source(symbol="ADGB-DT").save()
>>> lookup = bt.Gene.lookup()
>>> lookup.adgb_dt
>>> lookup_dict = lookup.dict()
>>> lookup_dict['ADGB-DT']
>>> lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
>>> lookup_by_ensembl_id.ensg00000002745
>>> lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
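The idea of a NamedTuple with a dictionary converter can be sketched with the standard library. The `make_lookup` helper and its normalization rule (non-word characters become underscores, then lowercase) are assumptions chosen to reproduce attribute names like `adgb_dt`. The sketch also assumes the normalized names are valid Python identifiers.

```python
import re
from collections import namedtuple


def to_attr(value: str) -> str:
    """Turn a field value into an attribute name (assumed normalization rule)."""
    return re.sub(r"\W+", "_", value).strip("_").lower()


def make_lookup(values: list):
    """Build an auto-complete object whose attributes map to the given values."""
    fields = {to_attr(v): v for v in values}
    base = namedtuple("Lookup", list(fields))

    class Lookup(base):
        def dict(self) -> dict:
            # Keyed by the original, unnormalized values.
            return {value: value for value in self}

    return Lookup(**fields)


lookup = make_lookup(["ADGB-DT", "TP53"])
print(lookup.adgb_dt)            # ADGB-DT
print(lookup.dict()["ADGB-DT"])  # ADGB-DT
```

In this toy version the values are strings. When return_field is None, the real object returns whole records instead.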

classmethod search(string, *, field=None, limit=20, case_sensitive=False)¶

Search.

Parameters:

string (str) – The input string to match against the field ontology values.
field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.
limit (int | None, default: 20) – Maximum number of top results to return.
case_sensitive (bool, default: False) – Whether the match is case sensitive.

Return type:

QuerySet

Returns:

A sorted QuerySet of search results.

See also

filter()
lookup()

Examples

>>> ulabels = ln.ULabel.from_values(["ULabel1", "ULabel2", "ULabel3"], field="name")
>>> ln.save(ulabels)
>>> ln.ULabel.search("ULabel2")

classmethod using(instance)¶

Use a non-default LaminDB instance.

Parameters:

instance (str | None) – An instance identifier of form “account_handle/instance_name”.

Return type:

QuerySet

Examples

>>> ln.ULabel.using("account_handle/instance_name").search("ULabel7", field="name")
            uid    score
name
ULabel7  g7Hk9b2v  100.0
ULabel5  t4Jm6s0q   75.0
ULabel6  r2Xw8p1z   75.0

Methods¶

delete()¶

Delete.

Return type:

None

save(*args, **kwargs)¶

Save.

Always saves to the default database.

Return type:

Record

view_lineage(with_successors=False, distance=5)¶: View lineage of transforms.