# Changelog 2025

## 2025-03-09 db 1.2.0
✨ Enable auto-linking entities to projects. Guide PR @falexwolf

```python
ln.track(project="My project")
```
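A hedged sketch of querying what got auto-linked; the `projects__name` filter path is an assumption:

```python
# after ln.track(project="My project"), entities saved in this run are linked to the project
ln.Artifact.filter(projects__name="My project").df()  # assumed Django-style filter path
```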
🚸 Better support for `spatialdata` with `Artifact.from_spatialdata()` and `artifact.load()`. PR1 PR2 @Zethson
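A minimal sketch, assuming an in-memory `spatialdata.SpatialData` object `sdata`; the key is hypothetical:

```python
import lamindb as ln

# `sdata` is a spatialdata.SpatialData object
artifact = ln.Artifact.from_spatialdata(sdata, key="visium_dataset.zarr").save()
sdata_again = artifact.load()  # loads back as a SpatialData object
```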
🚸 Introduce `.slots` in `Schema`, `Curator`, and `artifact.features` to access schemas and curators by dataset slot. PR @sunnyosun

```python
schema.slots["obs"]       # Schema for the .obs slot of an AnnData
curator.slots["obs"]      # Curator for the .obs slot of an AnnData
artifact.features["obs"]  # feature sets for the .obs slot of an AnnData
```
🏗️ Re-structured the internal API away from monkey-patching Django models. PR @falexwolf

⚠️ Use of the internal API: if you used the internal API, you might experience a breaking change. The most drastic change is that all internal registry-related functionality is now re-exported under `lamindb.models`.
🚸 When re-creating an `Artifact`, link subsequent runs instead of updating `.run` and linking previous runs. PR @falexwolf

On the hub: more details here. @chaichontat

| Before | After |
|---|---|
| An artifact is only shown as an output for the latest run that created the artifact; previous runs don't show it. | All runs that (re-)create an artifact show it as an output. |
More changes:
- ✨ Allow to use `Artifact.open()` and `Artifact.load()` for `.gz` files PR @Koncopd
- 🐛 Fix passing a path to `ln.track()` when no path is found by `nbproject` PR @Koncopd
- 🐛 Do not overwrite `._state_db` of records when the current instance is passed to `.using` PR @Koncopd
- 🚸 Do not show the track warning for read-only connections PR @Koncopd
- 🚸 Raise `NotImplementedError` in `Artifact.load()` if there is no loader PR @Koncopd
## 2025-02-27 db 1.1.1
- 🚸 Make the `obs` and `var` `DataFrameCurator` objects accessible via `AnnDataCurator.slots` PR @sunnyosun
- 🚸 Better error message upon re-creation of a schema with the same name but a different hash PR @falexwolf
- 🚸 Raise a consistency error if a source path suffix doesn't match the artifact `key` suffix PR @falexwolf
- 🚸 Automatically add missing columns upon `DataFrameCurator.standardize()` if `nullable` is `True` PR @falexwolf (see the sketch after this list)
- 🚸 Allow to specify `fsspec` upload options in `Artifact.save` PR @Koncopd
- 🚸 Populate `Artifact.n_observations` in `Artifact.from_df()` PR @Koncopd
- 🐛 Run `pip freeze` with the current Python interpreter PR @ap--
- 🐛 Fix notebook re-run with the same hash PR @falexwolf
## 2025-02-18 db 1.1.0

⚠️ The `FeatureSet` registry got renamed to `Schema`. All your code is backward compatible. The `Schema` registry encompasses feature sets as a special case.

✨ Conveniently track functions including inputs, outputs, and parameters with a decorator: `ln.tracked()`. PR1 PR2 @falexwolf
```python
@ln.tracked()
def subset_dataframe(
    input_artifact_key: str,  # all arguments are tracked as parameters of the function run
    output_artifact_key: str,
    subset_rows: int = 2,
    subset_cols: int = 2,
) -> None:
    artifact = ln.Artifact.get(key=input_artifact_key)
    df = artifact.load()  # auto-tracked as input
    new_df = df.iloc[:subset_rows, :subset_cols]
    ln.Artifact.from_df(new_df, key=output_artifact_key).save()  # auto-tracked as output
```
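Calling the decorated function then creates a run that records the arguments as parameters; the keys below are hypothetical:

```python
subset_dataframe(
    input_artifact_key="my_datasets/dataset1.parquet",
    output_artifact_key="my_datasets/dataset1_subset.parquet",
)
```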
✨ Make sub-types of `ULabel`, `Feature`, `Schema`, `Project`, `Param`, and `Reference`. PR @falexwolf

On the hub: more details here. @awgaan @chaichontat

For example, define a `ULabel` type `"Perturbation"` and create labels of that type:

```python
perturbation = ln.ULabel(name="Perturbation", is_type=True).save()
ln.ULabel(name="DMSO", type=perturbation).save()
ln.ULabel(name="IFNG", type=perturbation).save()
```
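Sub-typed registries can then be queried by their type; a sketch assuming the standard filter syntax:

```python
ln.ULabel.filter(type=perturbation).df()  # all labels of the "Perturbation" type
```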
✨ Use an overhauled dataset curation flow. @falexwolf @Zethson @sunnyosun

- support persisting validation constraints as a `pandera`-compatible schema
- support validating any feature type, no longer just categoricals
- make the relationship between features, dataset schema, and curator evident

Detailed changes for the overhauled curation flow:

⚠️ The API gained the `lamindb.curators` module as the new way to access `Curator` classes for different data structures.

- This release introduces the schema-based `DataFrameCurator` and `AnnDataCurator`
- The old-style curation flow for categoricals based on `lamindb.Curator.from_objecttype()` continues to work
Key PRs:

- ✨ Overhaul curation guides + enable default values and filters on valid categories for features PR @falexwolf
- ✨ Schema-based curators: `AnnDataCurator` PR @falexwolf
- ✨ Schema-based curators: `DataFrameCurator` PR @falexwolf
Enabling PRs:

- ✨ Allow passing `artifact` to `Curator` PR @sunnyosun
- 🚨 A `ManyToMany` between `Schema.components` and `.composites` PR @falexwolf
- ♻️ Mark `Schema` fields as non-editable PR @falexwolf
- ✨ Add auxiliary field `nullable` to `Feature` PR @falexwolf
- ♻️ Prettify the `AnnDataCurator` implementation PR @falexwolf
- 🚸 Better error for a malformed categorical dtype PR @falexwolf
- 🐛 Restore `.feature_sets` as a `ManyToManyField` PR @falexwolf
- 🚚 Rename `CatCurator` to `CatManager` PR @falexwolf
- 🚨 Let `Curator.validate()` throw an error PR @falexwolf
- ♻️ Re-purpose `BaseCurator` as `Curator`, introduce `CatCurator`, and consolidate shared logic under `CatCurator` PR @falexwolf
- ♻️ Refactor `organism` handling in curators PR @falexwolf
- 🔥 Eliminate all logic related to `using_key` in curators PR @falexwolf
- 🚚 Bulk-rename old-style curators to `CatCurator` PR @falexwolf
- 🚨 Self-contained definition of the `CellxGene` schema / validation constraints PR @falexwolf
- 🚚 Move `PertCurator` from `wetlab` here and add a `CellxGene` `Curator` test PR @falexwolf
- 🚚 Move the CellXGene `Curator` from `cellxgene-lamin` here PR @falexwolf
```python
schema = ln.Schema(
    name="small_dataset1_obs_level_metadata",
    features=[
        ln.Feature(name="CD8A", dtype=int).save(),  # integer counts for the CD8A marker
        ln.Feature(name="perturbation", dtype=ln.ULabel).save(),  # a categorical feature that validates against the ULabel registry
        ln.Feature(name="sample_note", dtype=str).save(),  # a note for the sample
    ],
).save()

df = pd.DataFrame({
    "CD8A": [1, 4, 0],
    "perturbation": ["DMSO", "IFNG", "IFNG"],  # example values; the original snippet was truncated here
    "sample_note": ["value_1", "value_2", "value_3"],
    "temperature": [22.2, 25.7, 27.3],
})

curator = ln.curators.DataFrameCurator(df, schema)
artifact = curator.save_artifact(key="example_datasets/dataset1.parquet")  # validates compliance with the schema, annotates with metadata
assert artifact.schema == schema  # the validating schema
```
✨ Easily filter on a validating schema. @falexwolf @Zethson @sunnyosun

On the hub: with the `Schema` filter button, find all datasets that satisfy a given schema (→ explore).

```python
schema = ln.Schema.get(name="small_dataset1_obs_level_metadata")  # get a schema
ln.Artifact.filter(schema=schema).df()  # filter all datasets that were validated by the schema
```
✨ `Collection.open()` returns a `pyarrow` dataset. PR @Koncopd

```python
df = pd.DataFrame({"feat1": [0, 0, 1, 1], "feat2": [6, 7, 8, 9]})
df[:2].to_parquet("df1.parquet", engine="pyarrow")
df[2:].to_parquet("df2.parquet", engine="pyarrow")

artifact1 = ln.Artifact("df1.parquet", key="df1.parquet").save()
artifact2 = ln.Artifact("df2.parquet", key="df2.parquet").save()
collection = ln.Collection([artifact1, artifact2], key="parquet_col").save()

dataset = collection.open()  # a pyarrow dataset backed by files in cloud storage
dataset.to_table().to_pandas().head()
```
✨ Support s3-compatible endpoint URLs, say for your on-prem MinIO deployment. PR @Koncopd
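A minimal sketch, assuming the endpoint URL can be passed as a query parameter of the storage location (bucket and URL are hypothetical):

```python
import lamindb as ln

# point an instance at an s3-compatible store such as MinIO
ln.setup.init(storage="s3://my-bucket?endpoint_url=http://localhost:9000")
```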
Speed up instance creation through squashed migrations:

- ⚡ Squash migrations PR1 PR2 @falexwolf

Tiledbsoma:

- ✨ Support `endpoint_url` in operations with tiledbsoma PR1 PR2 @Koncopd
- ✨ Add `Artifact.from_tiledbsoma` to populate `n_observations` PR @Koncopd (see the sketch after this list)
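A hedged sketch of `Artifact.from_tiledbsoma` as referenced above; the store path and key are hypothetical and the exact signature is an assumption:

```python
import lamindb as ln

# register an existing tiledbsoma store as an artifact
artifact = ln.Artifact.from_tiledbsoma(
    "s3://my-bucket/scrna.tiledbsoma", key="scrna.tiledbsoma"
).save()
artifact.n_observations  # populated from the store
```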
MappedCollection:

- 🐛 Allow filtering on `np.nan` in `obs_filter` of `MappedCollection` PR @Koncopd
- 🐛 Fix labels for `NaN` in categorical columns for `MappedCollection` PR @Koncopd
SpatialDataCurator:

- 🐛 Fix `var_index` standardization of `SpatialDataCurator` PR1 PR2 @Zethson
- 🐛 Fix optional sample-level metadata in `SpatialDataCatManager` PR @Zethson
Core functionality:

- ✨ Allow to check the need for syncing without actually syncing PR @Koncopd
- ✨ Check for a corrupted cache in `Artifact.load()` & `Artifact.open()` PR1 PR2 @Koncopd
- ✨ Infer `n_observations` in `Artifact.from_anndata` PR @Koncopd
- 🐛 Account for VSCode appending a language id to markdown cells in notebook tracking PR @falexwolf
- 🐛 Normalize module names for robust checking in `_check_instance_setup()` PR @Koncopd
- 🐛 Fix idempotency of `Feature` creation when `description` is passed and improve filter and get error behavior PR @Zethson
- 🚸 Make a new version upon passing an existing `key` to `Collection` PR @falexwolf
- 🚸 Throw a better error upon checking `instance.modules` when loading a lamindb schema module PR @Koncopd
- 🚸 Validate existing records in the DB irrespective of whether an ontology `source` is passed PR @sunnyosun
- 🚸 Fully guarantee that `Transform`, `Artifact` & `Collection` are not duplicated in concurrent runs PR @falexwolf
- 🚸 Better user feedback during keyword validation in the `Record` constructor PR @Zethson
- 🚸 Improve the warning message when local storage is not found PR @Zethson
- 🚸 Better error message when attempting to save a file while not being connected to an instance PR @Zethson
- 🚸 Error on non-keyword parameters for `Artifact.from_x` methods PR @Zethson
Housekeeping.

## 2025-01-23 db 1.0.5

## 2025-01-21 db 1.0.4
🐛 Revert `Collection.description` back to an unlimited-length `TextField`. PR @falexwolf

## 2025-01-21 db 1.0.3

🚸 In `track()`, improve logging in RStudio sessions. PR @falexwolf
## 2025-01-20 R 0.4.0

- 🚚 Migrate to lamindb v1 PR @falexwolf
- 🚸 Improve the user experience for setting up Python & reticulate PR @lazappi

## 2025-01-20 db 1.0.2

🐛 Improvements for lamindb v1 migrations. PR @falexwolf

- add a `.description` field to `Schema`
- enable labeling `Run` with `ULabel`
- add `.predecessors` and `.successors` fields to `Project`, akin to what's present on `Transform`
- make `.uid` fields not editable

## 2025-01-18 db 1.0.1

🐛 Block non-admin users from confirming the dialogue for integrating `lnschema-core`. PR @falexwolf
## 2025-01-17 db 1.0.0

This release makes the API consistent, integrates `lnschema_core` & `ourprojects` into the `lamindb` package, and introduces a breadth of database migrations to enable future features without disruption. You'll now need at least Python 3.10.

Your code will continue to run as is, but you will receive warnings about a few renamed API components.
| What | Before | After |
|---|---|---|
| Dataset vs. model | `Artifact.type` | `Artifact.kind` |
| Python object for `Artifact` | `Artifact._accessor` | `Artifact.otype` |
| Number of files | `Artifact.n_objects` | `Artifact.n_files` |
| Consecutiveness field | `Run.is_consecutive` | `Run._is_consecutive` |
| Run initiator | `Run.parent` | `Run.initiated_by_run` |
Migration guide:

1. Upon `lamin connect account/instance` you will be prompted to confirm migrating away from `lnschema_core`
2. After that, you will be prompted to call `lamin migrate deploy` to apply database migrations
New features:
- ✨ Allow filtering by multiple `obs` columns in `MappedCollection` PR @Koncopd
- ✨ In git sync, also search the git blob hash in non-default branches PR @Zethson
- ✨ Add a relationship with `Project` to everything except `Run`, `Storage` & `User` so that you can easily filter for the entities relevant to your project PR @falexwolf (see the sketch after this list)
- ✨ Capture logs of scripts during `ln.track()` PR1 PR2 @falexwolf @Koncopd
- ✨ Support `"|"`-separated multi-values in `Curator` PR @sunnyosun
- 🚸 Accept `None` in `connect()` and improve the migration dialogue PR @falexwolf
UX improvements:
- 🚸 Simplify the `ln.track()` experience PR @falexwolf
  1. you can omit the `uid` argument
  2. you can organize transforms in folders
  3. versioning is fully automated (requirement for 1.)
  4. you can save scripts and notebooks without running them (corollary of 1.)
  5. you avoid the interactive prompt in a notebook and the throwing of an error in a script (corollary of 1.)
  6. you are no longer required to add a title in a notebook
- 🚸 Raise an error when modifying `Artifact.key` in problematic ways PR1 PR2 @sunnyosun @Koncopd
- 🚸 Better error message on running `ln.track()` within a Python terminal PR @Koncopd
- 🚸 Hide the traceback for `InstanceNotEmpty` using a Click exception PR @Zethson
- 🚸 Only auto-search `._name_field` in sub-classes of `CanCurate` PR @falexwolf
- 🚸 Simplify installation & API overview PR @falexwolf
- 🚸 Make `lamin_run_uid` categorical in tiledbsoma stores PR @Koncopd
- 🚸 Raise a `ValueError` when trying to search a `None` value PR @Zethson
Bug fixes:
- 🐛 Skip deleting storage when deleting outdated versions of folder-like artifacts PR @Koncopd
- 🐛 Let `SOMACurator()` validate and annotate all `.obs` columns PR @falexwolf
- 🐛 Fix renaming of feature sets PR @sunnyosun
- 🐛 Do not raise an exception when default AWS credentials fail PR @Koncopd
- 🐛 Only map synonyms when the field is `name` PR @sunnyosun
- 🐛 Fix `source` in `.from_values` PR @sunnyosun
- 🐛 Fix creating instances with storage in the current local working directory PR @Koncopd
- 🐛 Fix NA values in `Curator.add_new_from()` PR @sunnyosun
Refactors, renames & maintenance:
- 🏗️ Integrate `lnschema-core` into `lamindb` PR1 PR2 @falexwolf @Koncopd
- 🏗️ Integrate `ourprojects` into lamindb PR @falexwolf
- ♻️ Manage `created_at`, `updated_at` on the database level, make `created_by` not editable PR @falexwolf
- 🚚 Rename transform type "glue" to "linker" PR @falexwolf
- 🚚 Deprecate the `--schema` argument of `lamin init` in favor of `--modules` PR @falexwolf
DevOps:
Detailed list of database migrations
Those not yet announced above will be announced with the functionality they enable.
- ♻️ Add the `contenttypes` Django plugin PR @falexwolf
- 🚚 Prepare the introduction of persistable `Curator` objects by renaming `FeatureSet` to `Schema` on the database level PR @falexwolf
- 🚚 Add a `.type` foreign key to `ULabel`, `Feature`, `FeatureSet`, `Reference`, `Param` PR @falexwolf
- 🚚 Introduce `RunData`, `TidyTable`, and `TidyTableData` in the database PR @falexwolf
All remaining database schema changes were made in this PR @falexwolf. Data migrations happen automatically.
- remove `_source_code_artifact` from `Transform`, it's been deprecated since 0.75
  - data migration: for all transforms that have `_source_code_artifact` populated, populate `source_code`
- rename `Transform.name` to `Transform.description` because it's analogous to `Artifact.description`
  - backward compat:
    - in the `Transform` constructor, use `name` to populate `key` in all cases in which only `name` is passed
    - return the same transform based on `key` in case `source_code is None` via `._name_field = "key"`
  - data migrations:
    - there already was a legacy `description` field that was never exposed on the constructor; to be safe, we concatenated potential data in it on the new description field
    - for all transforms that have `key=None` and `name!=None`, use `name` to pre-populate `key`
- rename `Collection.name` to `Collection.key` for consistency with `Artifact` & `Transform` and the high likelihood of you wanting to organize collections hierarchically
- add a `_branch_code` integer on every record to model pull requests
  - include `visibility` within that code
  - repurpose `visibility=0` as `_branch_code=0` as "archive"
  - put an index on it
  - code a "draft" as `_branch_code = 2`, and "draft prs" as negative branch codes
- rename values `"number"` to `"num"` in dtype
- add an `._aux` json field to `Record`
- add a SmallInteger `run._status_code` that allows writing `finished_at` in clean-up operations so that there is a run time also for aborted runs
- rename `Run.is_consecutive` to `Run._is_consecutive`
- add a `_template_id` FK to store the information of the generating template (whether a record is a template is coded via `_branch_code`)
- rename `_accessor` to `otype` to publicly declare the data format as `suffix, accessor`
- rename `Artifact.type` to `Artifact.kind`
- add a FK to artifact `run._logfile` which holds logs
- add a `hash` field to `ParamValue` and `FeatureValue` to enforce uniqueness without running the danger of failure for large dictionaries
- add a boolean field `._expect_many` to `Feature`/`Param` that defaults to `True`/`False` and indicates whether values for this feature/param are expected to occur a single or multiple times for every single artifact/run
  - for feature
    - if it's `True` (default), the values come from an observation-level aggregation and a dtype of `datetime` on the observation-level means `set[datetime]` on the artifact-level
    - if it's `False`, it's an artifact-level value and `datetime` means `datetime`; this is an edge case because an arbitrary artifact would always be a set of arbitrary measurements that would need to be aggregated ("one just happens to measure a single cell line in that artifact")
  - for param
    - if it's `False` (default), the values mean artifact/run-level values and `datetime` means `datetime`
    - if it's `True`, the values would be from an aggregation; this seems like an edge case, but say when characterizing a model ensemble trained with different parameters it could be relevant
- remove the `.transform` foreign key from artifact and collection for consistency with all other records; introduce a property and a simple filter statement instead that maintains the same UX
- store provenance metadata for `TransformULabel`, `RunParamValue`, `ArtifactParamValue`
- enable linking projects & references to transforms & collections
- rename `Run.parent` to `Run.initiated_by_run`
- introduce a boolean flag on artifact that's called `_overwrite_versions`, which indicates whether versions are overwritten or stored separately; it defaults to `False` for file-like artifacts and to `True` for folder-like artifacts
- rename `n_objects` to `n_files` for more clarity
- add a `Space` registry to lamindb with an FK on every `BasicRecord`
- add a name column to `Run` so that a specific run can be used as a named specific analysis
- remove the `_previous_runs` field on everything except `Artifact` & `Collection`