# Changelog 2025

## 2025-03-09 db 1.2.0
✨ Enable auto-linking entities to projects. Guide PR @falexwolf

```python
ln.track(project="My project")
```
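A hedged sketch of querying what got auto-linked; the `projects__name` filter path is an assumption:

```python
# after ln.track(project="My project"), entities saved in this run are linked to the project
ln.Artifact.filter(projects__name="My project").df()  # assumed Django-style filter path
```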
🚸 Better support for `spatialdata` with `Artifact.from_spatialdata()` and `artifact.load()`. PR1 PR2 @Zethson
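A minimal sketch, assuming an in-memory `spatialdata.SpatialData` object `sdata`; the key is hypothetical:

```python
import lamindb as ln

# `sdata` is a spatialdata.SpatialData object
artifact = ln.Artifact.from_spatialdata(sdata, key="visium_dataset.zarr").save()
sdata_again = artifact.load()  # loads back as a SpatialData object
```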
🚸 Introduce `.slots` in `Schema`, `Curator`, and `artifact.features` to access schemas and curators by dataset slot. PR @sunnyosun

```python
schema.slots["obs"]       # Schema for the .obs slot of an AnnData
curator.slots["obs"]      # Curator for the .obs slot of an AnnData
artifact.features["obs"]  # feature sets for the .obs slot of an AnnData
```
🏗️ Re-structured the internal API away from monkey-patching Django models. PR @falexwolf

⚠️ Use of the internal API: if you used the internal API, you might experience a breaking change. The most drastic change is that all internal registry-related functionality is now re-exported under `lamindb.models`.
🚸 When re-creating an `Artifact`, link subsequent runs instead of updating `.run` and linking previous runs. PR @falexwolf

On the hub: more details here. @chaichontat

| Before | After |
|---|---|
| An artifact is only shown as an output for the latest run that created the artifact; previous runs don't show it. | All runs that (re-)create an artifact show it as an output. |
More changes:
- ✨ Allow to use `Artifact.open()` and `Artifact.load()` for `.gz` files PR @Koncopd
- 🐛 Fix passing a path to `ln.track()` when no path is found by `nbproject` PR @Koncopd
- 🐛 Do not overwrite `._state_db` of records when the current instance is passed to `.using` PR @Koncopd
- 🚸 Do not show the track warning for read-only connections PR @Koncopd
- 🚸 Raise `NotImplementedError` in `Artifact.load()` if there is no loader PR @Koncopd
## 2025-02-27 db 1.1.1
- 🚸 Make the `obs` and `var` `DataFrameCurator` objects accessible via `AnnDataCurator.slots` PR @sunnyosun
- 🚸 Better error message upon re-creation of a schema with the same name but a different hash PR @falexwolf
- 🚸 Raise a consistency error if a source path suffix doesn't match the artifact `key` suffix PR @falexwolf
- 🚸 Automatically add missing columns upon `DataFrameCurator.standardize()` if `nullable` is `True` PR @falexwolf (see the sketch after this list)
- 🚸 Allow to specify `fsspec` upload options in `Artifact.save` PR @Koncopd
- 🚸 Populate `Artifact.n_observations` in `Artifact.from_df()` PR @Koncopd
- 🐛 Run `pip freeze` with the current Python interpreter PR @ap--
- 🐛 Fix notebook re-run with the same hash PR @falexwolf
## 2025-02-18 db 1.1.0

⚠️ The `FeatureSet` registry got renamed to `Schema`. All your code is backward compatible. The `Schema` registry encompasses feature sets as a special case.

✨ Conveniently track functions including inputs, outputs, and parameters with a decorator: `ln.tracked()`. PR1 PR2 @falexwolf
```python
@ln.tracked()
def subset_dataframe(
    input_artifact_key: str,  # all arguments are tracked as parameters of the function run
    output_artifact_key: str,
    subset_rows: int = 2,
    subset_cols: int = 2,
) -> None:
    artifact = ln.Artifact.get(key=input_artifact_key)
    df = artifact.load()  # auto-tracked as input
    new_df = df.iloc[:subset_rows, :subset_cols]
    ln.Artifact.from_df(new_df, key=output_artifact_key).save()  # auto-tracked as output
```
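Calling the decorated function then creates a run that records the arguments as parameters; the keys below are hypothetical:

```python
subset_dataframe(
    input_artifact_key="my_datasets/dataset1.parquet",
    output_artifact_key="my_datasets/dataset1_subset.parquet",
)
```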
✨ Make sub-types of `ULabel`, `Feature`, `Schema`, `Project`, `Param`, and `Reference`. PR @falexwolf

On the hub: more details here. @awgaan @chaichontat

For example, define a `ULabel` type `"Perturbation"` and create labels of that type:

```python
perturbation = ln.ULabel(name="Perturbation", is_type=True).save()
ln.ULabel(name="DMSO", type=perturbation).save()
ln.ULabel(name="IFNG", type=perturbation).save()
```
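Sub-typed registries can then be queried by their type; a sketch assuming the standard filter syntax:

```python
ln.ULabel.filter(type=perturbation).df()  # all labels of the "Perturbation" type
```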
✨ Use an overhauled dataset curation flow. @falexwolf @Zethson @sunnyosun

- support persisting validation constraints as a `pandera`-compatible schema
- support validating any feature type, no longer just categoricals
- make the relationship between features, dataset schema, and curator evident

Detailed changes for the overhauled curation flow:

⚠️ The API gained the `lamindb.curators` module as the new way to access `Curator` classes for different data structures.

- This release introduces the schema-based `DataFrameCurator` and `AnnDataCurator`
- The old-style curation flow for categoricals based on `lamindb.Curator.from_objecttype()` continues to work
Key PRs:

- ✨ Overhaul curation guides + enable default values and filters on valid categories for features PR @falexwolf
- ✨ Schema-based curators: `AnnDataCurator` PR @falexwolf
- ✨ Schema-based curators: `DataFrameCurator` PR @falexwolf
Enabling PRs:

- ✨ Allow passing `artifact` to `Curator` PR @sunnyosun
- 🚨 A `ManyToMany` between `Schema.components` and `.composites` PR @falexwolf
- ♻️ Mark `Schema` fields as non-editable PR @falexwolf
- ✨ Add auxiliary field `nullable` to `Feature` PR @falexwolf
- ♻️ Prettify the `AnnDataCurator` implementation PR @falexwolf
- 🚸 Better error for a malformed categorical dtype PR @falexwolf
- 🐛 Restore `.feature_sets` as a `ManyToManyField` PR @falexwolf
- 🚚 Rename `CatCurator` to `CatManager` PR @falexwolf
- 🚨 Let `Curator.validate()` throw an error PR @falexwolf
- ♻️ Re-purpose `BaseCurator` as `Curator`, introduce `CatCurator`, and consolidate shared logic under `CatCurator` PR @falexwolf
- ♻️ Refactor `organism` handling in curators PR @falexwolf
- 🔥 Eliminate all logic related to `using_key` in curators PR @falexwolf
- 🚚 Bulk-rename old-style curators to `CatCurator` PR @falexwolf
- 🚨 Self-contained definition of the `CellxGene` schema / validation constraints PR @falexwolf
- 🚚 Move `PertCurator` from `wetlab` here and add a `CellxGene` `Curator` test PR @falexwolf
- 🚚 Move the CellXGene `Curator` from `cellxgene-lamin` here PR @falexwolf
```python
schema = ln.Schema(
    name="small_dataset1_obs_level_metadata",
    features=[
        ln.Feature(name="CD8A", dtype=int).save(),  # integer counts for the CD8A marker
        ln.Feature(name="perturbation", dtype=ln.ULabel).save(),  # a categorical feature that validates against the ULabel registry
        ln.Feature(name="sample_note", dtype=str).save(),  # a note for the sample
    ],
).save()

df = pd.DataFrame({
    "CD8A": [1, 4, 0],
    "perturbation": ["DMSO", "IFNG", "IFNG"],  # example values; the original snippet was truncated here
    "sample_note": ["value_1", "value_2", "value_3"],
    "temperature": [22.2, 25.7, 27.3],
})

curator = ln.curators.DataFrameCurator(df, schema)
artifact = curator.save_artifact(key="example_datasets/dataset1.parquet")  # validates compliance with the schema, annotates with metadata
assert artifact.schema == schema  # the validating schema
```
✨ Easily filter on a validating schema. @falexwolf @Zethson @sunnyosun

On the hub: with the `Schema` filter button, find all datasets that satisfy a given schema (→ explore).

```python
schema = ln.Schema.get(name="small_dataset1_obs_level_metadata")  # get a schema
ln.Artifact.filter(schema=schema).df()  # filter all datasets that were validated by the schema
```
✨ `Collection.open()` returns a `pyarrow` dataset. PR @Koncopd

```python
df = pd.DataFrame({"feat1": [0, 0, 1, 1], "feat2": [6, 7, 8, 9]})
df[:2].to_parquet("df1.parquet", engine="pyarrow")
df[2:].to_parquet("df2.parquet", engine="pyarrow")

artifact1 = ln.Artifact("df1.parquet", key="df1.parquet").save()
artifact2 = ln.Artifact("df2.parquet", key="df2.parquet").save()
collection = ln.Collection([artifact1, artifact2], key="parquet_col").save()

dataset = collection.open()  # a pyarrow dataset backed by files in cloud storage
dataset.to_table().to_pandas().head()
```
✨ Support s3-compatible endpoint URLs, say for your on-prem MinIO deployment. PR @Koncopd
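A minimal sketch, assuming the endpoint URL can be passed as a query parameter of the storage location (bucket and URL are hypothetical):

```python
import lamindb as ln

# point an instance at an s3-compatible store such as MinIO
ln.setup.init(storage="s3://my-bucket?endpoint_url=http://localhost:9000")
```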
Speed up instance creation through squashed migrations:

- ⚡ Squash migrations PR1 PR2 @falexwolf

Tiledbsoma:

- ✨ Support `endpoint_url` in operations with tiledbsoma PR1 PR2 @Koncopd
- ✨ Add `Artifact.from_tiledbsoma` to populate `n_observations` PR @Koncopd (see the sketch after this list)
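A hedged sketch of `Artifact.from_tiledbsoma` as referenced above; the store path and key are hypothetical and the exact signature is an assumption:

```python
import lamindb as ln

# register an existing tiledbsoma store as an artifact
artifact = ln.Artifact.from_tiledbsoma(
    "s3://my-bucket/scrna.tiledbsoma", key="scrna.tiledbsoma"
).save()
artifact.n_observations  # populated from the store
```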
MappedCollection:

- 🐛 Allow filtering on `np.nan` in `obs_filter` of `MappedCollection` PR @Koncopd
- 🐛 Fix labels for `NaN` in categorical columns for `MappedCollection` PR @Koncopd
SpatialDataCurator:

- 🐛 Fix `var_index` standardization of `SpatialDataCurator` PR1 PR2 @Zethson
- 🐛 Fix optional sample-level metadata in `SpatialDataCatManager` PR @Zethson
Core functionality:

- ✨ Allow to check the need for syncing without actually syncing PR @Koncopd
- ✨ Check for a corrupted cache in `Artifact.load()` & `Artifact.open()` PR1 PR2 @Koncopd
- ✨ Infer `n_observations` in `Artifact.from_anndata` PR @Koncopd
- 🐛 Account for VSCode appending a language id to markdown cells in notebook tracking PR @falexwolf
- 🐛 Normalize module names for robust checking in `_check_instance_setup()` PR @Koncopd
- 🐛 Fix idempotency of `Feature` creation when `description` is passed and improve filter and get error behavior PR @Zethson
- 🚸 Make a new version upon passing an existing `key` to `Collection` PR @falexwolf
- 🚸 Throw a better error upon checking `instance.modules` when loading a lamindb schema module PR @Koncopd
- 🚸 Validate existing records in the DB irrespective of whether an ontology `source` is passed PR @sunnyosun
- 🚸 Fully guarantee that `Transform`, `Artifact` & `Collection` are not duplicated in concurrent runs PR @falexwolf
- 🚸 Better user feedback during keyword validation in the `Record` constructor PR @Zethson
- 🚸 Improve the warning message when local storage is not found PR @Zethson
- 🚸 Better error message when attempting to save a file while not being connected to an instance PR @Zethson
- 🚸 Error on non-keyword parameters for `Artifact.from_x` methods PR @Zethson
Housekeeping.

## 2025-01-23 db 1.0.5

## 2025-01-21 db 1.0.4
🐛 Revert `Collection.description` back to an unlimited-length `TextField`. PR @falexwolf

## 2025-01-21 db 1.0.3

🚸 In `track()`, improve logging in RStudio sessions. PR @falexwolf
## 2025-01-20 R 0.4.0

- 🚚 Migrate to lamindb v1 PR @falexwolf
- 🚸 Improve the user experience for setting up Python & reticulate PR @lazappi

## 2025-01-20 db 1.0.2

🐛 Improvements for lamindb v1 migrations. PR @falexwolf

- add a `.description` field to `Schema`
- enable labeling `Run` with `ULabel`
- add `.predecessors` and `.successors` fields to `Project`, akin to what's present on `Transform`
- make `.uid` fields not editable

## 2025-01-18 db 1.0.1

🐛 Block non-admin users from confirming the dialogue for integrating `lnschema-core`. PR @falexwolf
## 2025-01-17 db 1.0.0

This release makes the API consistent, integrates `lnschema_core` & `ourprojects` into the `lamindb` package, and introduces a breadth of database migrations to enable future features without disruption. You'll now need at least Python 3.10.

Your code will continue to run as is, but you will receive warnings about a few renamed API components.
| What | Before | After |
|---|---|---|
| Dataset vs. model | `Artifact.type` | `Artifact.kind` |
| Python object for `Artifact` | `Artifact._accessor` | `Artifact.otype` |
| Number of files | `Artifact.n_objects` | `Artifact.n_files` |
| Consecutiveness field | `Run.is_consecutive` | `Run._is_consecutive` |
| Run initiator | `Run.parent` | `Run.initiated_by_run` |
Migration guide:

1. Upon `lamin connect account/instance` you will be prompted to confirm migrating away from `lnschema_core`
2. After that, you will be prompted to call `lamin migrate deploy` to apply database migrations
New features:
- ✨ Allow filtering by multiple `obs` columns in `MappedCollection` PR @Koncopd
- ✨ In git sync, also search the git blob hash in non-default branches PR @Zethson
- ✨ Add a relationship with `Project` to everything except `Run`, `Storage` & `User` so that you can easily filter for the entities relevant to your project PR @falexwolf (see the sketch after this list)
- ✨ Capture logs of scripts during `ln.track()` PR1 PR2 @falexwolf @Koncopd
- ✨ Support `"|"`-separated multi-values in `Curator` PR @sunnyosun
- 🚸 Accept `None` in `connect()` and improve the migration dialogue PR @falexwolf
UX improvements:
- 🚸 Simplify the `ln.track()` experience PR @falexwolf
  1. you can omit the `uid` argument
  2. you can organize transforms in folders
  3. versioning is fully automated (requirement for 1.)
  4. you can save scripts and notebooks without running them (corollary of 1.)
  5. you avoid the interactive prompt in a notebook and the throwing of an error in a script (corollary of 1.)
  6. you are no longer required to add a title in a notebook
- 🚸 Raise an error when modifying `Artifact.key` in problematic ways PR1 PR2 @sunnyosun @Koncopd
- 🚸 Better error message on running `ln.track()` within a Python terminal PR @Koncopd
- 🚸 Hide the traceback for `InstanceNotEmpty` using a Click exception PR @Zethson
- 🚸 Only auto-search `._name_field` in sub-classes of `CanCurate` PR @falexwolf
- 🚸 Simplify installation & API overview PR @falexwolf
- 🚸 Make `lamin_run_uid` categorical in tiledbsoma stores PR @Koncopd
- 🚸 Raise a `ValueError` when trying to search a `None` value PR @Zethson
Bug fixes:
- 🐛 Skip deleting storage when deleting outdated versions of folder-like artifacts PR @Koncopd
- 🐛 Let `SOMACurator()` validate and annotate all `.obs` columns PR @falexwolf
- 🐛 Fix renaming of feature sets PR @sunnyosun
- 🐛 Do not raise an exception when default AWS credentials fail PR @Koncopd
- 🐛 Only map synonyms when the field is `name` PR @sunnyosun
- 🐛 Fix `source` in `.from_values` PR @sunnyosun
- 🐛 Fix creating instances with storage in the current local working directory PR @Koncopd
- 🐛 Fix NA values in `Curator.add_new_from()` PR @sunnyosun
Refactors, renames & maintenance:
- 🏗️ Integrate `lnschema-core` into `lamindb` PR1 PR2 @falexwolf @Koncopd
- 🏗️ Integrate `ourprojects` into lamindb PR @falexwolf
- ♻️ Manage `created_at`, `updated_at` on the database level, make `created_by` not editable PR @falexwolf
- 🚚 Rename transform type "glue" to "linker" PR @falexwolf
- 🚚 Deprecate the `--schema` argument of `lamin init` in favor of `--modules` PR @falexwolf
DevOps:
Detailed list of database migrations
Those not yet announced above will be announced with the functionality they enable.
- ♻️ Add the `contenttypes` Django plugin PR @falexwolf
- 🚚 Prepare the introduction of persistable `Curator` objects by renaming `FeatureSet` to `Schema` on the database level PR @falexwolf
- 🚚 Add a `.type` foreign key to `ULabel`, `Feature`, `FeatureSet`, `Reference`, `Param` PR @falexwolf
- 🚚 Introduce `RunData`, `TidyTable`, and `TidyTableData` in the database PR @falexwolf
All remaining database schema changes were made in this PR @falexwolf. Data migrations happen automatically.
- remove `_source_code_artifact` from `Transform`, it's been deprecated since 0.75
  - data migration: for all transforms that have `_source_code_artifact` populated, populate `source_code`
- rename `Transform.name` to `Transform.description` because it's analogous to `Artifact.description`
  - backward compat:
    - in the `Transform` constructor, use `name` to populate `key` in all cases in which only `name` is passed
    - return the same transform based on `key` in case `source_code is None` via `._name_field = "key"`
  - data migrations:
    - there already was a legacy `description` field that was never exposed on the constructor; to be safe, we concatenated potential data in it on the new description field
    - for all transforms that have `key=None` and `name!=None`, use `name` to pre-populate `key`
- rename `Collection.name` to `Collection.key` for consistency with `Artifact` & `Transform` and the high likelihood of you wanting to organize collections hierarchically
- add a `_branch_code` integer on every record to model pull requests
  - include `visibility` within that code
  - repurpose `visibility=0` as `_branch_code=0` as "archive"
  - put an index on it
  - code a "draft" as `_branch_code = 2`, and "draft prs" as negative branch codes
- rename values `"number"` to `"num"` in dtype
- add an `._aux` json field to `Record`
- add a SmallInteger `run._status_code` that allows writing `finished_at` in clean-up operations so that there is a run time also for aborted runs
- rename `Run.is_consecutive` to `Run._is_consecutive`
- add a `_template_id` FK to store the information of the generating template (whether a record is a template is coded via `_branch_code`)
- rename `_accessor` to `otype` to publicly declare the data format as `suffix, accessor`
- rename `Artifact.type` to `Artifact.kind`
- add a FK to artifact `run._logfile` which holds logs
- add a `hash` field to `ParamValue` and `FeatureValue` to enforce uniqueness without running the danger of failure for large dictionaries
- add a boolean field `._expect_many` to `Feature`/`Param` that defaults to `True`/`False` and indicates whether values for this feature/param are expected to occur a single or multiple times for every single artifact/run
  - for feature
    - if it's `True` (default), the values come from an observation-level aggregation and a dtype of `datetime` on the observation-level means `set[datetime]` on the artifact-level
    - if it's `False`, it's an artifact-level value and `datetime` means `datetime`; this is an edge case because an arbitrary artifact would always be a set of arbitrary measurements that would need to be aggregated ("one just happens to measure a single cell line in that artifact")
  - for param
    - if it's `False` (default), the values mean artifact/run-level values and `datetime` means `datetime`
    - if it's `True`, the values would be from an aggregation; this seems like an edge case, but say when characterizing a model ensemble trained with different parameters it could be relevant
- remove the `.transform` foreign key from artifact and collection for consistency with all other records; introduce a property and a simple filter statement instead that maintains the same UX
- store provenance metadata for `TransformULabel`, `RunParamValue`, `ArtifactParamValue`
- enable linking projects & references to transforms & collections
- rename `Run.parent` to `Run.initiated_by_run`
- introduce a boolean flag on artifact that's called `_overwrite_versions`, which indicates whether versions are overwritten or stored separately; it defaults to `False` for file-like artifacts and to `True` for folder-like artifacts
- rename `n_objects` to `n_files` for more clarity
- add a `Space` registry to lamindb with an FK on every `BasicRecord`
- add a name column to `Run` so that a specific run can be used as a named specific analysis
- remove the `_previous_runs` field on everything except `Artifact` & `Collection`