lamindb.core.MappedCollection
- class lamindb.core.MappedCollection(path_list, layers_keys=None, obs_keys=None, obsm_keys=None, obs_filter=None, join='inner', encode_labels=True, unknown_label=None, cache_categories=True, parallel=False, dtype=None)
Bases: object
Map-style collection for use in data loaders.
This class virtually concatenates AnnData arrays as a PyTorch map-style dataset.
If your AnnData collection is in the cloud, move it into a local cache first for faster access.
__getitem__ of the MappedCollection object takes a single integer index and returns a dictionary with the observation data sample for this index from the AnnData objects in path_list. The dictionary has keys for layers_keys (.X is in "X"), obs_keys, obsm_keys (under f"obsm_{key}") and also "_store_idx" for the index of the AnnData object containing this observation sample.
Note
For a guide, see Train a machine learning model on a collection.
For more convenient use within MappedCollection, see mapped().
This currently only works for collections of AnnData objects.
The implementation was influenced by the SCimilarity data loader.
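The index lookup described above can be illustrated with a small standard-library sketch. This is not lamindb's implementation: the class name VirtualConcat, its attributes, and the use of plain lists in place of AnnData objects are all assumptions made for illustration. It only shows how a single global index can be mapped to a store and a local row, with the result returned as a dictionary containing "X" and "_store_idx".

```python
from bisect import bisect_right
from itertools import accumulate


class VirtualConcat:
    """Toy map-style dataset over several in-memory stores (lists of rows).

    Hypothetical stand-in for illustration only; not the lamindb class.
    """

    def __init__(self, stores):
        self.stores = stores
        # Cumulative row counts, e.g. store sizes [2, 1] give offsets [2, 3].
        self.offsets = list(accumulate(len(store) for store in stores))

    def __len__(self):
        return self.offsets[-1] if self.offsets else 0

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Locate the store that holds the global index, then the local row.
        store_idx = bisect_right(self.offsets, idx)
        start = self.offsets[store_idx - 1] if store_idx > 0 else 0
        row = self.stores[store_idx][idx - start]
        return {"X": row, "_store_idx": store_idx}


dataset = VirtualConcat([[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]])
sample = dataset[2]  # third observation overall, first row of the second store
```

No data is copied when the stores are combined; only the offsets are computed up front, which is the sense in which the concatenation is virtual in this sketch.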
- Parameters:
  - path_list (list[lamindb.core.types.UPathStr]) – A list of paths to AnnData objects stored in .h5ad or .zarr formats.
  - layers_keys (str | list[str] | None, default: None) – Keys from the .layers slot. layers_keys=None or "X" in the list retrieves .X.
  - obsm_keys (str | list[str] | None, default: None) – Keys from the .obsm slots.
  - obs_keys (str | list[str] | None, default: None) – Keys from the .obs slots.
  - obs_filter (dict[str, str | list[str]] | None, default: None) – Select only observations with these values for the given obs columns. Should be a dictionary with obs column names as keys and filtering values (a string or a list of strings) as values.
  - join (Literal['inner', 'outer'] | None, default: 'inner') – "inner" or "outer" virtual joins. If None is passed, does not join.
  - encode_labels (bool | list[str], default: True) – Encode labels into integers. Can be a list with elements from obs_keys.
  - unknown_label (str | dict[str, str] | None, default: None) – Encode this label to -1. Can be a dictionary with keys from obs_keys if encode_labels=True or from encode_labels if it is a list.
  - cache_categories (bool, default: True) – Enable caching categories of obs_keys for faster access.
  - parallel (bool, default: False) – Enable sampling with multiple processes.
  - dtype (str | None, default: None) – Convert numpy arrays from .X, .layers and .obsm to this dtype.
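To make the interplay of encode_labels and unknown_label concrete, here is a minimal sketch of integer label encoding in which one designated label is mapped to -1. The helper build_encoder is hypothetical, and the sorted ordering of categories is an assumption; the actual code assignment in lamindb may differ.

```python
def build_encoder(labels, unknown_label=None):
    """Map each distinct label to an integer code; unknown_label maps to -1.

    Hypothetical helper for illustration; category order is assumed sorted.
    """
    categories = sorted(set(labels) - {unknown_label})
    encoder = {category: code for code, category in enumerate(categories)}
    if unknown_label is not None:
        encoder[unknown_label] = -1
    return encoder


cell_types = ["T cell", "B cell", "unknown", "T cell"]
encoder = build_encoder(cell_types, unknown_label="unknown")
codes = [encoder[label] for label in cell_types]
```

Reserving -1 for the unknown label keeps the remaining codes contiguous from 0, which is convenient when the codes are used as class indices.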
Attributes
- property closed: bool
Check if connections to array streaming backend are closed.
Does not matter if parallel=True.
- property original_shapes: list[tuple[int, int]]
Shapes of the underlying AnnData objects (with obs_filter applied).
- property shape: tuple[int, int]
Shape of the (virtually aligned) dataset.
Methods
- check_vars_non_aligned(vars)
Returns indices of objects with non-aligned variables.
- Parameters:
  - vars (Index | list) – Check alignment against these variables.
- Return type: list[int]
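As a rough illustration of what such a check might look like, the sketch below compares each object's variable names against a reference list and collects the positions that differ. The function non_aligned_indices is hypothetical, and treating "aligned" as "same names in the same order" is an assumption, not something stated in this reference.

```python
def non_aligned_indices(var_names_per_object, reference_vars):
    """Return positions of objects whose variable names differ from the reference.

    Assumes alignment means identical names in identical order (illustrative only).
    """
    reference = list(reference_vars)
    return [
        i
        for i, var_names in enumerate(var_names_per_object)
        if list(var_names) != reference
    ]


objects = [["g1", "g2", "g3"], ["g1", "g3", "g2"], ["g1", "g2"]]
result = non_aligned_indices(objects, ["g1", "g2", "g3"])
```

Here the second object has the same genes in a different order and the third is missing one, so both are reported.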
- check_vars_sorted(ascending=True)
Returns True if all variables are sorted in all objects.
- Return type: bool
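A sortedness check of this kind can be sketched in a few lines. The function all_vars_sorted below is a hypothetical stand-in that operates on plain lists of variable names; it assumes that "sorted" means monotonically non-decreasing (or non-increasing when ascending=False) under ordinary string comparison.

```python
def all_vars_sorted(var_names_per_object, ascending=True):
    """Return True if every object's variable names are in sorted order.

    Illustrative only; uses plain string comparison on adjacent names.
    """

    def is_sorted(names):
        pairs = zip(names, names[1:])
        if ascending:
            return all(a <= b for a, b in pairs)
        return all(a >= b for a, b in pairs)

    return all(is_sorted(list(names)) for names in var_names_per_object)


sorted_everywhere = all_vars_sorted([["a", "b"], ["a", "c"]])
one_unsorted = all_vars_sorted([["a", "b"], ["c", "a"]])
```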
- close()
Close connections to array streaming backend.
No effect if parallel=True.
- get_label_weights(obs_keys, scaler=None, return_categories=False)
Get all weights for the given label keys.
This counts the occurrences of each label and returns a weight for each obs label according to the formula 1 / num of this label in the data. If scaler is provided, the formula is scaler / (scaler + num of this label in the data).
- Parameters:
  - obs_keys (str | list[str]) – A key in the .obs slots or a list of keys. If a list is provided, the labels from the obs keys will be concatenated with the "__" delimiter.
  - scaler (float | None, default: None) – Use this number to scale the provided weights.
  - return_categories (bool, default: False) – If False, returns weights for each observation, which can be passed directly to a sampler. If True, returns a dictionary with unique categories for labels (concatenated if obs_keys is a list) and their weights.
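The two weighting formulas can be worked through on a toy example. The sketch below is not the library's code: label_weights is a hypothetical function that takes plain lists of labels (one list per obs key) instead of reading .obs, joins labels from several keys with "__", and applies 1 / count or scaler / (scaler + count).

```python
from collections import Counter


def label_weights(obs_columns, scaler=None, return_categories=False):
    """Compute inverse-frequency weights for (possibly combined) labels.

    obs_columns is a list of equally long label lists, one per obs key.
    Hypothetical helper for illustration only.
    """
    # Combine labels from several columns row by row using the "__" delimiter.
    labels = ["__".join(values) for values in zip(*obs_columns)]
    counts = Counter(labels)
    if scaler is None:
        weights = {label: 1.0 / n for label, n in counts.items()}
    else:
        weights = {label: scaler / (scaler + n) for label, n in counts.items()}
    if return_categories:
        return weights
    return [weights[label] for label in labels]


cell_type = ["T", "T", "B", "T"]
tissue = ["lung", "lung", "lung", "gut"]
per_obs = label_weights([cell_type, tissue])
per_category = label_weights([cell_type], scaler=1.0, return_categories=True)
```

In this example the combined label "T__lung" occurs twice and gets weight 0.5, while the two labels that occur once get weight 1.0. Per-observation weights of this shape are what a weighted sampler such as torch.utils.data.WeightedRandomSampler expects, so rare labels are drawn more often.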
- get_merged_categories(label_key)
Get merged categories for label_key from all .obs.
- get_merged_labels(label_key)
Get merged labels for label_key from all .obs.