scverse/mudata

Writing updated .X while working in backed mode

Closed this issue · 1 comments

Hello,
A good use case for loading h5mu objects in backed mode is to process a single modality in memory while keeping the other modalities backed. However, in the current implementation, write_h5mu does not write back a modified/processed .X for a single modality if the MuData object is in backed mode.

Example

import mudatasets
import mudata
import muon
import scanpy as sc

## Load and save h5mu file
mdata = mudatasets.load('pbmc10k_multiome')
mdata['rna'].var_names_make_unique()
mdata.write_h5mu('./data/pbmc10k_multiome.h5mu')

## Read in backed mode
mdata = muon.read_h5mu('./data/pbmc10k_multiome.h5mu', backed=True)

## Load RNA in memory and preprocess data
mdata.mod['rna'] = mdata.mod['rna'].to_memory()
sc.pp.normalize_total(mdata.mod['rna'], target_sum=10e4)
sc.pp.log1p(mdata.mod['rna'])

Now the .X of the RNA modality stores the normalized data:

mdata.mod['rna'].X.data
array([2.5597956, 2.5597956, 2.5597956, ..., 5.3022   , 3.144047 ,
       6.4688306], dtype=float32)

If I save and reload, .X contains the raw counts again:

mdata.write_h5mu()
mdata = muon.read_h5mu('./data/pbmc10k_multiome.h5mu', backed=False)
mdata.mod['rna'].X.data
array([ 1.,  1.,  1., ...,  9.,  1., 29.], dtype=float32)

Looking at the code, it seems that write_h5mu only checks whether the full MuData object is backed, not whether the individual modality objects are backed.

mudata/mudata/_core/mudata.py

Lines 1176 to 1181 in 83188a3

if self.isbacked and (filename is None or filename == self.filename):
    import h5py

    self.file.close()
    with h5py.File(self.filename, "a") as f:
        _write_h5mu(f, self, write_data=False, **kwargs)
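To make the mixed state concrete, here is a minimal sketch of a per-modality check. It relies only on the `isbacked` attribute and the `.mod` mapping; the helper names `in_memory_modalities` and `warn_if_x_may_be_skipped` are hypothetical and not part of mudata, and the `SimpleNamespace` objects are stand-ins for a real MuData and its AnnData modalities.

```python
import warnings
from types import SimpleNamespace


def in_memory_modalities(mdata):
    """Names of modalities held in memory inside a backed container."""
    if not mdata.isbacked:
        return []
    return [name for name, adata in mdata.mod.items() if not adata.isbacked]


def warn_if_x_may_be_skipped(mdata):
    """Warn when a backed container holds in-memory modalities (hypothetical helper)."""
    names = in_memory_modalities(mdata)
    if names:
        warnings.warn(
            f"Container is backed; .X of in-memory modalities {names} "
            "may not be written back to disk.",
            UserWarning,
            stacklevel=2,
        )
    return names


# Stand-ins that mimic the state described in this issue:
# the container is backed, 'rna' was loaded with to_memory(), 'atac' is still backed.
demo = SimpleNamespace(
    isbacked=True,
    mod={
        "rna": SimpleNamespace(isbacked=False),
        "atac": SimpleNamespace(isbacked=True),
    },
)
```

Calling `warn_if_x_may_be_skipped(demo)` flags only the 'rna' modality, which is the one whose .X was silently left unchanged on disk in the example above.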

A quick workaround is to save the processed/modified data matrix in mdata.mod['rna'].layers, but the current behaviour can be confusing. Either a fix or an informative warning (e.g. flagging that data matrices are not overwritten because the object is in backed mode) would be useful here.

System

  • Python v3.10
  • anndata v0.8.0
  • mudata v0.2.1

gtca commented

Hey @emdann,

This seems to relate to the ambiguity of backed containers in which some parts are not backed. So far this is not specified and should probably be treated as undefined behaviour.
This is also somewhat related to scverse/muon#19, adding to the complexity of backed objects in their current version.

The workflow you describe is supported by the current versions of mudata by loading and writing back individual modalities:

adata = mudata.read_h5ad('data/pbmc10k_multiome.h5mu', mod='rna', backed=False)
sc.pp.normalize_total(adata, target_sum=10e4)
sc.pp.log1p(adata)
mudata.write_h5ad('data/pbmc10k_multiome.h5mu', 'rna', adata)

That being said, I tried to add checks that .X is also written when a modality is not backed. So your code should work as you expected now.