scverse/anndata

`concat_on_disk` merge strategies are untested/not implemented

ilan-gold opened this issue · 0 comments

Please describe your wishes and possible alternatives to achieve the desired result.

We should implement them so that they work properly. I am not sure whether this counts as a bug: concat_on_disk is experimental, and reading through the old PR, I don't see any discussion of the merge strategies or any tests for them.

Adding a merge_type argument to the current test suite for concat_on_disk shows the full list of problems, but here's an MVCE for merge="first" with otherwise default arguments:

from anndata.tests.helpers import gen_adata
import anndata as ad
import numpy as np
from scipy import sparse
import pandas as pd

GEN_ADATA_OOC_CONCAT_ARGS = dict(
    obsm_types=(
        sparse.csr_matrix,
        np.ndarray,
        pd.DataFrame,
    ),
    varm_types=(sparse.csr_matrix, np.ndarray, pd.DataFrame),
    layers_types=(sparse.spmatrix, np.ndarray, pd.DataFrame),
)

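# Two AnnData objects with different shapes, written to disk as inputs.
adata_1 = gen_adata((100, 200), **GEN_ADATA_OOC_CONCAT_ARGS)
adata_2 = gen_adata((50, 60), **GEN_ADATA_OOC_CONCAT_ARGS)
adata_1.write_h5ad('test_1.h5ad')
adata_2.write_h5ad('test_2.h5ad')
# Concatenate on disk, keeping the first value seen for each merged element.
ad.experimental.concat_on_disk(['test_1.h5ad', 'test_2.h5ad'], 'merged.h5ad', merge="first")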

raises:

IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
Error raised while writing key 'var_cat' of <class 'h5py._hl.group.Group'> to /var

Here's a full list of the tests that fail from test_anndatas_with_reindex when merge is tested:

Errors + tests
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-zarr-10-unique] - AssertionError: Error raised from element 'varp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-h5ad-10-same] - AssertionError: Error raised from element 'varp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-h5ad-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-zarr-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-h5ad-100000000-unique] - AssertionError: Error raised from element 'varp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-h5ad-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-outer-h5ad-100000000-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-zarr-100000000-same] - AssertionError: Error raised from element 'varp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse-outer-h5ad-10-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse-outer-h5ad-100000000-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-zarr-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse_array-inner-zarr-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-zarr-10-same] - AssertionError: Error raised from element 'varp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse_array-inner-h5ad-100000000-unique] - AssertionError: DataFrame are different
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse_array-inner-h5ad-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-outer-zarr-100000000-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-outer-zarr-10-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse-inner-zarr-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse_array-inner-h5ad-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse_array-inner-h5ad-100000000-same] - AssertionError: Error raised from element 'varp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-outer-h5ad-10-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse_array-outer-h5ad-10-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse-inner-h5ad-10-same] - AssertionError: Error raised from element 'varp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse_array-outer-h5ad-100000000-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse-inner-h5ad-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse-inner-h5ad-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-inner-h5ad-10-same] - AssertionError: Error raised from element 'obsp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-inner-h5ad-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-inner-h5ad-100000000-same] - AssertionError: Error raised from element 'obsp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-inner-h5ad-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-inner-zarr-100000000-same] - AssertionError: Error raised from element 'obsp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse-inner-zarr-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse_array-outer-zarr-100000000-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-inner-zarr-10-unique] - AssertionError: Error raised from element 'obsp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-inner-zarr-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-inner-zarr-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-outer-h5ad-10-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-outer-h5ad-100000000-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-outer-zarr-100000000-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse_array-outer-zarr-10-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-outer-zarr-10-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-inner-zarr-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-inner-zarr-100000000-unique] - AssertionError: Error raised from element 'obsp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-inner-h5ad-10-same] - AssertionError: Error raised from element 'obsp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-inner-h5ad-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-inner-zarr-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-inner-h5ad-100000000-same] - AssertionError: Error raised from element 'obsp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse-outer-zarr-10-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-inner-h5ad-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse_array-inner-zarr-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse-inner-zarr-100000000-unique] - AssertionError: Error raised from element 'varp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-outer-zarr-100000000-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-outer-h5ad-10-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-outer-h5ad-100000000-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-inner-zarr-10-unique] - AssertionError: Error raised from element 'obsp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse_array-inner-zarr-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse-outer-zarr-100000000-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse_array-inner-h5ad-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse_array-inner-h5ad-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse_array-inner-h5ad-100000000-unique] - AssertionError: Error raised from element 'obsp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-outer-zarr-10-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse_array-outer-zarr-10-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse_array-inner-zarr-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse_array-outer-h5ad-10-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse_array-outer-h5ad-100000000-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse_array-outer-zarr-100000000-first] - TypeError: expected unicode string, found True
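
The list above is easier to triage when grouped by merge strategy, join type, and exception. The following is a minimal sketch of such a grouping, not part of anndata. The `tally_failures` helper is hypothetical, and the parameter order it assumes (axis, array type, join type, file format, max loaded elements, merge strategy) is only read off the test IDs above, not taken from the test suite itself.

```python
import re
from collections import Counter

# Matches one pytest summary line of the form
# "FAILED <path>::<test>[<params>] - <message>".
FAILED_LINE = re.compile(
    r"^FAILED \S+?::(?P<test>\w+)\[(?P<params>[^\]]+)\] - (?P<message>.*)$"
)


def tally_failures(lines):
    """Count failures by (merge strategy, join type, exception name)."""
    counts = Counter()
    for line in lines:
        match = FAILED_LINE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a FAILED summary line
        params = match["params"].split("-")
        # Assumed ID layout: axis-array_type-join-format-max_elems-merge
        join_type, merge = params[2], params[-1]
        # Everything before the first colon is the exception name.
        error = match["message"].split(":", 1)[0]
        counts[(merge, join_type, error)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: pipe the pytest short summary into this script.
    for key, n in sorted(tally_failures(sys.stdin).items()):
        print(n, *key)
```

Grouped this way, the failures above appear to fall into three patterns: every outer-join failure is a TypeError with merge="first", every inner-join failure with merge="first" is an IORegistryError, and the "same" and "unique" failures are all AssertionErrors on inner joins.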