Coerce non-integer shapes into integers in the `H5SparseMatrix` constructor
LTLA opened this issue · 2 comments
Using this file as an example:
library(HDF5Array)
H5SparseMatrix("pbmc4k-tenx.h5", "matrix")
## <33694 x 4340> sparse matrix of class H5SparseMatrix and type "integer":
## etc. etc. looks fine.
However, it seems like there are many files where the `shape` is saved as a Uint64. This causes problems in some of the `H5SparseMatrixSeed` constructors, where the HDF5Array C code reads them as `double`s. To reproduce, we can replace the `shape` dataset with its Uint64 counterpart (this requires h5py, as I can't figure out how to do that with rhdf5):
import shutil
src = "pbmc4k-tenx.h5"
dest = "promoted.h5"
shutil.copyfile(src, dest)
import h5py
import numpy
with h5py.File(dest, "a") as handle:
    mhandle = handle["matrix"]
    dims = mhandle["shape"][:]
    del mhandle["shape"]
    promoted = dims.astype(numpy.uint64)
    mhandle.create_dataset("shape", data = promoted)
And then:
H5SparseMatrix("promoted.h5", "matrix")
## Error in validObject(.Object) :
## invalid class “CSC_H5SparseMatrixSeed” object: invalid object for slot "dim" in class "CSC_H5SparseMatrixSeed": got class "numeric", should be or extend class "integer"
Some testing suggests that just setting `as.integer=TRUE` in the `read_h5sparse_component` call in `.read_h5sparse_dim` would be sufficient to get the example above working.
Session information
R Under development (unstable) (2022-02-11 r81718)
Platform: x86_64-apple-darwin19.6.0 (64-bit)
Running under: macOS Catalina 10.15.7
Matrix products: default
BLAS: /Users/luna/Software/R/trunk/lib/libRblas.dylib
LAPACK: /Users/luna/Software/R/trunk/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] HDF5Array_1.25.0 rhdf5_2.39.6 DelayedArray_0.21.2
[4] IRanges_2.29.1 S4Vectors_0.33.15 MatrixGenerics_1.7.0
[7] matrixStats_0.61.0 BiocGenerics_0.41.2 Matrix_1.4-1
loaded via a namespace (and not attached):
[1] compiler_4.2.0 tools_4.2.0 rhdf5filters_1.7.0 grid_4.2.0
[5] lattice_0.20-45 Rhdf5lib_1.17.3
I had the same issue (when trying to import h5 files saved by CellBender).
In case it's useful to anyone else, my workaround was to apply this function (the same as @LTLA's but reversed) to all the cellbender output h5 files. The fixed version could then be loaded by `DropletUtils::read10xCounts`.
import os
import shutil
import h5py
import numpy

def fix_cellbender_h5(s, bender_dir):
    # copy file
    src = os.path.join(bender_dir, f"cellbender_{s}_filtered.h5")
    dest = os.path.join(bender_dir, f"cellbender_{s}_filtered_fixed.h5")
    shutil.copyfile(src, dest)
    # fix shape integers: replace the shape dataset with a C int copy
    with h5py.File(dest, "a") as handle:
        mat_handle = handle["matrix"]
        dims = mat_handle["shape"][:]
        del mat_handle["shape"]
        dims_fixed = dims.astype(numpy.intc)
        mat_handle.create_dataset("shape", data = dims_fixed)
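One caveat with the downcast above: `astype(numpy.intc)` does not range-check, so a dimension too large for a C int (usually 32-bit) would wrap silently rather than raise. A minimal sketch of a guard that could be run on the dimensions first, using only plain Python integers; `fits_in_int32` and `checked_dims` are hypothetical helper names, not part of h5py, numpy, or HDF5Array:

```python
# Largest value a 32-bit signed integer (and hence an R integer) can hold.
INT32_MAX = 2**31 - 1


def fits_in_int32(dims):
    """Return True if every dimension is non-negative and fits in 32 bits (signed)."""
    return all(0 <= int(d) <= INT32_MAX for d in dims)


def checked_dims(dims):
    """Return the dimensions as a list of Python ints, or raise if any would overflow."""
    values = [int(d) for d in dims]
    if not fits_in_int32(values):
        raise ValueError(f"shape {values} does not fit in 32-bit signed integers")
    return values
```

For the example file, `checked_dims([33694, 4340])` returns the list unchanged, so the cast to `numpy.intc` is safe there.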
Thanks for the report. Should be fixed in HDF5Array 1.24.1 (release) and 1.25.1 (devel).
H.