Shape of the output of cnv.tl.infercnv
FlorianBarkmann opened this issue · 1 comment
FlorianBarkmann commented
Hi all,
I noticed that when the window size is greater than the number of genes in a chromosome, the CNV profile for that chromosome has the length of the window size rather than the number of genes in the original gene expression (GEX) matrix. This is because the output length of np.convolve with mode="same" is the maximum of the two input lengths.
This breaks the correspondence between CNVs and genes, making it difficult to correctly aggregate CNV profiles from different patients.
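The length behavior of np.convolve can be checked in isolation. This is a minimal sketch: the array sizes mirror the case described above (25 genes, window of 50), and the uniform kernel is an assumption made for illustration, not necessarily the kernel infercnvpy uses.

```python
import numpy as np

signal = np.ones(25)  # expression values for 25 genes on one chromosome
kernel = np.ones(50)  # smoothing window larger than the number of genes

# mode="same" returns an array of length max(len(signal), len(kernel)).
same = np.convolve(signal, kernel, mode="same")
print(same.shape)  # (50,)

# With a window smaller than the signal, the length matches the genes.
shorter = np.convolve(signal, np.ones(10), mode="same")
print(shorter.shape)  # (25,)
```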
Here is a minimal code example that reproduces the problem with infercnvpy:
import anndata
import infercnvpy as cnv
import numpy as np

# 4 cells x 25 genes, all on a single chromosome
X = np.ones((4, 25))
adata = anndata.AnnData(X=X)
adata.obs["reference"] = ["reference"] * 2 + ["non-reference"] * 2
adata.var["chromosome"] = ["chr1"] * 25
adata.var["start"] = np.arange(25)
adata.var["end"] = np.arange(25) + 1

# window_size (50) is larger than the number of genes (25)
cnv.tl.infercnv(
    adata,
    reference_cat="reference",
    reference_key="reference",
    window_size=50,
    step=1,
)

# This assertion fails: the CNV profile has the window length, not 25 genes.
assert adata.obsm["X_cnv"].shape == adata.shape
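One possible way to keep the output aligned with the genes is to clamp the window to the number of genes before convolving. The sketch below is only an illustration under that assumption: `running_mean_clamped` is a hypothetical helper with a uniform kernel, not infercnvpy's actual smoothing code, and whether clamping is the right fix is open for discussion.

```python
import numpy as np


def running_mean_clamped(x, window_size):
    """Hypothetical helper: row-wise running mean with a clamped window.

    Not infercnvpy's implementation; a uniform kernel is assumed.
    """
    n_genes = x.shape[1]
    # Never use a window longer than the number of genes, so that
    # mode="same" returns exactly n_genes values per row.
    n = min(window_size, n_genes)
    kernel = np.ones(n) / n
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, x
    )


x = np.ones((4, 25))
smoothed = running_mean_clamped(x, window_size=50)
print(smoothed.shape)  # (4, 25)
```

With the window clamped, each column of the result still corresponds to one gene, although values near the chromosome ends are averaged over fewer real genes because of zero padding.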
If this is unintended behavior, I am happy to provide a PR with a fix.
Best
Florian
grst commented
Hi Florian,
thanks for the bug report! A fix would be much appreciated!
Best,
Gregor