Shape of the output of cnv.tl.infercnv

Question

FlorianBarkmann opened this issue 3 years ago · 1 comments

Hi all,

I noticed that when the window size is greater than the number of genes on a chromosome, the CNV profile for that chromosome has the length of the window size rather than the length of the original gene expression (GEX) data. This is because the return length of np.convolve with mode="same" is the maximum of the two input lengths.

This breaks the connection between CNVs and genes, making it difficult to correctly aggregate CNV profiles from different patients.
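The length behavior of np.convolve can be checked in isolation with NumPy alone. The sketch below is only an illustration: the variable names are made up here, and the uniform kernel is an assumption about how a running mean is typically built, not a copy of infercnvpy's internals.

```python
import numpy as np

n_genes = 25       # genes on the chromosome, as in the example in this report
window_size = 50   # window larger than the number of genes

profile = np.ones(n_genes)
# Assumed uniform averaging kernel, used here only for illustration
kernel = np.ones(window_size) / window_size

# mode="same" returns an array of length max(len(profile), len(kernel))
smoothed = np.convolve(profile, kernel, mode="same")
print(smoothed.shape)  # (50,)

# With a window no longer than the input, the input length is preserved
smoothed_small = np.convolve(profile, np.ones(10) / 10, mode="same")
print(smoothed_small.shape)  # (25,)
```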

Here is a minimal code example:

import anndata
import infercnvpy as cnv
import numpy as np

X = np.ones((4, 25))
adata = anndata.AnnData(X=X)

adata.obs["reference"] = ["reference"] * 2 + ["non-reference"] * 2
adata.var["chromosome"] = ["chr1"] * 25
adata.var["start"] = np.arange(25)
adata.var["end"] = np.arange(25)+1


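# The window (50) is deliberately larger than the number of genes on chr1 (25)
cnv.tl.infercnv(
    adata,
    reference_cat="reference",
    reference_key="reference",
    window_size=50,
    step=1,
)

# With step=1, one CNV column per gene is expected; this assertion fails
# because the profile takes the length of the window instead
assert adata.obsm["X_cnv"].shape == adata.shape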

If this is unintended behavior I am happy to provide a PR with a fix.
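One possible direction for such a fix, sketched under assumptions: clamp the window to the number of genes before convolving, so that mode="same" returns the input length. The helper name running_mean_clamped is hypothetical and not part of infercnvpy, and the actual fix in the library may look different.

```python
import numpy as np


def running_mean_clamped(values, window_size):
    """Hypothetical helper: running mean whose output length matches the input."""
    # Never use a window longer than the data itself
    n = min(window_size, len(values))
    kernel = np.ones(n) / n
    # Since len(kernel) <= len(values), mode="same" returns len(values) entries
    return np.convolve(values, kernel, mode="same")


short_chromosome = np.ones(25)
smoothed_clamped = running_mean_clamped(short_chromosome, window_size=50)
print(smoothed_clamped.shape)  # (25,)
```

Clamping changes the effective smoothing on short chromosomes, so whether this is the desired behavior is a design decision for the maintainers.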

Best
Florian

Answer 1 · 2022-03-22T14:54:05.000Z

Hi Florian,

Thanks for the bug report! A fix would be much appreciated!

Best,
Gregor