saezlab/progeny

progenyPerm - example is for progeny, not progenyPerm

Closed this issue · 6 comments

In progeny v1.10.0, in R 4.0.3/RStudio, the help docs example for progenyPerm has this -
gene_expression <- ...
progeny(gene_expression, scale=TRUE, organism="Human", top=100, perm=10000)

Would you please update this to show a progenyPerm example?

Ah, thanks for noticing. progenyPerm shouldn't actually be exported as a function anymore; it is used internally when a user calls the progeny function with the perm parameter >= 2.

So if you wish to run progeny with permutations to normalise the scores, which you likely should in default scenarios, simply set the perm parameter to 1000 or 10000.

We will update the package shortly to correct this.

Does this answer your question?

I would like to use PROGENy, but would prefer that perm be fixed, if it's actually not working. Could I ask what timeline you see for checking perm, and fixing it, if it's not working? Thanks -

Hello,

You should not have duplicated rownames in your input matrix for Progeny. This means that you have duplicated genes. You have to figure out what to do with these duplicated genes (like computing their average expression) before running Progeny with or without Permutations. Otherwise, these duplicated genes may contribute twice to Progeny pathway activity scores.
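One way to do the averaging mentioned above, as a minimal base-R sketch. The helper name `average_duplicates` and the toy matrix are made up for illustration and are not part of the progeny package. Dropping rows with an empty or missing gene symbol is an assumption here, not something progeny requires.

```r
# Toy expression matrix (genes x samples) with a duplicated gene symbol.
# All values are invented for illustration.
expr <- matrix(
  c(1, 3, 10, 2, 4, 20),
  nrow = 3,
  dimnames = list(c("TUBB", "TUBB", "TP53"), c("s1", "s2"))
)

# Hypothetical helper: collapse rows that share a rowname to their mean.
average_duplicates <- function(mat) {
  # Assumption: rows without a usable gene symbol are dropped first.
  keep <- !is.na(rownames(mat)) & rownames(mat) != ""
  mat <- mat[keep, , drop = FALSE]
  sums <- rowsum(mat, group = rownames(mat))      # per-gene column sums
  counts <- table(rownames(mat))[rownames(sums)]  # rows per gene, same order
  sums / as.vector(counts)                        # per-gene column means
}

expr_unique <- average_duplicates(expr)
print(expr_unique)
```

The result has one row per gene symbol, so it could then be passed to progeny with or without perm. Whether the mean is the right summary (rather than, say, the sum or the most variable row) depends on the data.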

ProgenyPermutations does not need to be fixed in that sense. Rather, Progeny without permutations needs a check that no duplicated genes are provided as input.

Best regards,
Alberto.

If I follow the vignette code, I get many duplicated rownames in gene_expr.
...

library(biomaRt)
mart = useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
genes = getBM(attributes = c("ensembl_gene_id","hgnc_symbol"),
values=rownames(gene_expr), mart=mart)
matched = match(rownames(gene_expr), genes$ensembl_gene_id)
rownames(gene_expr) = genes$hgnc_symbol[matched]

# check for unique rownames()
rev(sort(table(rownames(gene_expr))))[1:10]
#       ZNRD1ASP ZFP57 UBDP1 TUBB TSBP1 TRIM40 TRIM31-AS1 TRIM31 TRIM26
# 15614        8     8     8    8     8      8          8      8      8

Guessing - we can assign duplicate rownames to gene_expr because it's not a data.frame.
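That guess can be checked in base R with a small sketch. The toy objects `m` and `df` are made up for illustration: a matrix accepts duplicated rownames, while assigning them to a data.frame raises an error.

```r
# A matrix accepts duplicated rownames without complaint.
m <- matrix(1:4, nrow = 2, dimnames = list(c("g1", "g2"), c("s1", "s2")))
rownames(m) <- c("TUBB", "TUBB")
matrix_has_dups <- anyDuplicated(rownames(m)) > 0

# A data.frame rejects the same assignment; capture the error message.
df <- data.frame(s1 = c(1, 2), s2 = c(3, 4))
err <- tryCatch(
  suppressWarnings({
    rownames(df) <- c("TUBB", "TUBB")
    NULL
  }),
  error = function(e) conditionMessage(e)
)

print(matrix_has_dups)
print(err)
```

If `err` holds a message rather than `NULL`, the data.frame assignment failed, which would be consistent with the error seen when 'perm' is set.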

I find I can run this gene_expr in progeny if I do NOT use 'perm' -
library(progeny)
pathways <- progeny(
gene_expr,
scale=FALSE
)
...but if I try to set 'perm' I get a duplicated rownames error.
pathways <- progeny(
gene_expr,
scale=FALSE,
perm = 100
)
Error in .rowNamesDF<-(x, value = value) :
duplicate 'row.names' are not allowed
...

If I leave the vignette and run progeny on a data.frame that has FPKMs for ~19+k coding genes and ~80 tumour samples, ensuring that rownames are unique, I can set perm, or not. E.g.
pathways <- progeny(
as.matrix(fpkm.df),
scale=FALSE,
perm = 10000,
top = 200
)

The above suggests that there may be a problem with the vignette, as 'gene_expr' has duplicated rownames. Should progeny() run when 'perm' is not set but there are duplicated rownames?

Thanks -

When running Progeny, one of two different functions is called, depending on the perm parameter:

  1. If perm = 1 (the default), we run a function that does not check for duplicated rownames (gene names). This is the case in the vignette. That's why I mentioned before that we should modify this function to check for that and prevent duplicated gene names in the input matrix. Once this modification is done, we will need to do something in the vignette with the expression matrix to avoid having duplicated gene names (for instance, computing their average expression, as mentioned before).
  2. If perm > 1, we run a function that does check for duplicated rownames.
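Until such a check exists in the perm = 1 path, a user-side guard could look roughly like the sketch below. `check_unique_genes` is a hypothetical helper written for this thread, not a progeny function, and the toy matrices are invented.

```r
# Hypothetical pre-flight check; not part of the progeny package.
check_unique_genes <- function(expr) {
  genes <- rownames(expr)
  if (is.null(genes)) {
    stop("expression matrix has no rownames")
  }
  dup <- unique(genes[duplicated(genes)])
  if (length(dup) > 0) {
    stop(sprintf(
      "%d duplicated gene name(s), e.g. %s",
      length(dup), paste(head(dup, 5), collapse = ", ")
    ))
  }
  invisible(expr)  # return the input so the call can be chained
}

# Toy inputs, invented for illustration.
ok  <- matrix(1:4, nrow = 2, dimnames = list(c("TP53", "TUBB"), c("s1", "s2")))
bad <- matrix(1:4, nrow = 2, dimnames = list(c("TUBB", "TUBB"), c("s1", "s2")))

ok_result  <- tryCatch({ check_unique_genes(ok); "passed" },
                       error = function(e) "failed")
bad_result <- tryCatch({ check_unique_genes(bad); "passed" },
                       error = function(e) "failed")

# Intended use (needs the progeny package, so left commented out):
# pathways <- progeny(check_unique_genes(gene_expr), scale = FALSE)
```

With this guard in front, the same input would be rejected whether or not perm is set.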

I hope this clarifies the issue.

Best,
Alberto.