MaayanLab/appyter-catalog

DESEQ2 Issue

astern731 opened this issue · 8 comments

The Appyter fails to parse columns and rows when DESeq2 is selected as the differential expression method. The same data runs fine with limma.
Error: Cell execution error on cell 12

RRuntimeError Traceback (most recent call last)
Input In [12], in <cell line: 1>()
----> 1 signatures = get_signatures(classes, dataset, normalization, diff_gex_method, meta_class_column_name, filter_genes)
3 for label, signature in signatures.items():
4 case_label = label.split(" vs. ")[1]

File ~/utils.py:435, in get_signatures(classes, dataset, normalization, method, meta_class_column_name, filter_genes)
432 elif method == "DESeq2":
433 # deseq2 receives raw counts
434 DESeq2 = robjects.r['deseq2']
--> 435 DESeq2_results = pandas2ri.conversion.rpy2py(DESeq2(pandas2ri.conversion.py2rpy(expr_df), pandas2ri.conversion.py2rpy(cls1_sample_ids), pandas2ri.conversion.py2rpy(cls2_sample_ids)))
437 signature = pd.DataFrame(DESeq2_results[0])
438 signature.index = DESeq2_results[1]

File /usr/local/lib/python3.8/dist-packages/rpy2/robjects/functions.py:198, in SignatureTranslatedFunction.__call__(self, *args, **kwargs)
196 v = kwargs.pop(k)
197 kwargs[r_k] = v
--> 198 return (super(SignatureTranslatedFunction, self)
199 .__call__(*args, **kwargs))

File /usr/local/lib/python3.8/dist-packages/rpy2/robjects/functions.py:125, in Function.__call__(self, *args, **kwargs)
123 else:
124 new_kwargs[k] = conversion.py2rpy(v)
--> 125 res = super(Function, self).__call__(*new_args, **new_kwargs)
126 res = conversion.rpy2py(res)
127 return res

File /usr/local/lib/python3.8/dist-packages/rpy2/rinterface_lib/conversion.py:45, in _cdata_res_to_rinterface.<locals>._(*args, **kwargs)
44 def _(*args, **kwargs):
---> 45 cdata = function(*args, **kwargs)
46 # TODO: test cdata is of the expected CType
47 return _cdata_to_rinterface(cdata)

File /usr/local/lib/python3.8/dist-packages/rpy2/rinterface.py:680, in SexpClosure.__call__(self, *args, **kwargs)
673 res = rmemory.protect(
674 openrlib.rlib.R_tryEval(
675 call_r,
676 call_context.__sexp__._cdata,
677 error_occured)
678 )
679 if error_occured[0]:
--> 680 raise embedded.RRuntimeError(_rinterface._geterrmessage())
681 return res

RRuntimeError: Error in DESeqDataSetFromMatrix(countData = rawcount_dataframe, colData = colData, :
ncol(countData) == nrow(colData) is not TRUE
WT TG
R[write to console]: Error in DESeqDataSetFromMatrix(countData = rawcount_dataframe, colData = colData, :
ncol(countData) == nrow(colData) is not TRUE


I think this is a data frame issue: the column of gene names increases ncol by 1. I ran DESeq2 locally in RStudio and was able to create a DESeq2 object from the same data set I'm using in the Appyter with this line of code after importing the CSV:
txi <- data.frame(txi[,-1], row.names=txi[,1])
This ensures that ncol(countData) == nrow(colData).
cx-expression.csv
cx-metadata.csv
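For readers following along in Python, here is a minimal sketch of the same idea as the R line above: treat the first column as gene identifiers rather than as a sample, then compare the number of sample columns with the number of metadata rows. The function names (`split_counts`, `check_shapes`) and the toy data are hypothetical and are not taken from the Appyter's code.

```python
import csv
import io


def split_counts(counts_csv_text):
    """Split a counts CSV into (gene_ids, sample_ids, counts).

    Assumes the first column holds gene identifiers and every other
    column is one sample.
    """
    reader = csv.reader(io.StringIO(counts_csv_text))
    header = next(reader)
    sample_ids = header[1:]  # drop the gene-name column
    gene_ids, counts = [], []
    for row in reader:
        gene_ids.append(row[0])
        counts.append([int(value) for value in row[1:]])
    return gene_ids, sample_ids, counts


def check_shapes(sample_ids, metadata_sample_ids):
    """Mirror DESeq2's ncol(countData) == nrow(colData) check."""
    return len(sample_ids) == len(metadata_sample_ids)


# Toy data, invented for illustration only.
counts_text = "gene,WT_1,WT_2,TG_1,TG_2\nGeneA,10,12,30,28\nGeneB,0,3,1,2\n"
metadata_samples = ["WT_1", "WT_2", "TG_1", "TG_2"]

gene_ids, sample_ids, counts = split_counts(counts_text)
header_width = len(counts_text.splitlines()[0].split(","))

print(check_shapes(sample_ids, metadata_samples))  # True: gene column removed
print(header_width == len(metadata_samples))       # False: gene column still counted
```

If the gene-name column is left in place, the count matrix is one column wider than the metadata has rows, which is the kind of mismatch the error message reports.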

Hi @astern731, can you please let us know which Appyter you are using? It would be great if you could share the URL of the Appyter instance with the error. Many thanks for letting us know about this issue!

Hi @AviMaayan, I was using the Bulk RNA-seq Appyter. I tried a different data set, but now the notebook no longer seems to work. I've attached the CSV files for this run.
conditions_h.csv
Hippo_cnts.csv

https://appyters.maayanlab.cloud/Bulk_RNA_seq/6d319a94fe5b630e258ce68db7ed4f852e051077/

Yes. It looks like things are not working the way they are supposed to. We'll look into it.

The problem might be that at the bottom of your file there are genes with no measured values. Please try removing them. Also, it looks like there are not many differentially expressed genes between the groups. I was able to upload your data to our tool BioJupies: https://maayanlab.cloud/biojupies/notebook/otmIpU7wa
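A minimal sketch of the suggested cleanup, assuming "no measured values" means rows whose count cells are all empty or `NA`. The function name `drop_unmeasured_genes` and the toy data are hypothetical; this is not how the fixed file in this thread was necessarily produced.

```python
import csv
import io


def drop_unmeasured_genes(counts_csv_text):
    """Return CSV text without rows whose count cells are all empty or 'NA'."""
    reader = csv.reader(io.StringIO(counts_csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(next(reader))  # keep the header row unchanged
    for row in reader:
        values = [cell.strip() for cell in row[1:]]
        # Keep the gene only if at least one sample has a real value.
        # Blank lines produce an empty row and are dropped as well.
        if any(value not in ("", "NA") for value in values):
            writer.writerow(row)
    return out.getvalue()


# Toy data, invented for illustration only.
raw = "gene,S1,S2\nGeneA,5,7\nGeneB,,\nGeneC,NA,NA\n"
cleaned = drop_unmeasured_genes(raw)
print(cleaned)
```

With the toy input, only the header and the `GeneA` row remain.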

Here is the fixed file:
Hippo_cnts.csv

@astern731 please retry the Appyters site with your data. It was fixed by @u8sand. The system ran out of memory and needed to be upgraded.

Hi @AviMaayan, @u8sand, thanks for the assistance. Using the fixed CSV file provided above, I was able to run the analysis with limma (https://appyters.maayanlab.cloud/Bulk_RNA_seq/98502905c189f205720656d251f8ac1d4fe04363/). However, the same data set causes the following error when using DESeq2:
RRuntimeError: Error in DESeqDataSetFromMatrix(countData = rawcount_dataframe, colData = colData, :
ncol(countData) == nrow(colData) is not TRUE

https://appyters.maayanlab.cloud/Bulk_RNA_seq/67dcd82e4c6df239f07a8aa7792509c1c87924ca/

I'm not sure whether the issue is due to how row names are treated in the Appyter.
In RStudio I was able to generate a DESeq2 object by importing the counts as counts <- data.frame(counts, row.names=geneid).

u8sand commented

I've confirmed this issue, thanks for letting us know. It only happens when there are more than two groups, and it affects both edgeR and DESeq2. I have a fix which will be published soon.
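The thread does not show the actual fix. As a hedged illustration only, here is one way a third group could trigger the mismatch: the sample table covers just the two groups being compared while the count matrix still holds every sample. The sketch assumes that cause, and the helper `select_pair`, the `KO` group, and the sample names are all hypothetical.

```python
def select_pair(sample_groups, counts_by_sample, control, case):
    """Keep only the samples that belong to the two groups being compared."""
    kept = [s for s, group in sample_groups.items() if group in (control, case)]
    col_data = [(s, sample_groups[s]) for s in kept]      # one row per kept sample
    count_data = {s: counts_by_sample[s] for s in kept}   # one column per kept sample
    return count_data, col_data


# Toy data, invented for illustration only: three groups, two samples each.
sample_groups = {
    "WT_1": "WT", "WT_2": "WT",
    "TG_1": "TG", "TG_2": "TG",
    "KO_1": "KO", "KO_2": "KO",
}
counts_by_sample = {sample: [0, 0] for sample in sample_groups}

count_data, col_data = select_pair(sample_groups, counts_by_sample, "WT", "TG")

print(len(counts_by_sample), len(col_data))  # 6 4: unfiltered counts vs. two-group table
print(len(count_data) == len(col_data))      # True once both are subset together
```

Subsetting the counts and the sample table with the same list of sample IDs keeps the two in step regardless of how many groups the metadata contains.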