Murali-group/Beeline

NaN values in PIDC results

WWXkenmo opened this issue · 2 comments

Hi Beeline team,
I am currently using your neat pipeline, but I have run into a very weird problem in the rankedEdges.csv file of the PIDC results. On my datasets, the edge weights computed by PIDC are all NaN, like this:
image
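For anyone who wants to check their own output for the same symptom, here is a minimal sketch that counts NaN edge weights in a rankedEdges.csv file. It assumes the file is tab-separated with an `EdgeWeight` column; both are assumptions about the output layout, so adjust `delimiter` and `weight_column` if your file differs. The function name `find_nan_edges` is made up for this example.

```python
import csv
import math
from typing import Dict, List, Tuple


def find_nan_edges(
    path: str,
    delimiter: str = "\t",               # assumed separator, change if needed
    weight_column: str = "EdgeWeight",   # assumed column name, change if needed
) -> Tuple[int, List[Dict[str, str]]]:
    """Return (total number of edges, rows whose weight is NaN or unparsable)."""
    total = 0
    bad_rows: List[Dict[str, str]] = []
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter=delimiter):
            total += 1
            try:
                weight = float(row[weight_column])
            except (TypeError, ValueError):
                # Empty or non-numeric cells are treated like NaN.
                weight = math.nan
            if math.isnan(weight):
                bad_rows.append(row)
    return total, bad_rows
```

Calling `find_nan_edges("rankedEdges.csv")` returns the edge count and the offending rows, so you can see at a glance whether every edge is affected or only some of them.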

But after I ran a search algorithm I developed myself (it needs to run PIDC repeatedly, so it cannot be applied to large-scale scRNA-seq datasets), I found that simply deleting some of the genes (in my case, the 441st, 865th, and 866th genes) brings the edge weights back to normal:
image
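The search itself is not shown in this thread. As an illustration only, here is a sketch of a related but different technique: greedily shrinking the gene list to a smaller set that still reproduces the NaN output, which can make a bug report easier to act on. `produces_nan` is a hypothetical callback that you would have to write yourself (run PIDC on the given genes and report whether the result contains NaN). Like the search described above, it needs one PIDC run per gene, so it is only practical on small datasets.

```python
from typing import Callable, List, Sequence


def shrink_failing_genes(
    genes: Sequence[str],
    produces_nan: Callable[[List[str]], bool],  # hypothetical: runs PIDC, True if NaN appears
) -> List[str]:
    """Greedily drop genes that are not needed to reproduce the NaN output."""
    kept = list(genes)
    if not produces_nan(kept):
        raise ValueError("the full gene list does not reproduce the NaN output")
    for gene in list(genes):
        trial = [g for g in kept if g != gene]
        # Drop the gene only if the NaN output still appears without it.
        if trial and produces_nan(trial):
            kept = trial
    return kept
```

The genes that remain are the ones whose removal made the NaN values disappear during this single pass. The result depends on the order in which genes are tried, so treat it as a starting point for investigation rather than a definitive list of culprits.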

I originally thought that these genes might have some unusual statistical characteristics, but unfortunately I didn't find any special properties in them (e.g. average expression, variance, coefficient of variation, etc.).
This happens in most of my datasets, so I think it is really important to figure out, but I have no idea how to solve it.
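For reference, the per-gene statistics mentioned above can be computed along these lines. This is a sketch that assumes ExpressionData.csv is comma-separated with one header row of cell IDs, one row per gene, and the gene name in the first column; that layout is an assumption, and `gene_summary` is a name made up for this example.

```python
import csv
import math
import statistics
from typing import Dict


def gene_summary(path: str) -> Dict[str, Dict[str, float]]:
    """Per-gene mean, population variance, and coefficient of variation."""
    summary: Dict[str, Dict[str, float]] = {}
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader)  # skip the header row (assumed to hold cell IDs)
        for row in reader:
            if not row:
                continue  # ignore blank lines
            gene = row[0]
            values = [float(v) for v in row[1:]]
            mean = statistics.fmean(values)
            variance = statistics.pvariance(values)  # population variance
            # The CV is undefined when the mean is zero.
            cv = math.sqrt(variance) / mean if mean != 0 else math.nan
            summary[gene] = {"mean": mean, "variance": variance, "cv": cv}
    return summary
```

Comparing the entries for the suspect genes against the rest of the table is a quick way to repeat the check described above on another dataset.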

So that your team can look into this problem, I have created a repo and uploaded the ExpressionData.csv: https://github.com/WWXkenmo/PIDC_bug

Best,
Ken

Thank you for using BEELINE. I was able to reproduce the NaN error in the PIDC output using your example ExpressionData.txt. In the PIDC output I see the following error message for NaN edges:

Gamma distribution failed for Rps3 and Srgn; used normal instead.

I haven't found the root cause of the error yet and will continue looking into this.

I haven't found any issues in the way that BEELINE prepares the input for PIDC or parses its output. I believe the error is related to a poor fit of the input to gamma or normal distributions, but I haven't identified how this results in NaN values in the output from PIDC. I recommend following up with the maintainers of PIDC at https://github.com/Tchanders/NetworkInference.jl to investigate the root cause further.