NaN values in PIDC results
WWXkenmo opened this issue · 2 comments
Hi Beeline team,
I am currently using your neat pipeline, but I have encountered a very weird bug in the rankedEdges.csv file of the PIDC results. On my datasets, the edge weights computed by PIDC are all NaN.
However, after using a search algorithm I developed myself (it needs to run PIDC repeatedly, so it cannot be applied to large-scale scRNA-seq datasets), I found that simply deleting some of the genes (in my case, the 441st, 865th, and 866th genes) brings the edge weights back to normal.
I originally thought these genes might have some bad statistical characteristics, but unfortunately I did not find any special properties in them (e.g. average expression, variance, coefficient of variation).
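For anyone who wants to repeat this check, here is a minimal sketch of the per-gene statistics mentioned above (mean, sample variance, coefficient of variation). The function name `gene_summary` and the toy values are hypothetical and are not taken from the uploaded dataset; loading ExpressionData.csv is left out.

```python
import statistics

def gene_summary(values):
    """Return mean, sample variance, and coefficient of variation for one gene."""
    mean = statistics.fmean(values)
    variance = statistics.variance(values)  # sample variance (n - 1 denominator)
    # CV is undefined when the mean is zero, so report NaN in that case.
    cv = (variance ** 0.5) / mean if mean != 0 else float("nan")
    return {"mean": mean, "variance": variance, "cv": cv}

# Toy expression values for two made-up genes (illustration only).
expression = {
    "GeneA": [1.0, 2.0, 3.0],
    "GeneB": [0.0, 4.0, 8.0],
}

for gene, values in expression.items():
    print(gene, gene_summary(values))
```

Comparing these summaries between the removed genes and the rest is one way to confirm whether anything stands out.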
This happens in most of my datasets, so I think it is really important to figure out, but I have no idea how to solve it.
So that your team can look into this bug, I have created a repo and uploaded the ExpressionData.csv: https://github.com/WWXkenmo/PIDC_bug
Best,
Ken
Thank you for using BEELINE. I was able to reproduce the NaN error in the PIDC output using your example ExpressionData.txt. In the PIDC output I see the following error message for NaN edges:
Gamma distribution failed for Rps3 and Srgn; used normal instead.
I haven't found the root cause of the error yet and will continue looking into this.
I haven't found any issues in the way that BEELINE prepares the input or parses the output from PIDC. I believe the error is related to a poor fit of the input to gamma or normal distributions, but I haven't identified how this results in NaN values in the output from PIDC. I recommend following up with the maintainers of PIDC at https://github.com/Tchanders/NetworkInference.jl for further root causing.
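To narrow down which genes are involved before following up, a small sketch like the one below could tally the genes named in the "Gamma distribution failed" messages. It assumes the PIDC output has been captured as text and that every message has exactly the wording quoted above; the function name `genes_with_failed_gamma_fit` is hypothetical and not part of BEELINE or NetworkInference.jl.

```python
import re
from collections import Counter

# Pattern copied from the message quoted in this thread; that other messages
# use exactly this wording is an assumption.
PATTERN = re.compile(
    r"Gamma distribution failed for (\S+) and (\S+); used normal instead\."
)

def genes_with_failed_gamma_fit(log_text):
    """Count how often each gene appears in a failed-gamma-fit message."""
    counts = Counter()
    for gene_a, gene_b in PATTERN.findall(log_text):
        counts[gene_a] += 1
        counts[gene_b] += 1
    return counts

# Example using the single message reported above.
example_log = "Gamma distribution failed for Rps3 and Srgn; used normal instead.\n"
print(genes_with_failed_gamma_fit(example_log).most_common())
```

Genes that appear in many such messages would be natural candidates to compare against the ones whose removal made the NaN values disappear.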