gagneurlab/OUTRIDER

Validation data set not producing values concordant with manual

Opened this issue · 1 comment

J-Lye commented

I've used OUTRIDER in my thesis, but I have been advised that the very slight variability between the values I obtain and the values shown in the OUTRIDER manual is a serious concern that may invalidate my results.

No matter how many times I repeat or modify my approach, the outcome is always the same: a tiny difference, on the order of 1.2x10^-12 in a p-value, for example. I get these slightly different results whether I use the simple example or download the Kremer dataset and run it through the full OUTRIDER code.
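
For reference, the "simple example" I mean is essentially the quick-start workflow from the vignette, roughly as sketched below (the bundled example file name and the filter arguments are as I recall them from the manual and may differ slightly between package versions):

# Quick-start workflow, approximately as in the OUTRIDER vignette
library(OUTRIDER)

# small Kremer example count table shipped with the package
ctsFile <- system.file("extdata", "KremerNBaderSmall.tsv", package = "OUTRIDER")
ctsTable <- read.table(ctsFile, check.names = FALSE)

# build the dataset, filter lowly expressed genes, and run the full fit
ods <- OutriderDataSet(countData = ctsTable)
ods <- filterExpression(ods, minCounts = TRUE, filterGenes = TRUE)
ods <- OUTRIDER(ods)

# extract the significant results, ordered by adjusted p-value
res <- results(ods)
head(res)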

Results from the Manual

   geneID sampleID   pValue  padjust zScore  l2fc rawcounts normcounts
1: ATAD3C  MUC1360  2.82E-11 1.57E-07  5.27  1.87      948    246.26
2: NBPF15  MUC1351  8.10E-10 4.51E-06  5.75  0.77     7591   7050.72
3: MSTO1   MUC1367  4.46E-09 2.48E-05 -6.2  -0.81      761    729.7
4: HDAC1   MUC1350  1.54E-08 8.56E-05 -5.93 -0.79     2215   2113.06
5: DCAF6   MUC1374  6.93E-08 3.86E-04 -5.68 -0.61     2348   3084.41
6: NBPF16  MUC1351  2.61E-07 7.25E-04  4.82  0.67     4014   3834.4
   meanCorrected  theta aberrant AberrantBySample AberrantByGene padj_rank
1:         84.16  16.66     TRUE                1              1         1
2:        4417.1  109.8     TRUE                2              1         1
3:       1238.19 151.57     TRUE                1              1         1
4:       3521.37 134.57     TRUE                1              1         1
5:          4603 197.14     TRUE                1              1         1
6:       2564.52 105.73     TRUE                2              1         2

Results from my Rscript
I am using the exact script from the manual, and would appreciate confirmation from other users or the developers that they see the same behaviour and that it is due to some optimisation or similar.

   geneID sampleID   pValue  padjust zScore  l2fc rawcounts normcounts
1: ATAD3C  MUC1360  2.70E-11 1.50E-07  5.29  1.87      948    246.93
2: NBPF15  MUC1351  6.48E-10 3.60E-06  5.79  0.78     7591   7070.41
3: MSTO1   MUC1367  4.76E-09 2.65E-05 -6.19 -0.81      761    729.59
4: HDAC1   MUC1350  1.34E-08 7.44E-05 -5.95 -0.78     2215   2121.49
5: DCAF6   MUC1374  6.26E-08 3.48E-04 -5.7  -0.61     2348   3084.29
6: NBPF16  MUC1351  2.19E-07 6.10E-04  4.85  0.68     4014   3844.74
   meanCorrected  theta aberrant AberrantBySample AberrantByGene padj_rank
1:         86.15  16.61     TRUE                1              1         1
2:       4500.21 109.83     TRUE                2              1         1
3:       1216.01 150.84     TRUE                1              1         1
4:       3529.56 137.72     TRUE                1              1         1
5:       4600.94 198.54     TRUE                1              1         1
6:        2603.5 105.75     TRUE                2              1         2

Dear @J-Lye,
thank you for reporting this difference. Under the hood, we use the CPU-optimized RcppArmadillo package (https://arma.sourceforge.net/, https://cran.r-project.org/web/packages/RcppArmadillo/index.html). Because it is compiled against the CPU features available on each machine, the OUTRIDER optimization can show minor rounding differences across CPU architectures. If you run it twice locally on the same CPU, however, the results should replicate: the code is deterministic, but unfortunately not agnostic of the underlying hardware.
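
To put the size of these differences in perspective, here is a minimal sketch using only the six p-values hand-copied from your two tables above (nothing else is assumed):

# p-values as printed in the manual and in your run (copied from the tables above)
manual_p <- c(2.82e-11, 8.10e-10, 4.46e-09, 1.54e-08, 6.93e-08, 2.61e-07)
local_p  <- c(2.70e-11, 6.48e-10, 4.76e-09, 1.34e-08, 6.26e-08, 2.19e-07)

# absolute differences range from roughly 1e-12 to 4e-8, i.e. far below any
# significance cutoff used to call outliers
abs(manual_p - local_p)

# the ordering of the hits is identical in both tables, so the set of reported
# outliers and their ranking (and hence the interpretation) do not change
identical(rank(manual_p), rank(local_p))

In short, the numbers drift at the level of numerical rounding, while the outlier calls themselves stay the same.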

I hope this helps you understand the differences in your results.
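
If you want to document the environment behind a given set of numbers (e.g. for the thesis), a minimal sketch using only base R helpers would be:

# record the hardware and software stack the results were produced on
Sys.info()[c("sysname", "release", "machine")]   # OS and CPU architecture
sessionInfo()    # R version, attached packages (and BLAS/LAPACK info on recent R)
packageVersion("OUTRIDER")
packageVersion("RcppArmadillo")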