niaid/dsb

Quantile trimming after dsb normalisation

Closed this issue · 2 comments

After looking at my normalised data in more detail, I found many markers with a really extreme range of values, usually driven by only a few data points. I tried tinkering with the thresholds used to determine the ambient matrix, but the problem persisted. I thought of scaling, but that would remove the information on the actual range of expression. In the end, I trimmed at the .01 and 99.9 percentiles and it looked much better!
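For concreteness, per-marker quantile trimming like this can be done by clipping each marker (column) to its lower and upper percentile bounds. A minimal numpy sketch, where the toy matrix, the injected outliers, and the exact cutoffs are assumptions for illustration, not dsb output:

```python
import numpy as np

rng = np.random.default_rng(0)
# toy "normalised" matrix: 1000 cells x 3 markers
dsb_norm = rng.normal(loc=5, scale=2, size=(1000, 3))
dsb_norm[0, 0] = -30.0  # injected extreme negative outlier
dsb_norm[1, 1] = 80.0   # injected extreme positive outlier

# per-marker bounds at the .01 and 99.9 percentiles
# (percentile p corresponds to quantile p / 100)
lo = np.quantile(dsb_norm, 0.0001, axis=0)
hi = np.quantile(dsb_norm, 0.999, axis=0)

# clip each marker to its own [lo, hi] range; shape is unchanged
trimmed = np.clip(dsb_norm, lo, hi)
```

Because the clipping is done per column, a single extreme value in one marker does not affect the range of the others.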


Do you have any thoughts on this?

@MikhaelManurung In our dataset there are also some cells with very low negative values for a given protein after running the cell-to-cell technical component correction (e.g. fewer than 30 cells across our 56k-cell dataset, often with low values like -30 for a protein), but in some datasets there are no such outliers. It has not impacted downstream analysis or clustering in our hands. Since it is dataset dependent and affects such a tiny fraction of cells, I have not yet looked systematically at what drives the outlier values. Given that only a handful of cells per dataset carry an outlier value, removing them with quantile trimming as you show is reasonable. The benefits of removing per-cell technical variation, i.e. denoise.counts = TRUE, far outweigh the few outlier cells this can create.
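If the goal is to remove the rare outlier cells rather than clip their values, the cells can be flagged directly. A hypothetical numpy sketch, where the toy matrix and the -10 cutoff are assumptions chosen only to mimic the rare low negative values described above:

```python
import numpy as np

rng = np.random.default_rng(1)
# toy normalised matrix: 500 cells x 4 proteins
dsb_norm = rng.normal(loc=3, scale=1.5, size=(500, 4))
dsb_norm[2, 0] = -30.0  # one rare outlier cell, as described above

# flag any cell carrying a value below the (assumed) threshold
outlier_cells = np.any(dsb_norm < -10, axis=1)

# keep only the non-outlier cells
filtered = dsb_norm[~outlier_cells]
```

In a real dataset the threshold would be chosen after inspecting the per-protein distributions, since the scale of dsb-normalised values varies between proteins.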

Thanks for your elaborate response!