niaid/dsb

Truncating negative values

erflynn opened this issue · 2 comments

Hi,

Really excited about this package and have been using it regularly.
One note: some of the DSB values end up quite negative. In an old tutorial, you suggested:
Very negative values correspond to very low expression and it can be helpful to convert all dsb normalized values < -10 to 0 for interpretation / visualization purposes.

Is there a reason not to set all values below zero to zero?
Thanks!

Hi @erflynn, it is not unreasonable to do that. You can interpret a value of -1 for a cell as 1 sd below the empty droplet mean on the natural log scale. Most likely there is no biological information in the values below 0. If you want to preserve that scale across cells, you could leave the values as is. It may be tough to benchmark, but you could see whether it makes a difference if you cluster with and without the values thresholded at 0. I have no analysis to back this up, but my intuition is that for many datasets it won't make much difference. When you cluster, protein information is compressed, and there will be more relative information in the distances between highly positive values (like a protein with normalized expression of 20 on a cell) than in the smaller differences that the dsb scale says are noise, such as the difference between a small negative value and 0.
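
For reference, the two thresholding options discussed in this thread (everything below 0 set to 0, or only values below -10 set to 0) can be sketched in a few lines. This is a minimal illustration in plain Python, not code from the dsb package: the function name `threshold_dsb`, its parameters, and the toy matrix are all hypothetical, and the numbers are made up.

```python
def threshold_dsb(values, cutoff=0.0, replacement=0.0):
    """Return a copy of a proteins x cells matrix in which every value
    below `cutoff` is replaced by `replacement`. Hypothetical helper."""
    return [[replacement if v < cutoff else v for v in row] for row in values]


# Toy dsb-normalized values (made up): rows are proteins, columns are cells.
toy = [
    [-15.0, -0.4, 2.5, 20.0],
    [-1.0, 0.0, 7.2, -11.0],
]

# Option 1: treat everything below 0 as noise and set it to 0.
clipped_at_zero = threshold_dsb(toy)

# Option 2: only reset extreme negatives (< -10), keeping the rest of the scale.
clipped_outliers = threshold_dsb(toy, cutoff=-10.0)
```

Running the clustering once on each thresholded matrix and once on the untouched values is one way to check whether the choice changes anything for a given dataset.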

thanks! this is helpful to consider -- and I may try clustering both ways, though I expect using the negative values in WNN could add too much weight to the ADT for these negative values? I'm getting values as negative as -15 occasionally.
I'm also filtering to remove cells with the top 0.5% isotype control values pre-DSB to try to reduce the negative values.
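
A rough sketch of that kind of pre-normalization filter, in plain Python. The thread does not say how the per-cell isotype control value is computed, so the single score per cell used here is an assumption, and `drop_top_isotype_cells`, its `fraction` parameter, and the toy scores are hypothetical.

```python
def drop_top_isotype_cells(isotype_scores, fraction=0.005):
    """Return the indices of cells to keep after removing the top
    `fraction` of cells ranked by isotype control score."""
    # Number of cells to remove, rounded to the nearest whole cell.
    n_drop = round(len(isotype_scores) * fraction)
    # Rank cells from highest to lowest score; ties are broken by cell index.
    ranked = sorted(range(len(isotype_scores)),
                    key=lambda i: (-isotype_scores[i], i))
    dropped = set(ranked[:n_drop])
    return [i for i in range(len(isotype_scores)) if i not in dropped]


# Toy per-cell isotype scores (made up): 1000 cells, five with inflated values.
scores = [1.0 + (i % 7) * 0.1 for i in range(1000)]
for i in (3, 250, 499, 720, 999):
    scores[i] = 50.0 + i

# Indices of the cells that would be passed on to normalization.
keep = drop_top_isotype_cells(scores)
```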