yanailab/knn-smoothing

understanding the output

Closed this issue · 2 comments

I expected the smoothed matrix to contain fewer columns, since the information across the k nearest neighbors is aggregated, leaving only one column per aggregated group of cells.

If I understand correctly, the output contains all the initial cells, but with their values replaced by aggregated counts, which implies some duplication of data. Is there a way to get the clumped output?

Hi, this is not exactly how the algorithm works. The algorithm identifies neighbors for each cell, and then replaces the expression profile of that cell with the aggregated expression profiles from the cell and its neighbors. This is done for all cells, so that's why the output contains the same number of cells (columns) as the input.
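To illustrate why the number of columns does not change, here is a minimal sketch of a single aggregation pass. It is a simplification for illustration only, not the code in this repository: the function name `knn_aggregate`, the median-scaling normalization, and the plain Euclidean distance are all assumptions made for this example.

```python
import numpy as np

def knn_aggregate(counts, k):
    """Hypothetical one-pass sketch: counts is a genes x cells matrix.

    Returns a matrix of the same shape in which each column is the sum of
    that cell's profile and the profiles of its k nearest neighbors.
    """
    # Scale each cell to the median total count (used only to find neighbors).
    totals = counts.sum(axis=0)
    norm = counts / totals * np.median(totals)

    # Pairwise squared Euclidean distances between cells (columns).
    X = norm.T
    sq = (X ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2 * X @ X.T

    # Force each cell to be its own closest neighbor.
    np.fill_diagonal(dist, -np.inf)
    neighbors = np.argsort(dist, axis=1)[:, :k + 1]

    # One aggregated column per input cell, so the shape is preserved.
    return np.stack([counts[:, idx].sum(axis=1) for idx in neighbors], axis=1)

# Synthetic example data: 50 genes x 20 cells.
rng = np.random.default_rng(0)
counts = rng.poisson(5.0, size=(50, 20))
smoothed = knn_aggregate(counts, k=3)
print(counts.shape, smoothed.shape)
```

Every cell gets its own aggregated profile, and neighboring cells share some of the same contributors, which is why the output has as many columns as the input.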

If you would like to identify "clumps", I would recommend performing clustering on the smoothed data. For clustering, my recommendation would be to look at the first two PCs and apply DBSCAN, as described in our new preprint: https://www.biorxiv.org/content/early/2018/10/30/456129
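As a rough sketch of that suggestion, the snippet below projects cells onto the first two principal components and runs DBSCAN using scikit-learn. The synthetic `smoothed` matrix stands in for real smoothed data, and the `eps` and `min_samples` values are arbitrary placeholders; the preprint's exact preprocessing and parameters are not reproduced here.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA

# Stand-in for smoothed, normalized data (genes x cells): two synthetic groups.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, size=(50, 30))
group_b = rng.normal(0.0, 1.0, size=(50, 30))
group_b[:10] += 8.0  # shift the first 10 genes so the groups differ
smoothed = np.hstack([group_a, group_b])

# scikit-learn expects samples in rows, so transpose to cells x genes.
pcs = PCA(n_components=2).fit_transform(smoothed.T)

# eps and min_samples are placeholders and need tuning for real data.
labels = DBSCAN(eps=3.0, min_samples=5).fit_predict(pcs)

# One label per cell; DBSCAN marks cells it treats as noise with -1.
print(labels)
```

The resulting labels assign each cell to a "clump", and profiles could then be summed per label if a single column per group is wanted.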

Thanks for the explanation. I will check Moana as well.