niaid/dsb

Integrating RNA + dsb using Seurat v4 WNN


Hi @MattPM ,

Great package, it's working quite well!

I was wondering what you believe is the best way to integrate RNA with dsb-normalised CITE-seq values when using Seurat v4's new WNN method.

Thanks!

Hi @danmoore1987,
I reviewed the WNN method from Seurat 4 more carefully, but I haven't had time to test it yet. With the caveat that I have not tried it, dsb-normalized values should work quite well directly with that method. In the Seurat tutorial, instead of using CLR normalization, you would follow the dsb tutorial in the README to set the data slot of the CITE assay to the dsb-normalized protein data from dsb::DSBNormalizeProtein(), then run Seurat::RunPCA() followed by Seurat::FindMultiModalNeighbors().

The protein data in the WNN method is first reduced to principal components, so the values do not need to be on a CLR scale. In principle, using dsb to remove the ambient component of each protein and each cell's technical component should reduce the noise in the protein data prior to PCA, which could theoretically improve the WNN results. In CLR space, each protein has its own noise floor; see supplementary fig 4A from the original CITE-seq paper, where spike-in control cells were used to define a threshold for each protein. dsb is basically doing that adjustment automatically: it uses empty droplets to estimate the ambient component of each protein, then uses isotype controls plus a fitted Gaussian mixture to denoise each single cell. So dsb should make the protein data cleaner prior to the PCA used in the WNN.
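To make the ambient-correction idea concrete, here is a minimal conceptual sketch in plain Python. It is not the dsb implementation: it only shows the first step described above (rescaling each protein against its empty-droplet background on a log scale) and leaves out the per-cell denoising with isotype controls and the Gaussian mixture. The function name `ambient_correct`, the pseudocount of 10, and the toy counts are all assumptions made for illustration.

```python
import math
import statistics

PSEUDOCOUNT = 10  # assumed pseudocount added before the log transform


def ambient_correct(cell_counts, empty_counts, pseudocount=PSEUDOCOUNT):
    """Rescale one protein's counts against its empty-droplet background.

    Both inputs are raw counts for a single protein. The result is the
    number of background standard deviations each cell sits above the
    mean of the empty droplets, on a log scale.
    """
    log_cells = [math.log(c + pseudocount) for c in cell_counts]
    log_empty = [math.log(c + pseudocount) for c in empty_counts]
    background_mean = statistics.mean(log_empty)
    background_sd = statistics.stdev(log_empty)  # sample standard deviation
    return [(x - background_mean) / background_sd for x in log_cells]


# Hypothetical toy counts: 4 cells and 5 empty droplets for two proteins.
cells = {"CD3": [250, 4, 310, 6], "CD19": [3, 180, 5, 2]}
empty = {"CD3": [2, 5, 3, 4, 6], "CD19": [1, 2, 0, 3, 2]}

corrected = {protein: ambient_correct(cells[protein], empty[protein])
             for protein in cells}

for protein, values in corrected.items():
    print(protein, [round(v, 2) for v in values])
```

After this rescaling, each protein has its own background centered near zero, so a value near 0 means "indistinguishable from ambient" for every protein, which is the per-protein noise floor adjustment that CLR does not provide.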

You might start out using a fairly high number of PCs for the protein PCA, depending on your panel. With more modestly sized panels of, say, 30 proteins, like the example 10X PBMC datasets, running PCA on that already low-dimensional protein space can add noise to protein-only clustering (each protein tends to add a lot of information, even in large panels), but I'm not sure yet whether that would also extend to the combined mRNA + protein space. Best of luck. If you try it out, please post here; happy to provide further guidance.

Sorry for the late reply, I somehow missed this. Thank you for looking into it and for your advice! I will report back on what I find.

Hi, circling back: please see the updated documentation for using the WNN algorithm with dsb-normalized protein values as input. Depending on the number of proteins in the CITE-seq dataset, it may be helpful to use the normalized protein values directly rather than running PCA on the normalized ADT data. For example, with 10 proteins, compressing those 10 dimensions into 10 principal components could potentially just be noisier than using the normalized values themselves. For a dataset with 250 proteins, PCA would be more reasonable. Of course, it all depends on the goals of the analysis. For the purposes of demonstrating code, both versions (dsb values of protein vs. PCA based on dsb values) are highlighted in the updated README for this package. The README uses a small example dataset, but we have found that using dsb with either version can improve results.
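As a rough illustration of what "using the normalized values themselves" means for a small panel, the sketch below finds nearest neighbors directly on a matrix of normalized protein values, with no PCA step in between. This is only a conceptual stand-in, not Seurat's WNN procedure (which also weights the RNA and protein modalities per cell); the helper `k_nearest` and the toy values are hypothetical.

```python
import math


def k_nearest(values, k):
    """For each cell, return the indices of its k nearest cells.

    `values` is a list of rows (one per cell), each row holding that
    cell's normalized protein values. Distance is plain Euclidean
    distance computed on those values, without dimensionality reduction.
    """
    neighbors = []
    for i, row_i in enumerate(values):
        distances = [(math.dist(row_i, row_j), j)
                     for j, row_j in enumerate(values) if j != i]
        distances.sort()
        neighbors.append([j for _, j in distances[:k]])
    return neighbors


# Hypothetical normalized values: 6 cells (rows) by 3 proteins (columns).
protein_values = [
    [8.1, 0.2, 0.5],
    [7.6, -0.1, 0.9],
    [7.9, 0.4, 0.3],
    [0.3, 6.8, 5.9],
    [-0.2, 7.2, 6.4],
    [0.6, 6.5, 6.1],
]

protein_neighbors = k_nearest(protein_values, k=2)
for cell, nbrs in enumerate(protein_neighbors):
    print(cell, nbrs)
```

With only a handful of proteins, each column is already an interpretable dimension, so a neighbor search like this loses nothing by skipping PCA. With hundreds of proteins, compressing to principal components first becomes the more reasonable choice.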