niaid/dsb

multiplexing experiments section?

bbimber opened this issue · 1 comment

Hello - we're very interested in trying dsb to normalize multi-lane 10x ADT data. In your README you refer to a "multiplexing experiments section". Sorry if I missed this, but does that section exist somewhere? Thanks.

Hi @bbimber that section is in the vignette: https://github.com/niaid/dsb/blob/master/vignettes/dsb_normalizing_CITEseq_data.Rmd.

Also check out the FAQ section in the vignette. If you added the same cell mixture across multiple lanes of 10x, you will want to use the raw output from Cell Ranger and append a unique lane ID to the barcodes. After that you can merge into a single SingleCellExperiment or Seurat object, and from that single object you can estimate background across all the droplets using simple histogram cutoffs (see my reply to your other post).
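A rough sketch of those steps in R - the lane names, file paths, and histogram cutoff values below are hypothetical placeholders, so adjust them to your own data:

```r
# Sketch: merge raw Cell Ranger droplet output from multiple 10x lanes,
# appending a unique lane ID to each barcode, then estimate background
# from protein library size histogram cutoffs.
library(Seurat)

lanes <- c("lane1", "lane2")  # hypothetical lane directory names
raw <- lapply(lanes, function(id) {
  # Read10X returns a list when both RNA and Antibody Capture are present
  mtx  <- Read10X(file.path("data", id, "raw_feature_bc_matrix"))
  prot <- mtx$`Antibody Capture`
  rna  <- mtx$`Gene Expression`
  # append the lane ID so barcodes stay unique after merging
  colnames(prot) <- paste(id, colnames(prot), sep = "_")
  colnames(rna)  <- paste(id, colnames(rna), sep = "_")
  list(prot = prot, rna = rna)
})
prot <- do.call(cbind, lapply(raw, `[[`, "prot"))
rna  <- do.call(cbind, lapply(raw, `[[`, "rna"))

# simple histogram cutoffs on log10 protein library size
prot_size <- log10(Matrix::colSums(prot) + 1)
hist(prot_size, breaks = 100)
# droplets between the (illustrative) cutoffs form the background
background <- prot[, prot_size > 1.5 & prot_size < 2.5]
```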

A bit outside the scope of your question, but here is some general info / advice for multiplexing experiments (these are detailed more in an updated version of the paper, not yet online).

Some advice for when you get to the demultiplexing step with, e.g., hashing data: Seurat::HTODemux and other methods need a sufficient population of background drops to work properly. So even when defining the background based on protein library size, retain the background / negative droplets you defined with the histogram cutoffs and mRNA-based QC when you get to the demultiplexing step. There can be a sort of 'sweet spot' with HTODemux in terms of the total number of droplets you feed it, depending on the number of cells you have / expected recovery.

We wound up using the top 35,000 barcodes from each lane ranked by total protein library size, ran HTODemux, then used the protein matrix from the droplets HTODemux assigned "Negative" as the argument to empty_drop_matrix in dsb to normalize the cells HTODemux assigned "Singlet" (with some additional QC on the droplets) - very similar to what is shown in the multiplexing section of the vignette.

We also tested just using the matrix of proteins from the 'protein library size' cutoffs (the top ~100k barcodes from each lane x 12 lanes) as the argument to empty_drop_matrix to define background, as shown in the main part of the vignette. The resulting dsb normalized values were identically distributed with either method of defining background.

Hope that helps - happy to answer questions if that was not clear. Let us know how it works.
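The HTODemux-then-dsb workflow described above could look roughly like this - assuming a merged Seurat object `s` with "HTO" and "ADT" assays built from raw droplets; the per-object 35,000-barcode cutoff matches the number mentioned above but is otherwise illustrative:

```r
# Sketch: subset to the top 35,000 barcodes by protein library size,
# demultiplex with HTODemux, then use "Negative" droplets as the dsb
# background to normalize the "Singlet" cells.
library(Seurat)
library(dsb)

# rank barcodes by total protein (ADT) library size
prot_size <- Matrix::colSums(GetAssayData(s, assay = "ADT", slot = "counts"))
top_bc <- names(sort(prot_size, decreasing = TRUE))[1:35000]
s <- subset(s, cells = top_bc)

# HTODemux needs the retained background droplets to fit its negative model
s <- NormalizeData(s, assay = "HTO", normalization.method = "CLR")
s <- HTODemux(s, assay = "HTO")

# split droplets by HTODemux call and run dsb
adt <- GetAssayData(s, assay = "ADT", slot = "counts")
neg <- adt[, s$HTO_classification.global == "Negative"]
pos <- adt[, s$HTO_classification.global == "Singlet"]
dsb_norm <- DSBNormalizeProtein(
  cell_protein_matrix = as.matrix(pos),
  empty_drop_matrix   = as.matrix(neg),
  denoise.counts      = TRUE
)
```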