chris-mcginnis-ucsf/MULTI-seq

tutorial data barcode check result

Closed this issue · 3 comments

lb15 commented

Hey Chris,

I'm testing out deMULTIplex on the PBMC data, following the tutorial. I ran the following:


```r
library(deMULTIplex)
library(ggplot2)

bar.ref <- read.csv("~/resources/deMULTIplex_ExampleData/LMOlist.csv", header = FALSE)
cell.id.vec <- read.table("~/resources/deMULTIplex_ExampleData/cellIDs.txt")

barcs <- bar.ref$V1
cell.ids <- cell.id.vec$x

readTable <- MULTIseq.preProcess(
    R1 = '~/resources/deMULTIplex_ExampleData/ACAGTG_S3_L001_R1_001.fastq.gz',
    R2 = '~/resources/deMULTIplex_ExampleData/ACAGTG_S3_L001_R2_001.fastq.gz',
    cellIDs = cell.ids, cell = c(1,16), umi = c(17,28), tag = c(1,8))

str(readTable)

## Perform MULTI-seq sample barcode alignment
bar.table <- MULTIseq.align(readTable, cell.ids, barcs)

## Visualize barcode space
bar.tsne <- barTSNE(bar.table[,1:96])
## Note: Exclude columns 97:98 (assuming 96 barcodes were used) which provide total barcode UMI counts for each cell.

for (i in 3:ncol(bar.tsne)) {
    g <- ggplot(bar.tsne, aes(x = TSNE1, y = TSNE2, color = bar.tsne[,i])) +
        geom_point() +
        scale_color_gradient(low = "black", high = "red") +
        ggtitle(colnames(bar.tsne)[i]) +
        theme(legend.position = "none")
    print(g)
}
```

And checked the TSNE plots. I'm getting plots like these:
[Images: tSNE plots for bar15, bar18, and bar17]

Is this the expected output for this dataset? A lot of the plots look like the "missing" example and I expected this dataset to not have missing barcodes based on your note.

Thanks!
Lauren

lb15 commented

Rereading the dataset description, I realized that this dataset probably has 8 barcodes, meaning I should expect 8 barcodes to show clustered barcode expression in these plots, which I do see now. So I think this is the expected result. I'm not entirely sure how to interpret the plots that show barcodes spread across the entire tSNE. Does this occur for barcodes not included in the experimental run? Thanks!

Hi @lb15,

This data actually looks super good, and it makes sense that 8 BCs were used because there are 8 visually discernible clusters in barcode space. As described in the tutorial, I usually remove the BCs I didn't use in my experiment before running the tSNE, then check whether all of the anticipated BCs form distinct clusters. This is the "point" of computing barcode space: visually checking that all of your anticipated barcodes are "present" before running the sample classification pipeline.
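For illustration, here is a minimal sketch of dropping unused barcodes before recomputing barcode space. It runs on a simulated stand-in for `bar.table`, so the column names (`Bar1` ... `Bar96`, `nUMI`, `nUMI_total`), the `used` indices, and the "top 8 by total UMI" heuristic are all assumptions made for the example, not something taken from the package or the tutorial data.

```r
# Simulated stand-in for the MULTIseq.align() output: 96 barcode columns
# plus two UMI-total columns. Column names here are assumed, not confirmed.
set.seed(1)
n.cells <- 400
n.bars <- 96
used <- c(3, 15, 17, 18, 40, 52, 77, 90)  # hypothetical barcodes in the experiment

# Low background counts for every barcode in every cell
counts <- matrix(rpois(n.cells * n.bars, lambda = 1), nrow = n.cells)
# Each cell gets a strong signal from exactly one of the used barcodes
cell.bar <- sample(used, n.cells, replace = TRUE)
counts[cbind(seq_len(n.cells), cell.bar)] <- rpois(n.cells, lambda = 200)

bar.table <- as.data.frame(counts)
colnames(bar.table) <- paste0("Bar", seq_len(n.bars))
bar.table$nUMI <- rowSums(counts)
bar.table$nUMI_total <- bar.table$nUMI

# Rank barcodes by total UMI count; unused barcodes carry only background
bar.totals <- colSums(bar.table[, seq_len(n.bars)])
n.expected <- 8
good.bars <- names(sort(bar.totals, decreasing = TRUE))[seq_len(n.expected)]

# Keep only the barcodes that appear to have been used
bar.table.used <- bar.table[, good.bars]
print(sort(good.bars))

# With deMULTIplex loaded, barcode space would then be recomputed on the subset:
# bar.tsne <- barTSNE(bar.table.used)
```

If you already know which barcodes went into the experiment, listing them directly in `good.bars` is safer than inferring them from the counts.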

Chris

lb15 commented

Thanks Chris, that makes sense! I definitely misread the dataset description and originally thought there should be 96 barcodes...