chris-mcginnis-ucsf/MULTI-seq

Barcode match

Closed this issue · 4 comments

Hi Chris,
First of all, thank you for your package. I'm currently using MULTI-seq for my master's thesis and it has been an important tool for my analysis. Maybe this is a dumb question, but I started wondering: how does deMULTIplex keep track of which barcode is which?
For example, I have 6 barcodes named "Bar2", "Bar3", "Bar5", "Bar6", "Bar7", "Bar9", and the sequences are read in this order. After MULTIseq.align, the column names are "Bar1", "Bar2", "Bar3", "Bar4", "Bar5", "Bar6". Does "Bar1" correspond to "Bar2" in my annotation? Or is "Bar1" the first barcode it reads in readTable, and therefore the first one found in the fastq file?
I couldn't completely understand it from the code and the tutorial.
Thanks a lot!
Alessia

Hi Alessia,
I am not Chris, of course, but your first assumption is correct. The Bar# in the output corresponds to the position of that barcode in the reference that was passed to the function. It is not obvious from the code, but this loop (and one other like it) adds the counts to bar.table:

for (tag in 1:ncol(tag.dists)) {
    bar.table[cell,tag] <- length(which(tag.dists[,tag] <= 1))
}

We can see that it iterates through the numeric values in the sequence 1:ncol(tag.dists), which suggests that the columns (bars) of bar.table correspond to the index at which they appear in tag.dists.
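As a minimal sketch of what that loop does, here it is run on a toy distance matrix. The matrix values, the single cell name "cell1", and the "Bar1".."Bar3" column labels are made up for illustration; only the loop body is taken from the snippet above.

```r
# Toy distance matrix: 4 reads (rows) x 3 reference barcodes (columns).
# The values are invented; in the package they come from stringdistmatrix().
tag.dists <- matrix(c(0, 5, 6,
                      1, 4, 6,
                      5, 0, 7,
                      6, 5, 3), nrow = 4, byrow = TRUE)

# One-row count table for a single hypothetical cell.
# Column labels are generated from the column index only (an assumption
# made here to mirror the "Bar1", "Bar2", ... names seen in the output).
bar.table <- matrix(0, nrow = 1, ncol = ncol(tag.dists),
                    dimnames = list("cell1",
                                    paste0("Bar", seq_len(ncol(tag.dists)))))

cell <- "cell1"
for (tag in 1:ncol(tag.dists)) {
    # Count reads within distance 1 of the reference in column `tag`
    bar.table[cell, tag] <- length(which(tag.dists[, tag] <= 1))
}

print(bar.table)
```

The counts land in columns 1, 2, 3 purely by position, so the label on a column says nothing about the barcode's name in the annotation. For a single cell, the loop gives the same numbers as colSums(tag.dists <= 1).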

Tracing back to tag.dists <- stringdistmatrix(a=tags, b=ref), we can see that the columns of tag.dists are in whatever order stringdistmatrix() puts them.

From the stringdist package documentation: "For stringdistmatrix: if both a and b are passed, a length(a)xlength(b) matrix." Since b = ref, the columns of tag.dists are the reference sequences, in the order in which they were passed to the function. So, assuming you passed your barcodes in the order you listed them, "Bar1" in the output is "Bar2" in your annotation, "Bar2" is "Bar3", and so on.
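To check this without the package, here is a small sketch. It uses base R's adist() as a stand-in for stringdist::stringdistmatrix() (both return a length(a) x length(b) matrix when given two vectors), and the barcode sequences and reads are hypothetical, not the real MULTI-seq barcodes.

```r
# Hypothetical reference barcodes, named as in the sample annotation and
# kept in the order they would be passed to the alignment function.
ref <- c(Bar2 = "GGAGAAGA", Bar3 = "CCACAATG", Bar5 = "TGAGACCT")

# Hypothetical sample-tag reads from one cell (the last has one mismatch).
tags <- c("CCACAATG", "CCACAATG", "TGAGACCT", "CCACAATC")

# Stand-in for stringdistmatrix(a = tags, b = ref):
# rows follow `tags`, columns follow `ref`, in the order given.
tag.dists <- adist(tags, unname(ref))

# Count reads within distance 1 of each reference, column by column.
counts <- colSums(tag.dists <= 1)

# Restore the annotation names by position.
names(counts) <- names(ref)
print(counts)
```

The same positional relabeling should carry over to the real output, for example by assigning your own barcode names to the column names of the barcode table, provided the names are listed in the same order as the reference sequences were passed.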

Hope that helps,
John

Hi John,
thanks a lot, now I see it!
By the way, interesting work on your implementation of the package. I'll give it a try since I've been having some issues with noisy data myself.
Have a good day,
Alessia

Absolutely! For small numbers of datasets, Chris's code supports manual thresholding, which should work for noisy data too. I am sure he is wildly busy with his post-doc right now, but he may have more to add.

The code I have up right now unfortunately functions mostly as a proof of concept to provide automation, so I am afraid it may be difficult to implement as it is. I am currently working on the next version, so if you think any of that is helpful for you, feel free to email me at jbassett@fredhutch.org or comment on my repo!

Yes, I've tried manual thresholding and read some older issues that were also very helpful. Anyway I'll see and email you eventually.
Thanks a lot again!