How to demultiplex dual-indexed paired-end reads

Question

How can I demultiplex paired-end reads that carry two indexes? The read layout and the sample-to-index assignments are:


index1-read1 --- read2-index2

sample1 index1-a index2-b
sample2 index1-a index2-c
sample3 index1-d index2-c
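In this layout sample1 and sample2 share index1-a, and sample2 and sample3 share index2-c, so only the index pair identifies a sample. A small sketch to confirm that no pair is reused, assuming a whitespace-separated `sample index1 index2` file with `#` comment lines (the layout the awk command later in this thread expects of barcode.txt); `check_unique_pairs` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report any (index1, index2) pair used by more than one sample.
# Assumes whitespace-separated "sample index1 index2" lines; '#' lines are skipped.
# Exits non-zero if a duplicate pair is found.
check_unique_pairs() {
  awk '!/^#/ && NF >= 3 {
         key = $2 "-" $3
         if (key in seen) {
           printf "duplicate pair %s: %s and %s\n", key, seen[key], $1
           dup = 1
         } else {
           seen[key] = $1
         }
       }
       END { exit dup }' "$@"
}
```

Usage: `check_unique_pairs barcode.txt` prints nothing and succeeds when every sample has a distinct index combination, even if individual indexes repeat across samples.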

Answer 1 · 2017-02-06T16:47:12.000Z

I don't have any dual-index data to test this on, but I believe each entry in the barcodes file should look something like:

sample_name  TAATGCGC-GTACTGAC

The second barcode is likely reverse complemented.
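Whether the second index needs reverse-complementing depends on the instrument and chemistry, and later replies in this thread report that it did not for this data. If you want to try both forms, a minimal shell helper (hypothetical name `revcomp`, relying only on the standard `rev` and `tr` utilities) can produce the reverse complement:

```shell
#!/usr/bin/env bash
# Reverse-complement a DNA sequence: reverse the string, then swap A<->T and C<->G.
# Other characters (e.g. N) pass through unchanged.
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

# Example barcode-file line with the second index reverse-complemented.
# prints: sample_name<TAB>TAATGCGC-GTCAGTAC
printf 'sample_name\t%s-%s\n' TAATGCGC "$(revcomp GTACTGAC)"
```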

Answer 2 · 2017-02-07T13:51:09.000Z

Thank you @brwnj.
If the reads are in separate files, e.g. seq_R1.fq and seq_R2.fq, how should I set up the command?

BTW, what is the relationship between this repo and ea-utils? Is the fastq-multx in ea-utils up to date?

Answer 3 · 2017-02-07T17:20:06.000Z

I don't know the command for sure; as I said above, I don't have any dual-index data to test this on.

As for the relationship: this code comes directly from ea-utils, with slightly different versioning. The only changes are typo fixes in the help message.

Answer 4 · 2017-02-11T06:57:05.000Z

Hi @brwnj
Here is some test data. Could you show me the command?
Thank you very much.

barcode.txt
test.1.fq.gz
test.2.fq.gz

Answer 5 · 2017-02-13T18:17:47.000Z

First, convert the barcodes into the combined format described above:

awk 'BEGIN{FS=" ";OFS="\t"}!/^#/{print $1,$2"-"$3}' barcode.txt > fixed_barcodes.txt

Then:

fastq-multx -B fixed_barcodes.txt test.1.fq.gz test.2.fq.gz -o %_R1.fastq -o %_R2.fastq

The top of the output reports per-sample read counts:

Id	Count	File(s)
F111	36	F111_R1.fastq	F111_R2.fastq
F114	9	F114_R1.fastq	F114_R2.fastq
F121	10	F121_R1.fastq	F121_R2.fastq
F124	16	F124_R1.fastq	F124_R2.fastq
F131	14	F131_R1.fastq	F131_R2.fastq
F134	21	F134_R1.fastq	F134_R2.fastq
F141	31	F141_R1.fastq	F141_R2.fastq
F144	16	F144_R1.fastq	F144_R2.fastq

Answer 6 · 2017-02-14T04:15:47.000Z

The second barcode is not reverse-complemented.

Answer 7 · 2017-02-14T05:13:16.000Z

There is a problem, but that's not it: fastq-multx matches barcodes against the sequence line only, not the header. Using -H, which should match against the header, causes a segfault.

I would recommend trying out Brian Bushnell's demuxbyname.sh method outlined here: https://www.biostars.org/p/139395/.
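The reply above points to demultiplexing on the header rather than the read sequence. As a rough illustration only (this is not demuxbyname.sh or fastq-multx), a plain awk sketch can split a FASTQ on the header index. It assumes Illumina CASAVA 1.8-style headers (`@... 1:N:0:INDEX`, index in the last `:`-separated field of the second token), uncompressed 4-line records, and a hypothetical `sample<TAB>INDEX` map file; `demux_by_header` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a 4-line FASTQ into per-sample files by the index
# stored in the read header, e.g. "@HWI-...:4861 1:N:0:CCTCCT" -> "CCTCCT".
# $1 = map file with "sample<TAB>INDEX" lines ('#' lines ignored)
# $2 = FASTQ file (uncompressed)
# $3 = output prefix; reads go to <prefix><sample>.fastq or <prefix>unmatched.fastq
demux_by_header() {
  local map=$1 fq=$2 prefix=$3
  awk -v prefix="$prefix" '
    # First file: remember which sample each index belongs to.
    NR == FNR { if ($0 !~ /^#/ && NF >= 2) sample[$2] = $1; next }
    # Header line of each record: take the last ":" field of the second token.
    FNR % 4 == 1 {
      n = split($2, f, ":")
      idx = (n > 0) ? f[n] : ""
      out = prefix ((idx in sample) ? sample[idx] : "unmatched") ".fastq"
    }
    # All four lines of the record go to the file chosen at its header.
    { print > out }
  ' "$map" "$fq"
}
```

For paired-end data, run it once per read file with different prefixes (e.g. `out_R1_` and `out_R2_`); if both files list mates in the same order with the same header index, mates end up in matching sample files in the same order. For gzipped input, `<(gzip -dc test.1.fq.gz)` can stand in for the file name.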

Answer 8 · 2017-09-11T13:11:46.000Z

Some notes:

If the read orientation is undetermined (either index may come first), include both orderings of each pair in the barcode list used to demultiplex the file:

awk '!/^#/{print $1"\t"$2"-"$3"\n"$1"\t"$3"-"$2}' barcode.txt > fixed_barcodes.txt

Dual barcodes should be written as barcode1-barcode2.

Write the barcode sequences in their original orientation; do not reverse-complement barcode2.

Answer 9 · 2017-09-24T08:56:49.000Z

@brwnj
The second read is not trimmed.

Answer 10 · 2018-04-03T17:34:45.000Z

@brwnj

Any progress on this?

Answer 11 · 2018-04-04T17:10:04.000Z

Progress? Prove to me that these reads are dual-indexed.

You can clearly see that the reads coming off the sequencer carry a single index, the same one in the read 1 and read 2 headers of each pair:

@HWI-D00523:240:HF3WGBCXX:1:1116:1699:4861 1:N:0:CCTCCT
@HWI-D00523:240:HF3WGBCXX:2:2212:6141:20342 1:N:0:CCGTGA
@HWI-D00523:240:HF3WGBCXX:1:2101:18265:67898 1:N:0:CCTCCT

@HWI-D00523:240:HF3WGBCXX:1:1116:1699:4861 2:N:0:CCTCCT
@HWI-D00523:240:HF3WGBCXX:2:2212:6141:20342 2:N:0:CCGTGA
@HWI-D00523:240:HF3WGBCXX:1:2101:18265:67898 2:N:0:CCTCCT
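To check this on your own files, a quick tally of the header index field shows which index strings are present and how often. This is a sketch assuming 4-line records and CASAVA 1.8-style headers like the ones above; dual-indexed runs converted with bcl2fastq typically show two sequences joined by `+` in this field, but that depends on your pipeline. `count_header_indexes` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count reads per index string found in FASTQ headers.
# Assumes 4-line records and headers like "@HWI-...:4861 1:N:0:CCTCCT", where the
# index is the last ":"-separated field of the second whitespace-separated token.
# Reads the files given as arguments, or stdin if none are given.
count_header_indexes() {
  awk 'FNR % 4 == 1 { n = split($2, f, ":"); if (n > 0) print f[n] }' "$@" |
    sort | uniq -c | sort -rn
}
```

Usage: `gzip -dc test.1.fq.gz | count_header_indexes | head`. A single sequence per line (no `+`) is consistent with single-index data.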

Answer 12 · 2018-04-04T18:17:44.000Z

@brwnj I mean the bug where the barcode in read 2 is not trimmed.

Answer 13 · 2019-08-29T16:37:12.000Z

Let me check whether I'm inferring correctly from this issue thread: can dual barcodes in separate index files be demultiplexed by concatenating the sequences from the two index files and then supplying the barcodes in the barcode file as "ID\tBC1-BC2\n"?
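The concatenation step in that question could be sketched as follows, assuming two uncompressed index FASTQs (hypothetical names I1.fq and I2.fq) whose records are in the same order. This only builds the combined index reads; whether fastq-multx should then be given `BC1-BC2` or the joined `BC1BC2` barcodes is not settled in this thread, so test on a small subset first. `concat_index_fastq` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge two index FASTQs record by record, appending the
# index-2 sequence and quality to those of index 1. Assumes plain 4-line records
# in identical order in both files; the header is taken from the index-1 file.
concat_index_fastq() {
  paste "$1" "$2" | awk -F '\t' '
    FNR % 4 == 1 { print $1; next }   # header line from the index-1 file
    FNR % 4 == 3 { print "+"; next }  # separator line
    { print $1 $2 }                   # sequence (line 2) or quality (line 4)
  '
}
```

Usage: `concat_index_fastq I1.fq I2.fq > I1I2.fq`.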