Uncertain about how to get BX tag from longranger
Hi,
Thanks in advance for any help. I'm running arks 1.0.2, but my question is in reference to the documentation on generating the interleaved input using longranger. I tried running longranger (v 2.2.2) using this command:
longranger basic --id=ID --fastqs=/path/to/10x/fastqs
Where my fastq files are:
sample_S1_L003_I1_001.fastq.gz
sample_S1_L003_R1_001.fastq.gz
sample_S1_L003_R2_001.fastq.gz
And the output (barcoded.fastq.gz) is missing a BX tag, e.g.,
zcat barcoded.fastq.gz | head
@A00127:62:H5YY7DSXX:3:1369:9200:5462
AAGAAAGAAAGAAAGGGGATTGGTTACCAGGAAGAATAGAGGAAAGAGGTGGAAAGAATGGGGAAAAGGCAGGAGGGAGGAAAGGAGGGGTGCGACACTTCTCAGAATACACATTTCTGCATAACTC
+
FF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
The sequencing center ran these commands to generate the fastq files:
module load bcl2fastq2
module load longranger
longranger mkfastq --run=$RP \
--csv=samplesheet.csv \
--use-bases-mask=Y150n,I8nn,n*,Y150n \
--ignore-dual-index > longranger_fq.out 2>&1
Perhaps the lack of the BX tag is due to me using the processed reads, but it looks like longranger basic should be run on the demultiplexed reads from mkfastq.
I tried running calcBarcodeMultiplicities.pl on barcoded.fastq.gz, which produces an output that looks like this:
head read_multiplicities.csv
CGCTTCAGTACAGCAG-1,99144
CGGACTGCACCTCGTT-1,91144
CAGAATCCACGCATCG-1,81924
but when arks is run like this:
arks -p full -f draft_assembly.fasta read_multiplicities.csv barcoded.fastq.gz
it dies with this error:
Reading user inputs...
Finished reading user inputs...entering runArks()...
Entered runArks()...
Running: arks 1.0.2
pid 14326
-p full
-f draft_assembly.fasta
-a
-q
-w
-i
-o 0
-c 5
-k 30
-g 1
-j 0.55
-l 0
-z 500
-b draft_assembly.fasta.scaffk-method_c5_k30_g1_j0.55_l0_d0_e30000_r0.05
Min index multiplicity: 50
Max index multiplicity: 10000
-d 0
-e 30000
-r 0.05
-t 1
-v 0
---We are using KMER method.---
=>Preprocessing: Gathering barcode multiplicity information...Tue Oct 16 09:55:25 2018
Could not open . --fatal.
I'm guessing that this error is simply tied to not generating the barcoded.fastq.gz file correctly. Do you have any suggestions? Thanks! Zack
Hi Zack,
Have you looked further into your barcoded.fastq.gz? Because the output from longranger basic is sorted by barcode, I often find that the first few reads in that file don't have an associated barcode (likely because sequencing errors leave the program unable to match the barcode to one on the whitelist).
For example, if I look at the head of the reads file used for the NA12878 human tests in the ARKS paper:
[lcoombe@lcoombe01 outs]$ gunzip -c barcoded.fastq.gz |head -n 8
@E00247:267:HMVT3CCXX:1:2202:19329:3823
AAAGGAGGGAGGAAGGAAGGAGGGAAAGAAAGAGAAAGAAAGAAAAAGAAAGAGAGAGAGAGAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAAAGAAAAAGAGAGAAAG
+
KKKKKKKKKFKFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFKKKKKKKKKKKKKKKKKFKKKFKKKKKKKKKKKKKKKKKKFKKKKKKK7FKKKKAFKKKFKKKKKKKKAKAA7F,<F<<<7<A
@E00247:267:HMVT3CCXX:1:2202:19329:3823
CTGTGATCAATTAAGCAGCTGACCAGTCGTTACCCGCTCCTCCCTGCTCTTGCTACCCAATAAATACGAAGGGCTGTAGAAACTCAGGGTGGCTGCTGCCTTTGCTCACTAGAAGCAGGGAGCCCTTTTCTTCTTCCCCTGGCCCCTTCCT
+
AAFFFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKAKKKKKKKKKKKKKKKKKKKFKKKKKKKKKKKKKKKKKKKKKKKKKFFAKFKKKKKKKKKKAKKKFFFF7FFKKFK
But, there are associated barcodes when I look further into the file:
[lcoombe@lcoombe01 outs]$ gunzip -c barcoded.fastq.gz |grep -n "BX" |head -n 4
8641:@E00247:267:HMVT3CCXX:2:1207:19522:61714 BX:Z:AAACACCAGACAATAC-1
8645:@E00247:267:HMVT3CCXX:2:1207:19522:61714 BX:Z:AAACACCAGACAATAC-1
8649:@E00247:267:HMVT3CCXX:5:2110:13210:41638 BX:Z:AAACACCAGACAATAC-1
8653:@E00247:267:HMVT3CCXX:5:2110:13210:41638 BX:Z:AAACACCAGACAATAC-1
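A quick way to see how many reads carry a barcode is to count the header lines with and without the tag. This is only a minimal sketch, assuming standard 4-line FASTQ records and that the barcode appears as BX:Z: in the header line, as in the output above; count_bx is a hypothetical helper name, not part of longranger or arks.

```shell
# Hypothetical helper: count FASTQ records whose header carries a BX:Z: tag.
# Assumes 4-line records, so only lines 1, 5, 9, ... are treated as headers.
count_bx() {
    gunzip -c "$1" | awk '
        NR % 4 == 1 {
            if ($0 ~ /BX:Z:/) tagged++      # header has a barcode tag
            else untagged++                 # header has no barcode tag
        }
        END { printf "with_BX=%d without_BX=%d\n", tagged, untagged }
    '
}
```

Calling count_bx barcoded.fastq.gz prints a single summary line, which should make it clear whether the untagged reads are just a small block at the top of the file.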
You are getting that error because the read_multiplicities.csv file that you generated needs to be passed to arks with the -a option:
arks -p full -f draft_assembly.fasta -a read_multiplicities.csv barcoded.fastq.gz
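As a side check on the multiplicities file itself, the log in this thread prints "Min index multiplicity: 50" and "Max index multiplicity: 10000". The sketch below tallies how many barcodes in a barcode,count CSV fall below, within, or above such a range. The defaults are copied from that log; that arks compares the second column against these bounds inclusively is an assumption here, and summarize_multiplicities is a hypothetical helper name.

```shell
# Hypothetical helper: summarise a barcode,count CSV against a multiplicity range.
# Defaults (50, 10000) are taken from the arks log shown in this thread;
# treating the bounds as inclusive is an assumption, not confirmed arks behaviour.
summarize_multiplicities() {
    csv=$1
    min=${2:-50}
    max=${3:-10000}
    awk -F, -v min="$min" -v max="$max" '
        $2 < min               { low++ }    # too few reads for this barcode
        $2 > max               { high++ }   # too many reads for this barcode
        $2 >= min && $2 <= max { ok++ }     # inside the range
        END { printf "below=%d within=%d above=%d\n", low, ok, high }
    ' "$csv"
}
```

Running summarize_multiplicities read_multiplicities.csv prints one line of counts; passing two extra arguments overrides the bounds.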
Hope that helps!
Lauren
@lcoombe Thanks for your prompt response. You are correct. The BX tags are there for some reads. I wasn't aware that they would only be there in a subset. The -a flag seems to have done the trick! Thanks for spotting it! Okay to close. I'll open another issue if something else comes up.