no changes after scaffolding with arks-tigmint-links pipeline
Closed this issue · 9 comments
Hi,
I used Canu to produce a draft genome and then ran the arks-tigmint-links pipeline with the following commands.
TOOLDIR=/home/niuyw/software
PPN=20
# links
export PATH=/home/niuyw/software/links_v1.8.6:$PATH
# run2, canu
ln -s /home/zhangll/Tasks/Gouqi/10Xgenomic/longranger/CHROMIUM_interleaved.fq.gz
myReads=CHROMIUM_interleaved
ln -s /parastor300/niuyw/Project/Goqi_genome_180207/canu/run1/goqi.contigs.fasta canu.fa
myDraft=canu
$TOOLDIR/arks.1.0.3/Examples/arks-make arks-tigmint draft=${myDraft} reads=${myReads} threads=$PPN
I did not see any error messages in the logs, but the final genome was identical to the input contigs.
$ python assembly_stats.py canu.fa
Size_includeN 2830901440
Size_withoutN 2830901440
Seq_Num 16206
Mean_Size 174682
Median_Size 63595
Longest_Seq 21037189
Shortest_Seq 1001
GC_Content (%) 37.97
N50 599780
L50 871
N90 57129
Gap (%) 0.0
$ python assembly_stats.py canu.tigmint.renamed.fa
Size_includeN 2830901440
Size_withoutN 2830901440
Seq_Num 16206
Mean_Size 174682
Median_Size 63595
Longest_Seq 21037189
Shortest_Seq 1001
GC_Content (%) 37.97
N50 599780
L50 871
N90 57129
Gap (%) 0.0
$ python assembly_stats.py canu.tigmint_c5_m50-10000_k30_r0.05_e30000_z500_l5_a0.3.scaffolds.fa
Size_includeN 2830901440
Size_withoutN 2830901440
Seq_Num 16206
Mean_Size 174682
Median_Size 63595
Longest_Seq 21037189
Shortest_Seq 1001
GC_Content (%) 37.97
N50 599780
L50 871
N90 57129
Gap (%) 0.0
Here is the log: arks_run2.txt
Do you have any ideas about this?
Best,
Yiwei Niu
Hi @YiweiNiu,
=>Preprocessing: Gathering barcode multiplicity information...Thu May 16 10:13:51 2019
Saw 0 barcodes and keeping 0 read pairs out of 0
Is the barcode in the form BX:Z:<barcode> in the fastq read headers? Was tigmint able to successfully make some cuts in your assembly? This log makes me wonder if there is something off with where the chromium barcodes are in your reads file.
Is the output graph file (*original.gv) empty or does it have edges?
Lauren
Hi Lauren,
Thank you for your quick reply.
This should be the problem. I did not see BX:Z:<barcode> in my fastq.
$ zcat CHROMIUM_interleaved.fq.gz |head -3
@ST-E00575:142:H7J7FCCXY:4:1101:18761:66584_AAACACCAGACAATAC
CGACTTTGTCCTATATCACAAGTGTGCTTTGAGTTGAATTTTGTGATTTCCCACAGATCATCAGGCTTAACAATTCCCCGAAGAAACCATCGACAGCCTTGAGTCTGTCGCTTACAAACCAGCCTCC
+
niuyw@admin:~/Project/Goqi_genome_180207/arks/run2$ zcat CHROMIUM_interleaved.fq.gz |head -4
@ST-E00575:142:H7J7FCCXY:4:1101:18761:66584_AAACACCAGACAATAC
CGACTTTGTCCTATATCACAAGTGTGCTTTGAGTTGAATTTTGTGATTTCCCACAGATCATCAGGCTTAACAATTCCCCGAAGAAACCATCGACAGCCTTGAGTCTGTCGCTTACAAACCAGCCTCC
+
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
And the *_original.gv file is empty.
$ cat canu.tigmint_c5_m50-10000_k30_r0.05_e30000_z500_original.gv
graph G {
}
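A quick way to answer the "is the graph empty?" question without opening the file by eye is to count edge lines. The sketch below is only an illustration: it assumes that each edge in the `.gv` file is written on its own line in the usual undirected Graphviz form `"a" -- "b"`, and the function name `count_edges` is made up for this example.

```python
def count_edges(gv_text: str) -> int:
    """Count lines that look like undirected Graphviz edges.

    Assumption: one edge per line, written as `<node> -- <node> [...]`.
    """
    return sum(1 for line in gv_text.splitlines() if " -- " in line)


# The empty graph posted above has no edge lines at all.
empty_graph = "graph G {\n}\n"
print(count_edges(empty_graph))

# A hypothetical graph with a single edge, for comparison.
one_edge = 'graph G {\n"1" -- "2" [label="example"];\n}\n'
print(count_edges(one_edge))
```

A count of zero means no contig pairs were linked, so the scaffolding step has nothing to join.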
I got this CHROMIUM_interleaved.fq.gz by running the following commands.
/home/zhangll/software/longranger-2.2.0/longranger basic --id=fastq_Convert0319 --fastqs=/home/zhangll/Tasks/Gouqi/data/10X/Cleandata/NDHX00134-AK1521,/home/zhangll/Tasks/Gouqi/data/10X/Cleandata/NDHX00134-AK2007,/home/zhangll/Tasks/Gouqi/data/10X/Cleandata/NDHX00134-AK2008,/home/zhangll/Tasks/Gouqi/data/10X/Cleandata/NDHX00134-AK2009
gzip -c -k CHROMIUM_interleaved.fastq > CHROMIUM_interleaved.fq.gz
Do you know why the final CHROMIUM_interleaved.fq.gz does not have any barcode information in the headers?
Best,
Yiwei Niu
Hi @YiweiNiu,
Have you (or anyone else in your group) used these reads previously for ARCS? It looks to me as though the output file from longranger basic has been reformatted so that the barcode is appended to the read header in the form @read-header_<barcode>. Earlier versions of ARCS required the barcode in this format, but ARCS can now read the barcode from the BX:Z: tag of the BAM file. ARKS only looks for a barcode in the BX:Z: comment of the fastq read header. That longranger command looks fine otherwise.
Because ARKS couldn't find the barcodes, you got no scaffolding and the graph file (*_original.gv) is empty.
To solve this, you could run longranger basic again, or manually reformat your reads to put the barcode back in the BX:Z: tag.
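For the manual route, a minimal sketch of the header rewrite might look like the following. It assumes the reformatted headers always end in `_<barcode>` with a barcode made only of A/C/G/T/N, that every record spans exactly four lines, and that the barcode should be copied as-is (whether a suffix such as `-1` is wanted is left open here). The names `to_bx_header` and `rewrite_fastq` are hypothetical helpers, not part of ARCS, ARKS, or longranger.

```python
import re

# Assumption: the barcode is the last underscore-separated field of the read name.
_HEADER = re.compile(r"^@(?P<name>\S+)_(?P<barcode>[ACGTN]+)$")


def to_bx_header(header: str) -> str:
    """Turn '@name_BARCODE' into '@name BX:Z:BARCODE'; leave other headers unchanged."""
    header = header.rstrip("\n")
    match = _HEADER.match(header)
    if match is None:
        return header
    return f"@{match['name']} BX:Z:{match['barcode']}"


def rewrite_fastq(lines):
    """Yield FASTQ lines with only the header line of each 4-line record rewritten."""
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        # Use the position within the record, since quality lines may also start with '@'.
        yield to_bx_header(line) if index % 4 == 0 else line


example = "@ST-E00575:142:H7J7FCCXY:4:1101:18761:66584_AAACACCAGACAATAC"
print(to_bx_header(example))
```

For a real file, `rewrite_fastq` could be fed a handle from `gzip.open(path, "rt")` and its output written line by line to a new gzip file. Both reads of an interleaved pair carry the same barcode in their headers, so they are handled identically.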
Hope that helps!
Lauren
Yes. Another guy in the lab used this CHROMIUM_interleaved.fastq to run ARCS. I did not know whether this file had been reformatted. I will run longranger basic again.
Thank you! Your input is really helpful.
Excellent - Glad I could help!
Hi Lauren,
Sorry to bother you again.
I ran longranger basic again, but I still did not see BX:Z:<barcode> in the fastq headers.
The command I used was the same as before.
/home/zhangll/software/longranger-2.2.0/longranger basic --id=fastq_Convert0628 --fastqs=/home/zhangll/Tasks/Gouqi/data/10X/Cleandata/NDHX00134-AK1521,/home/zhangll/Tasks/Gouqi/data/10X/Cleandata/NDHX00134-AK2007,/home/zhangll/Tasks/Gouqi/data/10X/Cleandata/NDHX00134-AK2008,/home/zhangll/Tasks/Gouqi/data/10X/Cleandata/NDHX00134-AK2009
The data is from one sample, and the fastqs I got are organized like this. The company gave us one Rawdata and one Cleandata directory.
├── Cleandata
│   ├── NDHX00134-AK1521
│   │   ├── NDHX00134-AK_S1_L004_R1_001.fastq.gz
│   │   ├── NDHX00134-AK_S1_L004_R2_001.fastq.gz
│   │   ├── NDHX00134-AK_S1_L005_R1_001.fastq.gz
│   │   └── NDHX00134-AK_S1_L005_R2_001.fastq.gz
│   ├── NDHX00134-AK2007
│   │   ├── NDHX00134-AK_S1_L004_R1_001.fastq.gz
│   │   ├── NDHX00134-AK_S1_L004_R2_001.fastq.gz
│   │   ├── NDHX00134-AK_S1_L005_R1_001.fastq.gz
│   │   └── NDHX00134-AK_S1_L005_R2_001.fastq.gz
│   ├── NDHX00134-AK2008
│   │   ├── NDHX00134-AK_S1_L004_R1_001.fastq.gz
│   │   ├── NDHX00134-AK_S1_L004_R2_001.fastq.gz
│   │   ├── NDHX00134-AK_S1_L005_R1_001.fastq.gz
│   │   └── NDHX00134-AK_S1_L005_R2_001.fastq.gz
│   └── NDHX00134-AK2009
│       ├── NDHX00134-AK_S1_L004_R1_001.fastq.gz
│       ├── NDHX00134-AK_S1_L004_R2_001.fastq.gz
│       ├── NDHX00134-AK_S1_L005_R1_001.fastq.gz
│       └── NDHX00134-AK_S1_L005_R2_001.fastq.gz
└── Rawdata
    ├── NDHX00134-AK1521
    │   ├── NDHX00134-AK1521_L4_1.fq.gz
    │   ├── NDHX00134-AK1521_L4_2.fq.gz
    │   ├── NDHX00134-AK1521_L5_1.fq.gz
    │   └── NDHX00134-AK1521_L5_2.fq.gz
    ├── NDHX00134-AK2007
    │   ├── NDHX00134-AK2007_L4_1.fq.gz
    │   ├── NDHX00134-AK2007_L4_2.fq.gz
    │   ├── NDHX00134-AK2007_L5_1.fq.gz
    │   └── NDHX00134-AK2007_L5_2.fq.gz
    ├── NDHX00134-AK2008
    │   ├── NDHX00134-AK2008_L4_1.fq.gz
    │   ├── NDHX00134-AK2008_L4_2.fq.gz
    │   ├── NDHX00134-AK2008_L5_1.fq.gz
    │   └── NDHX00134-AK2008_L5_2.fq.gz
    └── NDHX00134-AK2009
        ├── NDHX00134-AK2009_L4_1.fq.gz
        ├── NDHX00134-AK2009_L4_2.fq.gz
        ├── NDHX00134-AK2009_L5_1.fq.gz
        └── NDHX00134-AK2009_L5_2.fq.gz
The headers of the "Rawdata" and "Cleandata" fastqs look like this:
$ zcat Rawdata/NDHX00134-AK1521/NDHX00134-AK1521_L4_1.fq.gz | head -4
@ST-E00575:142:H7J7FCCXY:4:1101:7202:1872 1:N:0:CGCGATAC
CNATCTAGTGTTTAGCTTAGGGTTCCCCCATCTTCCCTTTATTTACATCCTGTTTTTAATAATATATCCTCCATGCACTAAGGAGTAGGGATGGAAACTTGATATAAGAAAATAAAAAATAAAAAAAAATTCCTAAACCACGTTTATATG
+
<#AFAJJJJJJJJJJJ7A-7AJAJ<<-A7<--<<7<7FJJJA-JFFJJJJJFFJ-<FFJJJJJJJJF--7<-7<<A--7FAJ7AA-AJ<J-7AJJ-A-A<F-7AFJJJ-77AA<FFJF-7-7F7--<A-7AAFFJFA--A-<)7-7-A-F
$ zcat Cleandata/NDHX00134-AK1521/NDHX00134-AK_S1_L004_R1_001.fastq.gz | head -4
@ST-E00575:142:H7J7FCCXY:4:1101:7202:1872 1:N:0:CGCGATAC
CNATCTAGTGTTTAGCTTAGGGTTCCCCCATCTTCCCTTTATTTACATCCTGTTTTTAATAATATATCCTCCATGCACTAAGGAGTAGGGATGGAAACTTGATATAAGAAAATAAAAAATAAAAAAAAATTCCTAAACCACGTTTATATG
+
<#AFAJJJJJJJJJJJ7A-7AJAJ<<-A7<--<<7<7FJJJA-JFFJJJJJFFJ-<FFJJJJJJJJF--7<-7<<A--7FAJ7AA-AJ<J-7AJJ-A-A<F-7AFJJJ-77AA<FFJF-7-7F7--<A-7AAFFJFA--A-<)7-7-A-F
After browsing this page, I guess there is something wrong with the names of the fastqs, but I do not know how to set them properly.
Do you have any ideas about this? Or do I have to ask the company how they preprocessed the fastqs?
Best,
Yiwei Niu
Hi @YiweiNiu,
What does your output from longranger basic look like? Did you check more than just the head for the BX:Z: tag? I ask because longranger basic sorts the output file by barcode by default, so you will have to look further into the file to find the barcodes.
Is each of those folders a different library, or are they all from the same Chromium library prep?
Lauren
Dear Lauren,
I checked further into the output file barcoded.fastq.gz and found BX:Z tags. So, I guess this file will be fine for arks.
$ zcat fastq_Convert0319/outs/barcoded.fastq.gz | wc -l
5997361952
$ zcat fastq_Convert0319/outs/barcoded.fastq.gz | grep -c 'BX:Z'
1437330904
Previously I thought every record in the fastq would have a BX:Z tag. Sorry for my silly question.
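To put the two counts above in perspective, the fraction of barcoded reads can be worked out directly: the file has four lines per record, so the line count divided by four gives the number of reads. The sketch below assumes that every `grep` match was a header line, and it adds a hypothetical helper, `count_barcoded`, that makes the same check on header lines only.

```python
def count_barcoded(lines):
    """Return (records, tagged): FASTQ records seen and how many headers carry BX:Z:."""
    records = tagged = 0
    for index, line in enumerate(lines):
        if index % 4 == 0:  # header line of a 4-line FASTQ record
            records += 1
            if "BX:Z:" in line:
                tagged += 1
    return records, tagged


# Counts reported above by `wc -l` and `grep -c 'BX:Z'`.
total_lines = 5997361952
tagged_lines = 1437330904

records = total_lines // 4          # four lines per FASTQ record
fraction = tagged_lines / records   # share of reads that carry a BX:Z tag
print(f"{records} reads, {fraction:.1%} with a BX:Z tag")
```

By this arithmetic roughly 96% of the reads carry a barcode tag, so a minority of untagged records is consistent with the counts shown. For a real file, `count_barcoded` could be given a handle from `gzip.open(path, "rt")`.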
Thank you very much for your kindness and patience.
Best wishes,
Yiwei Niu