lbcb-sci/ra

How to add Illumina paired end raw reads?


sivico26 commented on May 20:
The command line I used to run Ra was the following:
ra -t $threads -x ont $ont_reads reads/ngs.fastq.gz > assembly.fasta
where $threads was set to 24, $ont_reads was the path to the ONT reads, and ngs.fastq.gz contains Illumina reads.

But it is not clear what the ngs.fastq.gz file is. Paired-end raw reads?

My question is: how do I use Illumina paired-end raw reads?

I am going to use Illumina paired-end raw reads (DRR_1.fastq DRR_2.fastq) to polish my scaffolds.
Is this command correct?
ra -t $threads -x ont ont_reads.fastq DRR_1.fastq DRR_2.fastq > assembly.fasta

Hello Ural,
unfortunately, you have to join the paired-end reads into one file in which the two reads of a pair have unique identifiers up to the first whitespace (you can use this script). If you already have an assembly generated with ra or any other assembler, you can use racon directly to polish it with the Illumina reads.

Best regards,
Robert
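
The joining step described above could be sketched in Python roughly as follows. This is not the script linked in the reply, only a minimal illustration under stated assumptions: the function names are hypothetical, the input is plain 4-line FASTQ (optionally gzipped), and the "/1" and "/2" suffixes are just one way to make the identifiers unique up to the first whitespace.

```python
import gzip


def open_maybe_gzip(path, mode="rt"):
    """Open a plain or gzip-compressed text file based on its extension."""
    return gzip.open(path, mode) if str(path).endswith(".gz") else open(path, mode)


def fastq_records(handle):
    """Yield (header, sequence, separator, quality) from 4-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        sep = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, sep, qual


def merge_pairs(path_1, path_2, path_out):
    """Write both mates to one file; return the number of records written.

    Assumption: the mate number is appended to the identifier, i.e. to the
    part of the header before the first whitespace, so that the two reads of
    a pair no longer share an identifier.
    """
    written = 0
    with open_maybe_gzip(path_1) as f1, open_maybe_gzip(path_2) as f2, \
            open_maybe_gzip(path_out, "wt") as out:
        # zip() stops at the shorter file, so unpaired trailing reads are dropped.
        for rec_1, rec_2 in zip(fastq_records(f1), fastq_records(f2)):
            for mate, (header, seq, sep, qual) in ((1, rec_1), (2, rec_2)):
                parts = header.split(None, 1)
                new_header = f"{parts[0]}/{mate}"
                if len(parts) > 1:
                    new_header += " " + parts[1]
                out.write(f"{new_header}\n{seq}\n{sep}\n{qual}\n")
                written += 1
    return written
```

For the files in the question, a call such as merge_pairs("DRR_1.fastq", "DRR_2.fastq", "illumina_merged.fastq") would produce a single file (the output name is only an example) that can be passed in place of ngs.fastq.gz.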

Dear Robert!
Thank you for your prompt response!
I used
ra -x ont -t ${threads} ${path_in}${file} > ${path_out}assembly.fasta
and in assembly.fasta I found 118 Ctgs,
with the longest Ctg22 LN:i:11,456,129.

Flye made 226 contigs, with the longest 13,549,627.

What do you think, can I try the -u parameter to obtain longer contigs? What are the expected disadvantages of using the -u parameter?
Are there any other parameters I can tweak in Ra to improve contiguity?

Thank you in advance.

My input nanopore.fq file summary:
Mean read length: 4,324.2
Mean read quality: 11.1
Median read length: 1,376.0
Median read quality: 11.2
Number of reads: 4,527,921.0
Read length N50: 15,127.0
Total bases: 19,579,799,717.0
Number, percentage and megabases of reads above quality cutoffs

Q5: 4527920 (100.0%) 19579.8Mb
Q7: 4526480 (100.0%) 19578.9Mb
Q10: 3025210 (66.8%) 13795.9Mb
Q12: 1675711 (37.0%) 7762.8Mb
Q15: 159428 (3.5%) 434.7Mb

Best regards,
Ural

Hi Ural,
the final assembly when using parameter -u will include shorter contigs from the layout phase and unpolished contigs from the consensus phase. This will not increase contiguity. Unfortunately, there are no parameters to tweak at this point. You could try https://github.com/lbcb-sci/raven which employs a different heuristic for layout simplification.

Best regards,
Robert

Thank you!

Hi Robert,
Can I use high-quality contigs assembled from Illumina reads to polish an assembly generated with Flye?
#racon [options ...] <sequences> <overlaps> <target sequences>
Is the following command correct?
racon trinity.assembly.fa flye.assembly.fa 
What options do I need here? Do I need overlaps here? Thank you.

My data:

Input file        ONT.fq            Illumina.fq
Assembler         flye              trinity
Output file       flye.assembly.fa  trinity.assembly.fa
Total length      217,498,702       228,331,812
Fragments         226               10,707 contigs
Fragment N50      11,334,339        43,751 contigs
Largest fragment  15,942,058        6,352,280 scf1
Scaffolds         4                 2,431
Mean coverage     76                150

 
Best regards,
Ural

Hi Ural,
using the Illumina contigs to polish the Flye assembly will have no effect due to insufficient coverage. Either use a tool to scaffold your Illumina assembly with the Flye contigs or just use the Illumina reads to polish the Flye assembly.

Best regards,
Robert
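
The second option (polishing the Flye assembly with the Illumina reads) follows the racon usage line quoted earlier, which takes the reads, an overlaps file, and the target sequences in that order. The sketch below only builds the two command lines. It assumes minimap2 with its short-read preset as the mapper, which nobody in this thread specified, and the file names are placeholders. The reads file is expected to be a single file with unique identifiers per mate.

```python
import shlex


def build_polish_commands(reads, assembly, overlaps="overlaps.sam", threads=4):
    """Return (mapping command, polishing command) as argument lists.

    Assumptions: minimap2 is the mapper ("-ax sr" is its short-read preset
    with SAM output, "-o" names the output file), and racon receives its
    arguments in the order <sequences> <overlaps> <target sequences>.
    """
    map_cmd = [
        "minimap2", "-ax", "sr", "-t", str(threads),
        "-o", overlaps, assembly, reads,
    ]
    # racon writes the polished sequences to standard output.
    polish_cmd = ["racon", "-t", str(threads), reads, overlaps, assembly]
    return map_cmd, polish_cmd


def as_shell(cmd):
    """Render an argument list as a copy-pasteable shell command."""
    return shlex.join(cmd)
```

Calling build_polish_commands("illumina_merged.fastq", "flye.assembly.fa") and printing each result with as_shell gives the two commands to run in order, with the racon output redirected to a new FASTA file.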

Thank you!

Hi Robert!
Ra created an assembly (see below), but I can't understand the meaning of the table. Could you explain what it means?

head -5 assembly.fasta.ann
216474654 119 11
0 Ctg0 LN:i:4439257 RC:i:70914 XC:f:1.000000
0 4439257 0
0 Ctg1 LN:i:5154285 RC:i:94461 XC:f:1.000000
4439257 5154285 0

Questions:
216474654 is a total length, right?
119 is a number of contigs?
What is 11 ?
What is LN:i: ?
What is RC:i: ?
What is XC:f: ?
Do "0 4439257" and "4439257 5154285" mean that Ctg0 is followed by Ctg1?

Do you have a manual about this?
Do you have software for visualizing Ra output?

Thank you in advance!

Best regards,
Ural


Appendix
head -20 assembly.fasta.ann
216474654 119 11
0 Ctg0 LN:i:4439257 RC:i:70914 XC:f:1.000000
0 4439257 0
0 Ctg1 LN:i:5154285 RC:i:94461 XC:f:1.000000
4439257 5154285 0
0 Ctg2 LN:i:3418661 RC:i:56107 XC:f:1.000000
9593542 3418661 0
0 Ctg3 LN:i:3294594 RC:i:55324 XC:f:1.000000
13012203 3294594 0
0 Ctg4 LN:i:5218321 RC:i:83654 XC:f:0.999904
16306797 5218321 0
0 Ctg5 LN:i:2063340 RC:i:34121 XC:f:1.000000
21525118 2063340 0
0 Ctg6 LN:i:7061448 RC:i:117965 XC:f:1.000000
23588458 7061448 0
0 Ctg7 LN:i:3174718 RC:i:56233 XC:f:1.000000
30649906 3174718 0
0 Ctg8 LN:i:9151508 RC:i:157155 XC:f:0.999945
33824624 9151508 0
0 Ctg9 LN:i:5677563 RC:i:94959 XC:f:0.999736
42976132 5677563 0

Hello Ural,
how did you get the file with the .ann extension? In any case, an explanation of the three tags is below:

  • LN:i:<int> is the length of this sequence
  • RC:i:<int> is the number of sequences used to polish this sequence
  • XC:f:<float> is the percentage of 500bp windows that are polished in this sequence

Best regards,
Robert
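
For anyone who wants these tags in a script, here is a small sketch that splits a contig header into its name and the three tags explained above. The function name is hypothetical, and it assumes the header follows the KEY:type:value layout seen in the output above, with "i" for integers and "f" for floats.

```python
def parse_contig_header(header):
    """Split a header such as 'Ctg0 LN:i:4439257 RC:i:70914 XC:f:1.000000'.

    Returns (name, tags), where tags maps each key (LN, RC, XC) to a value
    converted according to its type code. A leading '>' is ignored.
    """
    name, *fields = header.lstrip(">").split()
    tags = {}
    for field in fields:
        key, type_code, value = field.split(":", 2)
        if type_code == "i":
            tags[key] = int(value)
        elif type_code == "f":
            tags[key] = float(value)
        else:
            # Unknown type codes are kept as plain strings.
            tags[key] = value
    return name, tags
```

With the first contig from this thread, parse_contig_header(">Ctg0 LN:i:4439257 RC:i:70914 XC:f:1.000000") returns the name "Ctg0" together with the length, the number of sequences used for polishing, and the share of polished windows.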

~/Assemb/Ra$ ra -x ont -t 4 ${path_in}${file} > acerana/assembly.fasta

Then I polished the Ra output using Pilon with Illumina reads:
java -d64 -Xms1G -Xmx200G -jar ${pilon_jar} --genome ${assembly} --frags ${frags} --threads ${threads} --changes --fix all --output ${prefix} --outdir ${outdir} --debug 3>&1 1>&2 2>&3 > ${prefix}.log

~/Assemb/Ra/acerana$ ls -alh
-rw-r--r-- 1 crciv crciv 207M Nov  2 18:18 assembly.fasta
-rw-r--r-- 1 crciv crciv   16 Nov 22 22:01 assembly.fasta.amb
-rw-r--r-- 1 crciv crciv 7.5K Nov 22 22:01 assembly.fasta.ann
-rw-r--r-- 1 crciv crciv 207M Nov 22 22:01 assembly.fasta.bwt
-rw-r--r-- 1 crciv crciv 4.5K Nov 22 21:59 assembly.fasta.fai
-rw-r--r-- 1 crciv crciv  52M Nov 22 22:01 assembly.fasta.pac
-rw-r--r-- 1 crciv crciv 104M Nov 22 22:02 assembly.fasta.sa

It is still unclear:
What is 11 ?
Do "0 4439257" and "4439257 5154285" mean that Ctg0 is followed by Ctg1?

Thank you!

No idea what the other numbers are, you should check the Pilon documentation to see what is stored in the .ann file.
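
One possible reading of the remaining numbers: the .amb, .ann, .bwt, .pac and .sa files in the listing are the set that bwa index writes next to a FASTA file, so the .ann file is probably part of a BWA index created when the Illumina reads were mapped for Pilon. The sketch below parses the file under that assumption. The field meanings are my understanding of the BWA index layout rather than anything confirmed in this thread: the first line holds the total length, the number of sequences and a fixed random seed (which would be the 11), and each sequence then has a name line followed by a line with its start offset in the concatenated sequence, its length, and a count of ambiguous-base stretches.

```python
def parse_bwa_ann(lines):
    """Parse the text of a BWA-style .ann file given as an iterable of lines.

    Assumed layout (not confirmed by the thread):
      line 1:          <total length> <number of sequences> <seed>
      then per contig: <gi> <name> [annotation ...]
                       <offset> <length> <ambiguous stretches>
    """
    rows = [line.split() for line in lines if line.strip()]
    total_length, n_sequences, seed = (int(x) for x in rows[0][:3])
    sequences = []
    for name_row, pos_row in zip(rows[1::2], rows[2::2]):
        sequences.append({
            "name": name_row[1],
            "offset": int(pos_row[0]),
            "length": int(pos_row[1]),
            "n_ambiguous": int(pos_row[2]),
        })
    return {
        "total_length": total_length,
        "n_sequences": n_sequences,
        "seed": seed,
        "sequences": sequences,
    }
```

Under this reading, "0 4439257" and "4439257 5154285" do not describe adjacency in the genome. They are the start offset and length of each contig within the concatenated index, which is why each offset equals the previous offset plus the previous length (0 + 4439257 = 4439257, and 4439257 + 5154285 = 9593542 for Ctg2).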