NextPolish is used to fix base errors (SNVs/indels) in genomes assembled from noisy long reads. It can be used with short-read data only, long-read data only, or a combination of both. It contains two core modules and uses a stepwise approach to correct erroneous bases in the reference genome. To correct raw third-generation sequencing (TGS) long reads with approximately 10-15% sequencing errors, please use NextDenovo.
-
DOWNLOAD
Download the release archive, for example with the following command:
wget https://github.com/Nextomics/NextPolish/releases/download/v1.2.4/NextPolish.tgz
-
REQUIREMENT
-
INSTALL
tar -vxzf NextPolish.tgz && cd NextPolish && make
-
UNINSTALL
cd NextPolish && make clean
-
TEST
nextPolish test_data/run.cfg
-
QUICK RUN
- Prepare sgs_fofn
ls reads1_R1.fq reads1_R2.fq reads2_R1.fq reads2_R2.fq > sgs.fofn
- Create run.cfg
genome=input.genome.fa
echo -e "task = 1212\ngenome = $genome\nsgs_fofn = sgs.fofn" > run.cfg
- Run
nextPolish run.cfg
- Final polished genome
/path_to_work_directory/genome.nextpolish.fasta
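The QUICK RUN steps above can be wrapped into one function that stops early when an input is missing. This is a sketch, not part of NextPolish: the function name prepare_quick_run is hypothetical, it assumes paired files follow the <sample>_R1.fq / <sample>_R2.fq naming used in the example, and it writes absolute paths into sgs.fofn and run.cfg so the lists do not depend on which directory relative paths would be resolved from. It writes the same three keys as the QUICK RUN (task, genome, sgs_fofn) and leaves launching nextPolish to you.

```shell
#!/usr/bin/env bash
# Sketch: prepare sgs.fofn and run.cfg for the QUICK RUN.
# Assumption (hypothetical naming): paired reads are <sample>_R1.fq / <sample>_R2.fq
# in the current directory.

prepare_quick_run() {
  local genome=$1
  local r1 r2

  # The genome file must exist before its path is written into run.cfg.
  [[ -f $genome ]] || { echo "genome not found: $genome" >&2; return 1; }

  : > sgs.fofn   # start from an empty file list
  for r1 in *_R1.fq; do
    # With no match, the glob stays literal, so check that the file exists.
    [[ -e $r1 ]] || { echo "no *_R1.fq files found" >&2; return 1; }
    r2=${r1%_R1.fq}_R2.fq
    [[ -f $r2 ]] || { echo "missing mate for $r1: $r2" >&2; return 1; }
    # One line per file, R1 followed by its R2, as absolute paths.
    printf '%s\n%s\n' "$PWD/$r1" "$PWD/$r2" >> sgs.fofn
  done

  # Same keys as the QUICK RUN; the genome path is made absolute in a subshell.
  printf 'task = 1212\ngenome = %s\nsgs_fofn = %s\n' \
    "$(cd "$(dirname "$genome")" && pwd)/$(basename "$genome")" \
    "$PWD/sgs.fofn" > run.cfg
}

# Example:
#   prepare_quick_run input.genome.fa && nextPolish run.cfg
```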
Optional: You can also run your own alignment pipeline and then use NextPolish to polish the genome, which is faster than the default NextPolish pipeline when running on a local system. An example using bwa for alignment is available in the NextPolish documentation.
Note: It is recommended to first polish the raw genome with long reads (set task to start with "5" and set lgs_fofn, or use racon) before polishing with short reads. This avoids incorrect mapping of short reads in high-error regions, especially for assemblies generated without a consensus step, such as those from miniasm.
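Following the note above, a run that polishes with long reads before short reads sets task to start with 5 and supplies lgs_fofn alongside sgs_fofn. The sketch below is hypothetical: the function name write_lgs_first_cfg is made up, and it assumes lgs_fofn can be written in the same flat "key = value" format that the QUICK RUN uses for sgs_fofn (check doc/OPTION.md for the exact layout and any long-read options your data need). Its default, task = 551212, is the value the FAQ describes as running steps 5, 1, and 2 with 2 iterations. It rejects task strings that do not start with 5 or that contain digits other than 1-5, and warns about the experimental steps 3 and 4.

```shell
#!/usr/bin/env bash
# Sketch: write a run.cfg that polishes with long reads (step 5) before short reads.
# Assumption: lgs_fofn is accepted in the same flat "key = value" format as sgs_fofn.

write_lgs_first_cfg() {
  local genome=$1 task=${2:-551212}

  # Only digits 1-5 name polishing steps.
  if [[ ! $task =~ ^[1-5]+$ ]]; then
    echo "invalid task string: $task" >&2
    return 1
  fi
  # Long-read polishing (step 5) should come first.
  if [[ $task != 5* ]]; then
    echo "task should start with 5 to polish with long reads first: $task" >&2
    return 1
  fi
  # Steps 3 and 4 are experimental: warn, but still write the config.
  if [[ $task == *[34]* ]]; then
    echo "warning: task $task uses experimental steps 3/4" >&2
  fi

  printf 'task = %s\ngenome = %s\nlgs_fofn = lgs.fofn\nsgs_fofn = sgs.fofn\n' \
    "$task" "$genome" > run.cfg
}

# Example (long-read file names are placeholders):
#   ls long_reads1.fq long_reads2.fq > lgs.fofn
#   ls reads1_R1.fq reads1_R2.fq > sgs.fofn
#   write_lgs_first_cfg input.genome.fa && nextPolish run.cfg
```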
-
USAGE
Please see doc/OPTION.md for an introduction to the options.
-
PERFORMANCE COMPARISON
-
HELP
Please raise an issue on the issue page.
-
CONTACT INFORMATION
For additional help, please send an email to huj_at_grandomics_dot_com.
-
COPYRIGHT
NextPolish is freely available for academic and other non-commercial use.
-
CITE
Hu, Jiang, et al. "NextPolish: a fast and efficient genome polishing tool for long read assembly." Bioinformatics (Oxford, England) (2019).
-
PLEASE STAR AND THANKS
-
FAQ
- What is the difference between NextPolish and Pilon?
Currently, NextPolish focuses on genome correction using shotgun reads, which is one of the most important steps (typically the last step) in completing a genome assembly, while Pilon can also be used to make other improvements. For genome correction, NextPolish consumes considerably less time than Pilon and achieves higher correction accuracy on genomes of the same size, and this advantage becomes more significant as the genome size of the target assembly increases. See the PERFORMANCE COMPARISON section for more details.
- Which job scheduling systems are supported by NextPolish?
NextPolish uses DRMAA to submit, control, and monitor jobs, so in theory it supports all DRMAA-compliant systems, such as LOCAL, SGE, PBS, and SLURM.
- How do I continue running unfinished tasks?
No changes are needed; simply run the same command again.
- How do I set the task parameter?
The task parameter sets the polishing algorithm logic: 1, 2, 3, and 4 are different algorithm modules for short reads, while 5 is the algorithm module for long reads. Steps 3 and 4 are experimental, and we do not currently recommend running them on an actual project. Setting task = 551212 means NextPolish will cyclically run steps 5, 1, and 2 with 2 iterations.
- How many iterations should NextPolish run to get the best result?
Our tests showed that after 2 iterations, most bases effectively covered by SGS data can be corrected. Please set task = best to get the best result; task = best means NextPolish will cyclically run steps [5], 1, and 2 with 2 iterations. You can also ask NextPolish to run more iterations to get a better result, for example task = 555512121212, which means NextPolish will cyclically run steps 5, 1, and 2 with 4 iterations.
- Why does the contig N50 of the polished genome become shorter, or why does the polished genome contain some extra N?
In some cases, if the short reads contain N, some error bases will be replaced by N (a k-mer containing N has the largest global score and is selected). Removing N from the short reads avoids this.
- What is the difference between using bwa and minimap2 for SGS data mapping?
Our tests showed that minimap2 is about 3 times faster than bwa, but which one yields a more accurate polished genome is not clear-cut: it depends on the error rates of the genome and the SGS data. See the NextPolish documentation for more details.
- How do I specify the queue, CPU, memory, or bash settings used to submit jobs?
Please use cluster_options; NextPolish will replace {vf}, {cpu}, and {bash} with the specific values needed for each job.
- RuntimeError: Could not find drmaa library. Please specify its full path using the environment variable DRMAA_LIBRARY_PATH.
Please set the environment variable DRMAA_LIBRARY_PATH to the full path of your DRMAA library.
- OSError: /path/lib64/libc.so.6: version `GLIBC_2.14' not found (required by /path/NextPolish/lib/calgs.so).
Please download this version and try again.
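The FAQ entry on extra N in the polished genome recommends removing N from the short reads. One way to do that, sketched here under stated assumptions, is to drop every read pair in which either mate contains an N, so R1 and R2 stay in sync. The function name filter_n_pairs and the output file names are hypothetical; the sketch assumes uncompressed FASTQ with exactly 4 lines per record (no wrapped sequences), mates listed in the same order in both files, and no tab characters inside header lines. The filtered files can then be listed in sgs.fofn in place of the originals.

```shell
#!/usr/bin/env bash
# Sketch: drop read pairs in which either mate contains N (or n).
# Assumptions: uncompressed FASTQ, exactly 4 lines per record, R1/R2 in the
# same order, and no tab characters inside header lines.

filter_n_pairs() {
  local in1=$1 in2=$2 out1=$3 out2=$4

  # Create both outputs up front so they exist even if every pair is dropped.
  : > "$out1"
  : > "$out2"

  # "paste - - - -" folds each 4-line record onto one tab-separated line;
  # the outer paste puts the R1 and R2 records side by side (8 fields).
  paste <(paste - - - - < "$in1") <(paste - - - - < "$in2") |
    awk -F'\t' -v o1="$out1" -v o2="$out2" '
      # Field 2 is the R1 sequence and field 6 is the R2 sequence.
      $2 !~ /[Nn]/ && $6 !~ /[Nn]/ {
        printf "%s\n%s\n%s\n%s\n", $1, $2, $3, $4 > o1
        printf "%s\n%s\n%s\n%s\n", $5, $6, $7, $8 > o2
      }'
}

# Example:
#   filter_n_pairs reads1_R1.fq reads1_R2.fq reads1_noN_R1.fq reads1_noN_R2.fq
```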