Pipeline for DAMASKER
Closed this issue · 2 comments
mictadlo commented
Dear Eugene,
I would like to use the tools described on your DAZZLERBLOG for a tetraploid plant genome assembly project.
Reading through your blog, it appears that the following 13 steps are involved:
Step 1. # dextract -o pacbio/*bam
Step 2. # fasta2DB plant *pacbio-reads.fasta
Step 3. # DBdust plant
Step 4. # DBsplit -x1000 plant
Step 5. # HPC.daligner plant -T 8 | bash -v
Step 6. # rm plant.*.plant.*.las
Step 7. # LAmerge plant.las plant.[0-9].las
Step 8. # DASqv -c50 plant plant.las
Step 9. # HPC.TANmask plant
Step 10. # HPC.REPmask -g1 -c20 -mtan plant
Step 11. # HPC.REPmask -g10 -c15 -mtan -mrep1 plant
Step 12. # HPC.REPmask -g100 -c10 -mtan -mrep1 -mrep10 plant
Step 13. # HPC.daligner -mtan -mrep1 -mrep10 -mrep100 plant
Do you think the commands above are correct, and could the output from the plant DB be used in the Racon pipeline (https://github.com/isovic/racon)?
Thank you in advance
Michal
thegenemyers commented
Hi Michal,
Apologies for not responding sooner.
TANmask and REPmask should be run before daligner, so I'm not sure what you are doing with Steps 5-8, which should all occur later.
Also, for repeat masking, we have found that calling REPmask just once, with -g chosen so that the number of blocks represents 1X of the data, is sufficient.
-- Gene
(Replying by email to Michał T. Lorenc's post of 12/13/17, 5:33 AM.)
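As a sketch of the ordering described in the reply above (TANmask and a single REPmask pass before daligner), the following POSIX shell function only prints the question's commands in that order; it executes nothing. The function name print_plan is hypothetical, the DB name and the REPmask -g and -c values are placeholders passed as arguments, and the `-mrep<g>` track name follows the rep1-for-`-g1` naming used elsewhere in this thread rather than anything stated in the reply.

```sh
#!/bin/sh
# Dry-run sketch: print (do not execute) the commands from the question,
# reordered so that TANmask and one REPmask pass come before daligner.
# Arguments: DB name, REPmask -g value, REPmask -c value (placeholders).
print_plan() {
  db=$1
  g=$2
  c=$3
  # Unquoted heredoc: $db, $g and $c are expanded; the *bam and *.fasta globs are not.
  cat <<EOF
dextract -o pacbio/*bam
fasta2DB $db *pacbio-reads.fasta
DBdust $db
DBsplit -x1000 $db
HPC.TANmask $db
HPC.REPmask -g$g -c$c -mtan $db
HPC.daligner -mtan -mrep$g $db
EOF
  # Merging the .las files and running DASqv (Steps 6-8 of the question)
  # would come after the daligner step.
}

print_plan plant 1 20
```

Each HPC.* command still only generates job scripts, which have to be executed afterwards, either piped to bash as in Step 5 or written to files with -f as in the later script.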
mictadlo commented
Hi Gene,
Thank you. I have changed the pipeline to the following steps, generated by this script:
Creating database
source activate thegenemyers
find /work/waterhouse_team/All_RawData/Each_Cell_Raw/ -name "*.arrow" -type f > fasta2DB_input.fofn
sed -i.bak 's|\.arrow$|.fasta|' fasta2DB_input.fofn
fasta2DB DB -ffasta2DB_input.fofn
DBsplit -x500 -s250 DB
DBdust DB
Catrack -v DB dust
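For reference, a POSIX shell sketch of another way to build the same fasta2DB file list: it swaps only the trailing .arrow extension for .fasta using parameter expansion and reports any cell whose .fasta file is missing. The function name make_fofn is hypothetical, and like the find/sed steps above it assumes each *.arrow file has a matching *.fasta file next to it.

```sh
#!/bin/sh
# Hypothetical helper: write a file-of-file-names for fasta2DB -f.
# For every *.arrow file under $1, list the sibling *.fasta file in $2,
# warning on stderr when that .fasta file does not exist.
make_fofn() {
  raw_dir=$1
  out=$2
  : > "$out"   # create or truncate the output list
  find "$raw_dir" -name '*.arrow' -type f | while IFS= read -r f; do
    fasta=${f%.arrow}.fasta   # strip only the trailing ".arrow"
    if [ -f "$fasta" ]; then
      printf '%s\n' "$fasta" >> "$out"
    else
      printf 'missing: %s\n' "$fasta" >&2
    fi
  done
}
```

It would be called as `make_fofn /work/waterhouse_team/All_RawData/Each_Cell_Raw/ fasta2DB_input.fofn`, followed by the same `fasta2DB DB -ffasta2DB_input.fofn` as above.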
HPC.TANmask
source activate thegenemyers
HPC.TANmask DB -mdust -T4 -fTANmask
sh HPC.parallel_pbs.sh TANmask.01.OVL #MEM:9GB; CPU time:00:06:25
sh HPC.parallel_pbs.sh TANmask.02.CHECK.OPT #MEM:0.3GB; CPU time:00:00:02
sh HPC.parallel_pbs.sh TANmask.03.MASK #MEM:0.4GB; CPU time:00:00:01
sh TANmask.04.RM
qsub catrackTAN_pbs.sh
# Catrack -v DB tan
# rm .DB.*.tan.*
PBS Job 2678948.pbs
CPU time : 00:00:01
Wall time : 00:00:11
Mem usage : 8164kb
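The TANmask stages above (like the REPmask and daligner stages below) depend on one another, so a later stage should not start if an earlier one failed. A minimal POSIX shell sketch of that idea follows. run_stages and the RUNNER variable are hypothetical; the default runner is the thread's own `sh HPC.parallel_pbs.sh`, and the sketch assumes that script returns only after a stage's jobs have finished and exits non-zero on failure.

```sh
#!/bin/sh
# Hypothetical runner: execute stages in order, stopping at the first failure.
# RUNNER defaults to the thread's "sh HPC.parallel_pbs.sh"; RUNNER=echo gives a dry run.
run_stages() {
  runner=${RUNNER:-sh HPC.parallel_pbs.sh}
  for stage in "$@"; do
    # $runner is intentionally unquoted so "sh HPC.parallel_pbs.sh" splits into words.
    if ! $runner "$stage"; then
      printf 'stage %s failed; later stages not run\n' "$stage" >&2
      return 1
    fi
  done
}

# Dry run of the parallel TANmask stages from the script above.
RUNNER=echo run_stages TANmask.01.OVL TANmask.02.CHECK.OPT TANmask.03.MASK
```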
HPC.REPmask
source activate thegenemyers
HPC.REPmask -g1 -c20 -mdust -mtan DB -T4 -fREPmask
sh HPC.parallel_pbs.sh REPmask.01.OVL #MEM:30GB; CPU time:02:00:01
sh HPC.parallel_pbs.sh REPmask.02.CHECK.OPT #MEM:1.4GB; CPU time:00:00:03
sh HPC.parallel_pbs.sh REPmask.03.MASK #MEM:0.01GB; CPU time:00:00:06
sh REPmask.04.RM
Catrack -v DB rep1
rm .DB.*.rep1.*
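Applying the 1X rule from Gene's reply to this script's numbers: DBsplit -s250 makes blocks of about 250 Mbp, and the genome size passed to calc_cutoff.py below is 1,800,000,000 bp, so 1X is roughly 1800 / 250 ≈ 7 blocks, whereas -g1 covers only about 0.14X. The helper below (a hypothetical name) does that arithmetic; rounding to the nearest block is an assumption, not something stated in the thread.

```sh
#!/bin/sh
# Hypothetical helper: how many DBsplit blocks make up ~1X of the genome.
# $1 = genome size in Mbp, $2 = DBsplit -s block size in Mbp.
blocks_for_1x() {
  genome_mbp=$1
  block_mbp=$2
  # Integer division rounded to the nearest block; never fewer than one block.
  g=$(( (genome_mbp + block_mbp / 2) / block_mbp ))
  if [ "$g" -lt 1 ]; then
    g=1
  fi
  echo "$g"
}

g=$(blocks_for_1x 1800 250)   # 7 for this project's numbers
echo "HPC.REPmask -g$g -c20 -mdust -mtan DB -T4 -fREPmask"
```

If the track naming seen here with -g1 (rep1) carries over, a -g7 run would then be referenced as -mrep7 in the later DBstats and HPC.daligner commands.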
HPC.daligner
source activate thegenemyers
DBstats -b1 -mdust -mtan -mrep1 DB > DBstats.out
/work/waterhouse_team/apps/bin> python calc_cutoff.py --genome_size 1800000000 --coverage 38 --db_stats /work/waterhouse_team/banana/assembly/DBstats.out
6973
HPC.daligner -mdust -mtan -mrep1 -H6973 -T4 -fdaligner DB
sh HPC.parallel_pbs.sh daligner.01.OVL
sh HPC.parallel_pbs.sh daligner.02.CHECK.OPT
sh HPC.parallel_pbs.sh daligner.03.MERGE
sh HPC.parallel_pbs.sh daligner.04.CHECK.OPT
sh HPC.parallel_pbs.sh daligner.05.RM.OPT
sh HPC.parallel_pbs.sh daligner.06.MERGE
sh HPC.parallel_pbs.sh daligner.07.CHECK.OPT
sh daligner.08.RM
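calc_cutoff.py itself is not shown in the thread, so the following is only a sketch of one common way such a length cutoff is computed (it appears to mirror FALCON's calc_cutoff, but treat that as an assumption): sort read lengths longest first and return the length at which the running total first reaches coverage × genome size. For this project that target would be 38 × 1,800,000,000 = 68,400,000,000 bp. length_cutoff is a hypothetical name, and extracting one read length per line from the DB is not covered here.

```sh
#!/bin/sh
# Hypothetical sketch (not calc_cutoff.py): read one read length per line on
# stdin, take reads longest first, and print the length at which the running
# total first reaches genome_size * coverage. Exits 1 if there is too little data.
length_cutoff() {
  genome_size=$1
  coverage=$2
  sort -rn | awk -v target="$((genome_size * coverage))" '
    { total += $1
      if (total >= target) { print $1; found = 1; exit } }
    END { if (!found) exit 1 }'
}

# Toy example: lengths 5,4,3,2,1 with genome_size 3 and coverage 3 (target 9).
printf '%s\n' 5 1 3 2 4 | length_cutoff 3 3   # prints 4
```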