ERROR: create_new_contigs failed.
Sorry for already having another bug to report! I ran Hairsplitter twice today after the new update (once with the multiploid option, once without, both using multithreading). Hairsplitter ran so much faster this time around, and none of the previously problematic steps seemed to have issues!
However, there seems to be a new issue with STAGE 6.
Running (this one was without multiploid)
#!/bin/bash
#SBATCH --time=08:00:00
#SBATCH --nodes=1 --ntasks=1 --cpus-per-task=28
#SBATCH --account=PAS1802
#SBATCH --job-name=1376_hairsplitter-25kb-keephap-2
#SBATCH --export=ALL
#SBATCH --output=1376_hairsplitter-25kb-keephap-2.out.%j
module load cmake/3.25.2
module load gnu/11.2.0
source /users/PAS1802/woodruff207/miniconda3/bin/activate
conda activate hairsplitter_env
cd /fs/ess/PAS1802/ALW/2023_06_15-MAY1376_TLOKOs_LongRead/1376/2_flye_assembly-keephap-25kb-2/
python /users/PAS1802/woodruff207/Hairsplitter/hairsplitter.py -f ../1_demul_adtrim/BC15-25kbmin.fastq -i assembly_graph.gfa -x ont -o ../8_hairsplitter -t 28
Resulted in
- Loading all reads from ../1_demul_adtrim/BC15-25kbmin.fastq in memory
- Loading all contigs from ../8_hairsplitter/tmp/cut_assembly.gfa in memory
- Loading alignments of the reads on the contigs from ../8_hairsplitter/tmp/reads_on_asm.sam
- Calling variants on each contig using basic pileup
separating reads on contig CONTIG edge_34@3 46849 117.353
separating reads on contig CONTIG edge_14@0 8141 131.349
separating reads on contig CONTIG edge_40@3 11455 100.049
separating reads on contig CONTIG edge_12@0 1480 444.816
separating reads on contig CONTIG edge_23@0 18874 219.034
separating reads on contig CONTIG edge_15@0 21915 217.477
separating reads on contig CONTIG edge_45@4 300000 223.291
separating reads on contig CONTIG edge_1@0 72803 76.6896
separating reads on contig CONTIG edge_36@0 10083 360.579
separating reads on contig CONTIG edge_37@1 64272 233.718
separating reads on contig CONTIG edge_46@0 300000 211.382
separating reads on contig CONTIG edge_28@6 87656 145.68
separating reads on contig CONTIG edge_39@0 115223 209.538
separating reads on contig CONTIG edge_41@1 233709 216.108
separating reads on contig CONTIG edge_16@1 93636 209.789
separating reads on contig CONTIG edge_45@0 300000 195.72
separating reads on contig CONTIG edge_28@5 300000 225.631
separating reads on contig CONTIG edge_44@4 300000 221.944
separating reads on contig CONTIG edge_45@2 300000 218.656
separating reads on contig CONTIG edge_37@0 300000 214.827
separating reads on contig CONTIG edge_44@1 300000 220.502
separating reads on contig CONTIG edge_44@0 300000 231.295
separating reads on contig CONTIG edge_28@4 300000 209.897
separating reads on contig CONTIG edge_6@1 300000 210.705
separating reads on contig CONTIG edge_48@0 300000 228.833
separating reads on contig CONTIG edge_45@1 300000 211.574
separating reads on contig CONTIG edge_22@0 2532 458.313
separating reads on contig CONTIG edge_40@1 300000 223.89
separating reads on contig CONTIG edge_44@3 300000 215.01
separating reads on contig CONTIG edge_6@0 300000 215.125
separating reads on contig CONTIG edge_34@1 300000 212.838
separating reads on contig CONTIG edge_48@1 300000 243.961
separating reads on contig CONTIG edge_33@0 10239 214.442
separating reads on contig CONTIG edge_38@0 300000 209.204
separating reads on contig CONTIG edge_44@2 300000 221.737
separating reads on contig CONTIG edge_34@2 300000 200.835
separating reads on contig CONTIG edge_35@0 300000 227.907
separating reads on contig CONTIG edge_34@0 300000 198.119
separating reads on contig CONTIG edge_45@5 299225 193.126
separating reads on contig CONTIG edge_3@0 25469 220.44
separating reads on contig CONTIG edge_44@6 300000 224.849
separating reads on contig CONTIG edge_28@2 300000 218.004
separating reads on contig CONTIG edge_44@9 99227 148.113
separating reads on contig CONTIG edge_4@0 16120 53.9382
separating reads on contig CONTIG edge_42@0 1252 12304.8
separating reads on contig CONTIG edge_7@2 233248 230.62
separating reads on contig CONTIG edge_44@5 300000 223.86
separating reads on contig CONTIG edge_28@3 300000 226.373
separating reads on contig CONTIG edge_47@0 262506 233.746
separating reads on contig CONTIG edge_40@0 300000 203.257
separating reads on contig CONTIG edge_45@3 300000 226.299
separating reads on contig CONTIG edge_35@1 300000 226.719
separating reads on contig CONTIG edge_41@0 300000 203.041
separating reads on contig CONTIG edge_28@1 300000 211.533
separating reads on contig CONTIG edge_40@2 300000 204.561
separating reads on contig CONTIG edge_44@7 300000 228.214
separating reads on contig CONTIG edge_32@0 159093 131.672
separating reads on contig CONTIG edge_7@1 300000 263.177
separating reads on contig CONTIG edge_44@8 300000 226.218
separating reads on contig CONTIG edge_28@0 300000 232.642
separating reads on contig CONTIG edge_16@0 300000 204.921
separating reads on contig CONTIG edge_38@1 97516 218.851
separating reads on contig CONTIG edge_6@2 260846 197.358
separating reads on contig CONTIG edge_48@3 85724 104.09
separating reads on contig CONTIG edge_48@2 300000 239.204
separating reads on contig CONTIG edge_46@1 80433 197.154
separating reads on contig CONTIG edge_7@0 300000 203.408
- Creating the .gaf file describing how the reads align on the new contigs
- Creating the new contigs
ERROR racon failed, while running racon -w 500 -e 1 -t 1 ../8_hairsplitter/tmp/reads_11.fasta ../8_hairsplitter/tmp/mapped_11.paf ../8_hairsplitter/tmp/unpolished_11.fasta > ../8_hairsplitter/tmp/polished_11.fasta 2>../8_hairsplitter/tmp/trash.txt
/users/PAS1802/woodruff207/Hairsplitter/hairsplitter.py -f ../1_demul_adtrim/BC15-25kbmin.fastq -i assembly_graph.gfa -x ont -o ../8_hairsplitter -t 28
HairSplitter v1.3.3 (github.com/RolandFaure/HairSplitter). Last update: 2023-08-21
******************
* *
* Hairsplitter *
* Welcome! *
* *
******************
===== STAGE 1: Cleaning graph of small contigs that are unconnected parts of haplotypes [ 2023-08-21 14:44:48.560662 ]
When the assemblers manage to locally phase the haplotypes, they sometimes assemble the alternative haplotype as a separate contig, unconnected in the gfa graph. This affects negatively the performance of Hairsplitter. Let's delete these contigs
- Mapping the assembly against itself
Running: /users/PAS1802/woodruff207/Hairsplitter/src/build/clean_graph assembly_graph.gfa ../8_hairsplitter/tmp/cleaned_assembly.gfa ../8_hairsplitter ../8_hairsplitter/hairsplitter.log 28 minimap2
- Eliminated small unconnected contigs that align on other contigs
===== STAGE 2: Aligning reads on the reference [ 2023-08-21 14:44:50.479979 ]
- Cutting the contigs in chunks of 300000bp to avoid memory issues
- Converting the assembly in fasta format
- Aligning the reads on the assembly
- Running minimap with command line:
minimap2 ../8_hairsplitter/tmp/cleaned_assembly.fasta ../1_demul_adtrim/BC15-25kbmin.fastq -x map-ont -a --secondary=no -t 28 > ../8_hairsplitter/tmp/reads_on_asm.sam 2> ../8_hairsplitter/tmp/logminimap.txt
The log of minimap2 can be found at ../8_hairsplitter/tmp/logminimap.txt
===== STAGE 3: Calling variants [ 2023-08-21 14:46:07.495951 ]
Running: /users/PAS1802/woodruff207/Hairsplitter/src/build/call_variants ../8_hairsplitter/tmp/cut_assembly.gfa ../1_demul_adtrim/BC15-25kbmin.fastq ../8_hairsplitter/tmp/reads_on_asm.sam 28 ../8_hairsplitter/tmp ../8_hairsplitter/tmp/error_rate.txt 0 ../8_hairsplitter/tmp/variants.col ../8_hairsplitter/tmp/variants.vcf
===== STAGE 4: Filtering variants [ 2023-08-21 14:51:21.968869 ]
- Filtering variants
Running: /users/PAS1802/woodruff207/Hairsplitter/src/build/filter_variants ../8_hairsplitter/tmp/variants.col 0.0121198 28 0 ../8_hairsplitter/tmp/filtered_variants.col ../8_hairsplitter/tmp/variants.vcf ../8_hairsplitter/tmp/variants_filtered.vcf
===== STAGE 5: Separating reads by haplotype of origin [ 2023-08-21 14:51:51.986796 ]
- Separating reads by haplotype of origin
Running: /users/PAS1802/woodruff207/Hairsplitter/src/build/separate_reads ../8_hairsplitter/tmp/filtered_variants.col 28 0.0121198 0 ../8_hairsplitter/tmp/reads_haplo.gro
===== STAGE 6: Creating all the new contigs [ 2023-08-21 16:22:00.649424 ]
This can take time, as we need to polish every new contig using Racon
Running : /users/PAS1802/woodruff207/Hairsplitter/src/build/create_new_contigs ../8_hairsplitter/tmp/cut_assembly.gfa ../1_demul_adtrim/BC15-25kbmin.fastq 0.0121198 ../8_hairsplitter/tmp/reads_haplo.gro ../8_hairsplitter/tmp 28 ont ../8_hairsplitter/tmp/zipped_assembly.gfa ../8_hairsplitter/tmp/reads_on_new_contig.gaf 0 minimap2 racon 0
ERROR: create_new_contigs failed. Was trying to run: /users/PAS1802/woodruff207/Hairsplitter/src/build/create_new_contigs ../8_hairsplitter/tmp/cut_assembly.gfa ../1_demul_adtrim/BC15-25kbmin.fastq 0.0121198 ../8_hairsplitter/tmp/reads_haplo.gro ../8_hairsplitter/tmp 28 ont ../8_hairsplitter/tmp/zipped_assembly.gfa ../8_hairsplitter/tmp/reads_on_new_contig.gaf 0 minimap2 racon 0
And running (this one was with multiploid)
#!/bin/bash
#SBATCH --time=08:00:00
#SBATCH --nodes=1 --ntasks=1 --cpus-per-task=28
#SBATCH --account=PAS1802
#SBATCH --job-name=1376_hairsplitter-25kb-keephap-2-multiploid
#SBATCH --export=ALL
#SBATCH --output=1376_hairsplitter-25kb-keephap-2-multiploid.out.%j
module load cmake/3.25.2
module load gnu/11.2.0
source /users/PAS1802/woodruff207/miniconda3/bin/activate
conda activate hairsplitter_env
cd /fs/ess/PAS1802/ALW/2023_06_15-MAY1376_TLOKOs_LongRead/1376/2_flye_assembly-keephap-25kb-2/
python /users/PAS1802/woodruff207/Hairsplitter/hairsplitter.py -f ../1_demul_adtrim/BC15-25kbmin.fastq -i assembly_graph.gfa -x ont -o ../8_hairsplitter-multiploid -m -t 28
Resulted in
- Loading all reads from ../1_demul_adtrim/BC15-25kbmin.fastq in memory
- Loading all contigs from ../8_hairsplitter-multiploid/tmp/cut_assembly.gfa in memory
- Loading alignments of the reads on the contigs from ../8_hairsplitter-multiploid/tmp/reads_on_asm.sam
- Calling variants on each contig using basic pileup
separating reads on contig CONTIG edge_22@0 2532 458.313
separating reads on contig CONTIG edge_15@0 21915 217.477
separating reads on contig CONTIG edge_36@0 10083 360.579
separating reads on contig CONTIG edge_1@0 72803 76.6896
separating reads on contig CONTIG edge_48@3 85724 104.09
separating reads on contig CONTIG edge_44@9 99227 148.113
separating reads on contig CONTIG edge_48@2 300000 239.204
separating reads on contig CONTIG edge_28@6 87656 145.68
separating reads on contig CONTIG edge_41@1 233709 216.108
separating reads on contig CONTIG edge_28@1 300000 211.533
separating reads on contig CONTIG edge_28@5 300000 225.631
separating reads on contig CONTIG edge_45@2 300000 218.656
separating reads on contig CONTIG edge_47@0 262506 233.746
separating reads on contig CONTIG edge_44@1 300000 220.502
separating reads on contig CONTIG edge_48@0 300000 228.833
separating reads on contig CONTIG edge_41@0 300000 203.041
separating reads on contig CONTIG edge_44@8 300000 226.218
separating reads on contig CONTIG edge_44@0 300000 231.295
separating reads on contig CONTIG edge_28@4 300000 209.897
separating reads on contig CONTIG edge_28@0 300000 232.642
separating reads on contig CONTIG edge_35@1 300000 226.719
separating reads on contig CONTIG edge_45@1 300000 211.574
separating reads on contig CONTIG edge_7@2 233248 230.62
separating reads on contig CONTIG edge_40@1 300000 223.89
separating reads on contig CONTIG edge_28@2 300000 218.004
separating reads on contig CONTIG edge_34@1 300000 212.838
separating reads on contig CONTIG edge_32@0 159093 131.672
separating reads on contig CONTIG edge_6@0 300000 215.125
separating reads on contig CONTIG edge_38@1 97516 218.851
separating reads on contig CONTIG edge_35@0 300000 227.907
separating reads on contig CONTIG edge_40@3 11455 100.049
separating reads on contig CONTIG edge_46@0 300000 211.382
separating reads on contig CONTIG edge_42@0 1252 12304.8
separating reads on contig CONTIG edge_45@5 299225 193.126
separating reads on contig CONTIG edge_44@3 300000 215.01
separating reads on contig CONTIG edge_45@4 300000 223.291
separating reads on contig CONTIG edge_34@3 46849 117.353
separating reads on contig CONTIG edge_40@2 300000 204.561
separating reads on contig CONTIG edge_40@0 300000 203.257
separating reads on contig CONTIG edge_34@0 300000 198.119
separating reads on contig CONTIG edge_37@1 64272 233.718
separating reads on contig CONTIG edge_45@0 300000 195.72
separating reads on contig CONTIG edge_14@0 8141 131.349
separating reads on contig CONTIG edge_28@3 300000 226.373
separating reads on contig CONTIG edge_23@0 18874 219.034
separating reads on contig CONTIG edge_44@6 300000 224.849
separating reads on contig CONTIG edge_33@0 10239 214.442
separating reads on contig CONTIG edge_6@2 260846 197.358
separating reads on contig CONTIG edge_12@0 1480 444.816
separating reads on contig CONTIG edge_37@0 300000 214.827
separating reads on contig CONTIG edge_16@1 93636 209.789
separating reads on contig CONTIG edge_6@1 300000 210.705
separating reads on contig CONTIG edge_46@1 80433 197.154
separating reads on contig CONTIG edge_4@0 16120 53.9382
separating reads on contig CONTIG edge_44@4 300000 221.944
separating reads on contig CONTIG edge_34@2 300000 200.835
separating reads on contig CONTIG edge_44@2 300000 221.737
separating reads on contig CONTIG edge_16@0 300000 204.921
separating reads on contig CONTIG edge_48@1 300000 243.961
separating reads on contig CONTIG edge_38@0 300000 209.204
separating reads on contig CONTIG edge_44@7 300000 228.214
separating reads on contig CONTIG edge_39@0 115223 209.538
separating reads on contig CONTIG edge_7@0 300000 203.408
separating reads on contig CONTIG edge_7@1 300000 263.177
separating reads on contig CONTIG edge_45@3 300000 226.299
separating reads on contig CONTIG edge_44@5 300000 223.86
separating reads on contig CONTIG edge_3@0 25469 220.44
- Creating the .gaf file describing how the reads align on the new contigs
- Creating the new contigs
ERROR racon failed, while running racon -w 500 -e 1 -t 1 ../8_hairsplitter-multiploid/tmp/reads_11.fasta ../8_hairsplitter-multiploid/tmp/mapped_11.paf ../8_hairsplitter-multiploid/tmp/unpolished_11.fasta > ../8_hairsplitter-multiploid/tmp/polished_11.fasta 2>../8_hairsplitter-multiploid/tmp/trash.txt
/users/PAS1802/woodruff207/Hairsplitter/hairsplitter.py -f ../1_demul_adtrim/BC15-25kbmin.fastq -i assembly_graph.gfa -x ont -o ../8_hairsplitter-multiploid -m -t 28
HairSplitter v1.3.3 (github.com/RolandFaure/HairSplitter). Last update: 2023-08-21
******************
* *
* Hairsplitter *
* Welcome! *
* *
******************
===== STAGE 1: Cleaning graph of small contigs that are unconnected parts of haplotypes [ 2023-08-21 14:44:56.992370 ]
When the assemblers manage to locally phase the haplotypes, they sometimes assemble the alternative haplotype as a separate contig, unconnected in the gfa graph. This affects negatively the performance of Hairsplitter. Let's delete these contigs
- Mapping the assembly against itself
Running: /users/PAS1802/woodruff207/Hairsplitter/src/build/clean_graph assembly_graph.gfa ../8_hairsplitter-multiploid/tmp/cleaned_assembly.gfa ../8_hairsplitter-multiploid ../8_hairsplitter-multiploid/hairsplitter.log 28 minimap2
- Eliminated small unconnected contigs that align on other contigs
===== STAGE 2: Aligning reads on the reference [ 2023-08-21 14:44:58.965748 ]
- Cutting the contigs in chunks of 300000bp to avoid memory issues
- Converting the assembly in fasta format
- Aligning the reads on the assembly
- Running minimap with command line:
minimap2 ../8_hairsplitter-multiploid/tmp/cleaned_assembly.fasta ../1_demul_adtrim/BC15-25kbmin.fastq -x map-ont -a --secondary=no -t 28 > ../8_hairsplitter-multiploid/tmp/reads_on_asm.sam 2> ../8_hairsplitter-multiploid/tmp/logminimap.txt
The log of minimap2 can be found at ../8_hairsplitter-multiploid/tmp/logminimap.txt
===== STAGE 3: Calling variants [ 2023-08-21 14:46:15.601267 ]
Running: /users/PAS1802/woodruff207/Hairsplitter/src/build/call_variants ../8_hairsplitter-multiploid/tmp/cut_assembly.gfa ../1_demul_adtrim/BC15-25kbmin.fastq ../8_hairsplitter-multiploid/tmp/reads_on_asm.sam 28 ../8_hairsplitter-multiploid/tmp ../8_hairsplitter-multiploid/tmp/error_rate.txt 0 ../8_hairsplitter-multiploid/tmp/variants.col ../8_hairsplitter-multiploid/tmp/variants.vcf
===== STAGE 4: Filtering variants [ 2023-08-21 14:51:39.988027 ]
- Filtering variants
Running: /users/PAS1802/woodruff207/Hairsplitter/src/build/filter_variants ../8_hairsplitter-multiploid/tmp/variants.col 0.0121198 28 0 ../8_hairsplitter-multiploid/tmp/filtered_variants.col ../8_hairsplitter-multiploid/tmp/variants.vcf ../8_hairsplitter-multiploid/tmp/variants_filtered.vcf
===== STAGE 5: Separating reads by haplotype of origin [ 2023-08-21 14:52:10.334915 ]
- Separating reads by haplotype of origin
Running: /users/PAS1802/woodruff207/Hairsplitter/src/build/separate_reads ../8_hairsplitter-multiploid/tmp/filtered_variants.col 28 0.0121198 0 ../8_hairsplitter-multiploid/tmp/reads_haplo.gro
===== STAGE 6: Creating all the new contigs [ 2023-08-21 16:14:06.974230 ]
This can take time, as we need to polish every new contig using Racon
Running : /users/PAS1802/woodruff207/Hairsplitter/src/build/create_new_contigs ../8_hairsplitter-multiploid/tmp/cut_assembly.gfa ../1_demul_adtrim/BC15-25kbmin.fastq 0.0121198 ../8_hairsplitter-multiploid/tmp/reads_haplo.gro ../8_hairsplitter-multiploid/tmp 28 ont ../8_hairsplitter-multiploid/tmp/zipped_assembly.gfa ../8_hairsplitter-multiploid/tmp/reads_on_new_contig.gaf 0 minimap2 racon 0
ERROR: create_new_contigs failed. Was trying to run: /users/PAS1802/woodruff207/Hairsplitter/src/build/create_new_contigs ../8_hairsplitter-multiploid/tmp/cut_assembly.gfa ../1_demul_adtrim/BC15-25kbmin.fastq 0.0121198 ../8_hairsplitter-multiploid/tmp/reads_haplo.gro ../8_hairsplitter-multiploid/tmp 28 ont ../8_hairsplitter-multiploid/tmp/zipped_assembly.gfa ../8_hairsplitter-multiploid/tmp/reads_on_new_contig.gaf 0 minimap2 racon 0
Both appear to be the same error, so I don't think the multiploid argument is responsible. It's definitely not something I encountered previously, and this is the same dataset I ran last week; the only difference is multithreading (though I'm not certain multithreading is at fault here). I did look at the commits and noticed that src/tools.cpp was changed to allow an exit() during polishing, but given that the Minimap2 step (which was also given an exit()) had no problem, I don't know why racon would end up having an issue.
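One way to see why racon exited is to rerun the exact command from the error line while keeping its stderr, instead of sending it to trash.txt. The following is a minimal sketch, assuming the error line keeps the format shown in the logs above; `split_racon_command` and `rerun_with_stderr` are hypothetical helper names, not part of HairSplitter.

```python
import os
import shlex
import subprocess


def split_racon_command(log_line):
    """Return (argv, stdout_path) for the command quoted in the error line.

    The "2>file" stderr redirection is dropped so that racon's own error
    message is not discarded.
    """
    # Keep only what follows "while running "; if the marker is absent,
    # treat the whole line as the command.
    _, found, command = log_line.partition("while running ")
    tokens = shlex.split(command if found else log_line)
    argv, stdout_path = [], None
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == ">" and i + 1 < len(tokens):
            stdout_path = tokens[i + 1]
            i += 2
        elif token.startswith("2>"):
            # "2>file" is one token, "2> file" is two.
            i += 1 if len(token) > 2 else 2
        else:
            argv.append(token)
            i += 1
    return argv, stdout_path


def rerun_with_stderr(log_line):
    """Rerun the command and return (exit code, captured stderr text)."""
    argv, stdout_path = split_racon_command(log_line)
    with open(stdout_path or os.devnull, "w") as out:
        result = subprocess.run(argv, stdout=out, stderr=subprocess.PIPE, text=True)
    return result.returncode, result.stderr
```

The paths in the logged command are relative, so `rerun_with_stderr` would need to be called from the assembly directory the job ran in, with racon available in the active environment.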
An oddity I just noticed during this - the variants_filtered.vcf file never seems to have much added to it, even in my successful runs of Hairsplitter. All it seems to have is:
##fileformat=VCFv4.2
##source=call_variants
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
#CHROM POS ID REF ALT QUAL FILTER INFO
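To confirm that the file really holds only a header, the data lines can be counted. A minimal sketch; `count_vcf_records` is a hypothetical helper name, and it only relies on the VCF convention that header lines start with `#`.

```python
def count_vcf_records(lines):
    """Count data lines: non-empty lines that do not start with '#'."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))
```

For the run above this could be called as `count_vcf_records(open("../8_hairsplitter/tmp/variants_filtered.vcf"))`; a result of 0 would mean no variant was written after filtering, while comparing against variants.vcf would show whether any were called in the first place.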
Lastly, I don't know if it's helpful, but it looks like Hairsplitter tried to make a /tmp/ directory and a trash.txt file in my assembly folder (i.e. in the /fs/ess/PAS1802/ALW/2023_06_15-MAY1376_TLOKOs_LongRead/1376/2_flye_assembly-keephap-25kb-2/ directory). I don't know if it has done that every time and simply deleted it later, or if this is a new bug and it simply happened to leave the files there because Hairsplitter died before it could remove them. There are also a lot more files in the proper /8_hairsplitter/tmp/tmp/ directory than there were previously, like it wasn't deleting the files as it was running.
Hello,
The bug occurred when a haplotype was very divergent from the assembly: HairSplitter tried to polish the assembly with the reads, but failed to map the reads on the assembly. I added a reassembly module for the cases where the reads are too divergent. This should also improve the phasing of large insertions/deletions/inversions.
A new version has been pushed and released :-)
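For anyone still on the older version, a quick way to check whether a run hit this case is to look at the three inputs of the failing racon command. The sketch below assumes that a failed mapping shows up as an empty or missing file (for example mapped_11.paf); that is an assumption, not something the logs above confirm, and `empty_inputs` is a hypothetical helper name.

```python
import os


def empty_inputs(paths):
    """Return the paths that are missing or have zero size."""
    return [p for p in paths if not os.path.isfile(p) or os.path.getsize(p) == 0]
```

Called with the reads_11.fasta, mapped_11.paf and unpolished_11.fasta paths from the error line, an empty result would suggest the inputs exist and the cause lies elsewhere.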
Wonderful - thank you so much! I'll give it another try today (again one run with multiploid and one without) and see how it goes.