RolandFaure/Hairsplitter

ERROR: create_new_contigs failed.

FrostFlow13 opened this issue · 2 comments

Sorry for already having another bug to report! I was trying to run Hairsplitter today after the new update (one with the multiploid command, one without, both using multithreading). Hairsplitter ran so much faster this time around, and none of the previously problematic steps seemed to have issues!

However, there seems to be a new issue with STAGE 6.


Running (this one was without multiploid)

#!/bin/bash
#SBATCH --time=08:00:00
#SBATCH --nodes=1 --ntasks=1 --cpus-per-task=28
#SBATCH --account=PAS1802
#SBATCH --job-name=1376_hairsplitter-25kb-keephap-2
#SBATCH --export=ALL
#SBATCH --output=1376_hairsplitter-25kb-keephap-2.out.%j
module load cmake/3.25.2
module load gnu/11.2.0
source /users/PAS1802/woodruff207/miniconda3/bin/activate
conda activate hairsplitter_env
cd /fs/ess/PAS1802/ALW/2023_06_15-MAY1376_TLOKOs_LongRead/1376/2_flye_assembly-keephap-25kb-2/
python /users/PAS1802/woodruff207/Hairsplitter/hairsplitter.py -f ../1_demul_adtrim/BC15-25kbmin.fastq -i assembly_graph.gfa -x ont -o ../8_hairsplitter -t 28

Resulted in

 - Loading all reads from ../1_demul_adtrim/BC15-25kbmin.fastq in memory
 - Loading all contigs from ../8_hairsplitter/tmp/cut_assembly.gfa in memory
 - Loading alignments of the reads on the contigs from ../8_hairsplitter/tmp/reads_on_asm.sam
 - Calling variants on each contig using basic pileup
separating reads on contig CONTIG	edge_34@3	46849	117.353
separating reads on contig CONTIG	edge_14@0	8141	131.349
separating reads on contig CONTIG	edge_40@3	11455	100.049
separating reads on contig CONTIG	edge_12@0	1480	444.816
separating reads on contig CONTIG	edge_23@0	18874	219.034
separating reads on contig CONTIG	edge_15@0	21915	217.477
separating reads on contig CONTIG	edge_45@4	300000	223.291
separating reads on contig CONTIG	edge_1@0	72803	76.6896
separating reads on contig CONTIG	edge_36@0	10083	360.579
separating reads on contig CONTIG	edge_37@1	64272	233.718
separating reads on contig CONTIG	edge_46@0	300000	211.382
separating reads on contig CONTIG	edge_28@6	87656	145.68
separating reads on contig CONTIG	edge_39@0	115223	209.538
separating reads on contig CONTIG	edge_41@1	233709	216.108
separating reads on contig CONTIG	edge_16@1	93636	209.789
separating reads on contig CONTIG	edge_45@0	300000	195.72
separating reads on contig CONTIG	edge_28@5	300000	225.631
separating reads on contig CONTIG	edge_44@4	300000	221.944
separating reads on contig CONTIG	edge_45@2	300000	218.656
separating reads on contig CONTIG	edge_37@0	300000	214.827
separating reads on contig CONTIG	edge_44@1	300000	220.502
separating reads on contig CONTIG	edge_44@0	300000	231.295
separating reads on contig CONTIG	edge_28@4	300000	209.897
separating reads on contig CONTIG	edge_6@1	300000	210.705
separating reads on contig CONTIG	edge_48@0	300000	228.833
separating reads on contig CONTIG	edge_45@1	300000	211.574
separating reads on contig CONTIG	edge_22@0	2532	458.313
separating reads on contig CONTIG	edge_40@1	300000	223.89
separating reads on contig CONTIG	edge_44@3	300000	215.01
separating reads on contig CONTIG	edge_6@0	300000	215.125
separating reads on contig CONTIG	edge_34@1	300000	212.838
separating reads on contig CONTIG	edge_48@1	300000	243.961
separating reads on contig CONTIG	edge_33@0	10239	214.442
separating reads on contig CONTIG	edge_38@0	300000	209.204
separating reads on contig CONTIG	edge_44@2	300000	221.737
separating reads on contig CONTIG	edge_34@2	300000	200.835
separating reads on contig CONTIG	edge_35@0	300000	227.907
separating reads on contig CONTIG	edge_34@0	300000	198.119
separating reads on contig CONTIG	edge_45@5	299225	193.126
separating reads on contig CONTIG	edge_3@0	25469	220.44
separating reads on contig CONTIG	edge_44@6	300000	224.849
separating reads on contig CONTIG	edge_28@2	300000	218.004
separating reads on contig CONTIG	edge_44@9	99227	148.113
separating reads on contig CONTIG	edge_4@0	16120	53.9382
separating reads on contig CONTIG	edge_42@0	1252	12304.8
separating reads on contig CONTIG	edge_7@2	233248	230.62
separating reads on contig CONTIG	edge_44@5	300000	223.86
separating reads on contig CONTIG	edge_28@3	300000	226.373
separating reads on contig CONTIG	edge_47@0	262506	233.746
separating reads on contig CONTIG	edge_40@0	300000	203.257
separating reads on contig CONTIG	edge_45@3	300000	226.299
separating reads on contig CONTIG	edge_35@1	300000	226.719
separating reads on contig CONTIG	edge_41@0	300000	203.041
separating reads on contig CONTIG	edge_28@1	300000	211.533
separating reads on contig CONTIG	edge_40@2	300000	204.561
separating reads on contig CONTIG	edge_44@7	300000	228.214
separating reads on contig CONTIG	edge_32@0	159093	131.672
separating reads on contig CONTIG	edge_7@1	300000	263.177
separating reads on contig CONTIG	edge_44@8	300000	226.218
separating reads on contig CONTIG	edge_28@0	300000	232.642
separating reads on contig CONTIG	edge_16@0	300000	204.921
separating reads on contig CONTIG	edge_38@1	97516	218.851
separating reads on contig CONTIG	edge_6@2	260846	197.358
separating reads on contig CONTIG	edge_48@3	85724	104.09
separating reads on contig CONTIG	edge_48@2	300000	239.204
separating reads on contig CONTIG	edge_46@1	80433	197.154
separating reads on contig CONTIG	edge_7@0	300000	203.408
 - Creating the .gaf file describing how the reads align on the new contigs
 - Creating the new contigs
ERROR racon failed, while running racon -w 500 -e 1 -t 1 ../8_hairsplitter/tmp/reads_11.fasta ../8_hairsplitter/tmp/mapped_11.paf ../8_hairsplitter/tmp/unpolished_11.fasta > ../8_hairsplitter/tmp/polished_11.fasta 2>../8_hairsplitter/tmp/trash.txt
/users/PAS1802/woodruff207/Hairsplitter/hairsplitter.py -f ../1_demul_adtrim/BC15-25kbmin.fastq -i assembly_graph.gfa -x ont -o ../8_hairsplitter -t 28
HairSplitter v1.3.3 (github.com/RolandFaure/HairSplitter). Last update: 2023-08-21

	******************
	*                *
	*  Hairsplitter  *
	*    Welcome!    *
	*                *
	******************


===== STAGE 1: Cleaning graph of small contigs that are unconnected parts of haplotypes   [ 2023-08-21 14:44:48.560662 ]


 When the assemblers manage to locally phase the haplotypes, they sometimes assemble the alternative haplotype as a separate contig, unconnected in the gfa graph. This affects negatively the performance of Hairsplitter. Let's delete these contigs

 - Mapping the assembly against itself
 Running:  /users/PAS1802/woodruff207/Hairsplitter/src/build/clean_graph assembly_graph.gfa ../8_hairsplitter/tmp/cleaned_assembly.gfa ../8_hairsplitter ../8_hairsplitter/hairsplitter.log 28 minimap2
 - Eliminated small unconnected contigs that align on other contigs

===== STAGE 2: Aligning reads on the reference   [ 2023-08-21 14:44:50.479979 ]

 - Cutting the contigs in chunks of 300000bp to avoid memory issues
 - Converting the assembly in fasta format
 - Aligning the reads on the assembly
 - Running minimap with command line:
      minimap2 ../8_hairsplitter/tmp/cleaned_assembly.fasta ../1_demul_adtrim/BC15-25kbmin.fastq -x map-ont -a --secondary=no -t 28 > ../8_hairsplitter/tmp/reads_on_asm.sam 2> ../8_hairsplitter/tmp/logminimap.txt 
   The log of minimap2 can be found at ../8_hairsplitter/tmp/logminimap.txt

===== STAGE 3: Calling variants   [ 2023-08-21 14:46:07.495951 ]

 Running:  /users/PAS1802/woodruff207/Hairsplitter/src/build/call_variants ../8_hairsplitter/tmp/cut_assembly.gfa ../1_demul_adtrim/BC15-25kbmin.fastq ../8_hairsplitter/tmp/reads_on_asm.sam 28 ../8_hairsplitter/tmp ../8_hairsplitter/tmp/error_rate.txt 0 ../8_hairsplitter/tmp/variants.col ../8_hairsplitter/tmp/variants.vcf

===== STAGE 4: Filtering variants   [ 2023-08-21 14:51:21.968869 ]

 - Filtering variants
 Running:  /users/PAS1802/woodruff207/Hairsplitter/src/build/filter_variants ../8_hairsplitter/tmp/variants.col 0.0121198 28 0 ../8_hairsplitter/tmp/filtered_variants.col ../8_hairsplitter/tmp/variants.vcf ../8_hairsplitter/tmp/variants_filtered.vcf

===== STAGE 5: Separating reads by haplotype of origin   [ 2023-08-21 14:51:51.986796 ]

 - Separating reads by haplotype of origin
 Running:  /users/PAS1802/woodruff207/Hairsplitter/src/build/separate_reads ../8_hairsplitter/tmp/filtered_variants.col 28 0.0121198 0 ../8_hairsplitter/tmp/reads_haplo.gro

===== STAGE 6: Creating all the new contigs   [ 2023-08-21 16:22:00.649424 ]

 This can take time, as we need to polish every new contig using Racon
 Running :  /users/PAS1802/woodruff207/Hairsplitter/src/build/create_new_contigs ../8_hairsplitter/tmp/cut_assembly.gfa ../1_demul_adtrim/BC15-25kbmin.fastq 0.0121198 ../8_hairsplitter/tmp/reads_haplo.gro ../8_hairsplitter/tmp 28 ont ../8_hairsplitter/tmp/zipped_assembly.gfa ../8_hairsplitter/tmp/reads_on_new_contig.gaf 0 minimap2 racon 0
ERROR: create_new_contigs failed. Was trying to run: /users/PAS1802/woodruff207/Hairsplitter/src/build/create_new_contigs ../8_hairsplitter/tmp/cut_assembly.gfa ../1_demul_adtrim/BC15-25kbmin.fastq 0.0121198 ../8_hairsplitter/tmp/reads_haplo.gro ../8_hairsplitter/tmp 28 ont ../8_hairsplitter/tmp/zipped_assembly.gfa ../8_hairsplitter/tmp/reads_on_new_contig.gaf 0 minimap2 racon 0

And running (this one was with multiploid)

#!/bin/bash
#SBATCH --time=08:00:00
#SBATCH --nodes=1 --ntasks=1 --cpus-per-task=28
#SBATCH --account=PAS1802
#SBATCH --job-name=1376_hairsplitter-25kb-keephap-2-multiploid
#SBATCH --export=ALL
#SBATCH --output=1376_hairsplitter-25kb-keephap-2-multiploid.out.%j
module load cmake/3.25.2
module load gnu/11.2.0
source /users/PAS1802/woodruff207/miniconda3/bin/activate
conda activate hairsplitter_env
cd /fs/ess/PAS1802/ALW/2023_06_15-MAY1376_TLOKOs_LongRead/1376/2_flye_assembly-keephap-25kb-2/
python /users/PAS1802/woodruff207/Hairsplitter/hairsplitter.py -f ../1_demul_adtrim/BC15-25kbmin.fastq -i assembly_graph.gfa -x ont -o ../8_hairsplitter-multiploid -m -t 28

Resulted in

 - Loading all reads from ../1_demul_adtrim/BC15-25kbmin.fastq in memory
 - Loading all contigs from ../8_hairsplitter-multiploid/tmp/cut_assembly.gfa in memory
 - Loading alignments of the reads on the contigs from ../8_hairsplitter-multiploid/tmp/reads_on_asm.sam
 - Calling variants on each contig using basic pileup
separating reads on contig CONTIG	edge_22@0	2532	458.313
separating reads on contig CONTIG	edge_15@0	21915	217.477
separating reads on contig CONTIG	edge_36@0	10083	360.579
separating reads on contig CONTIG	edge_1@0	72803	76.6896
separating reads on contig CONTIG	edge_48@3	85724	104.09
separating reads on contig CONTIG	edge_44@9	99227	148.113
separating reads on contig CONTIG	edge_48@2	300000	239.204
separating reads on contig CONTIG	edge_28@6	87656	145.68
separating reads on contig CONTIG	edge_41@1	233709	216.108
separating reads on contig CONTIG	edge_28@1	300000	211.533
separating reads on contig CONTIG	edge_28@5	300000	225.631
separating reads on contig CONTIG	edge_45@2	300000	218.656
separating reads on contig CONTIG	edge_47@0	262506	233.746
separating reads on contig CONTIG	edge_44@1	300000	220.502
separating reads on contig CONTIG	edge_48@0	300000	228.833
separating reads on contig CONTIG	edge_41@0	300000	203.041
separating reads on contig CONTIG	edge_44@8	300000	226.218
separating reads on contig CONTIG	edge_44@0	300000	231.295
separating reads on contig CONTIG	edge_28@4	300000	209.897
separating reads on contig CONTIG	edge_28@0	300000	232.642
separating reads on contig CONTIG	edge_35@1	300000	226.719
separating reads on contig CONTIG	edge_45@1	300000	211.574
separating reads on contig CONTIG	edge_7@2	233248	230.62
separating reads on contig CONTIG	edge_40@1	300000	223.89
separating reads on contig CONTIG	edge_28@2	300000	218.004
separating reads on contig CONTIG	edge_34@1	300000	212.838
separating reads on contig CONTIG	edge_32@0	159093	131.672
separating reads on contig CONTIG	edge_6@0	300000	215.125
separating reads on contig CONTIG	edge_38@1	97516	218.851
separating reads on contig CONTIG	edge_35@0	300000	227.907
separating reads on contig CONTIG	edge_40@3	11455	100.049
separating reads on contig CONTIG	edge_46@0	300000	211.382
separating reads on contig CONTIG	edge_42@0	1252	12304.8
separating reads on contig CONTIG	edge_45@5	299225	193.126
separating reads on contig CONTIG	edge_44@3	300000	215.01
separating reads on contig CONTIG	edge_45@4	300000	223.291
separating reads on contig CONTIG	edge_34@3	46849	117.353
separating reads on contig CONTIG	edge_40@2	300000	204.561
separating reads on contig CONTIG	edge_40@0	300000	203.257
separating reads on contig CONTIG	edge_34@0	300000	198.119
separating reads on contig CONTIG	edge_37@1	64272	233.718
separating reads on contig CONTIG	edge_45@0	300000	195.72
separating reads on contig CONTIG	edge_14@0	8141	131.349
separating reads on contig CONTIG	edge_28@3	300000	226.373
separating reads on contig CONTIG	edge_23@0	18874	219.034
separating reads on contig CONTIG	edge_44@6	300000	224.849
separating reads on contig CONTIG	edge_33@0	10239	214.442
separating reads on contig CONTIG	edge_6@2	260846	197.358
separating reads on contig CONTIG	edge_12@0	1480	444.816
separating reads on contig CONTIG	edge_37@0	300000	214.827
separating reads on contig CONTIG	edge_16@1	93636	209.789
separating reads on contig CONTIG	edge_6@1	300000	210.705
separating reads on contig CONTIG	edge_46@1	80433	197.154
separating reads on contig CONTIG	edge_4@0	16120	53.9382
separating reads on contig CONTIG	edge_44@4	300000	221.944
separating reads on contig CONTIG	edge_34@2	300000	200.835
separating reads on contig CONTIG	edge_44@2	300000	221.737
separating reads on contig CONTIG	edge_16@0	300000	204.921
separating reads on contig CONTIG	edge_48@1	300000	243.961
separating reads on contig CONTIG	edge_38@0	300000	209.204
separating reads on contig CONTIG	edge_44@7	300000	228.214
separating reads on contig CONTIG	edge_39@0	115223	209.538
separating reads on contig CONTIG	edge_7@0	300000	203.408
separating reads on contig CONTIG	edge_7@1	300000	263.177
separating reads on contig CONTIG	edge_45@3	300000	226.299
separating reads on contig CONTIG	edge_44@5	300000	223.86
separating reads on contig CONTIG	edge_3@0	25469	220.44
 - Creating the .gaf file describing how the reads align on the new contigs
 - Creating the new contigs
ERROR racon failed, while running racon -w 500 -e 1 -t 1 ../8_hairsplitter-multiploid/tmp/reads_11.fasta ../8_hairsplitter-multiploid/tmp/mapped_11.paf ../8_hairsplitter-multiploid/tmp/unpolished_11.fasta > ../8_hairsplitter-multiploid/tmp/polished_11.fasta 2>../8_hairsplitter-multiploid/tmp/trash.txt
/users/PAS1802/woodruff207/Hairsplitter/hairsplitter.py -f ../1_demul_adtrim/BC15-25kbmin.fastq -i assembly_graph.gfa -x ont -o ../8_hairsplitter-multiploid -m -t 28
HairSplitter v1.3.3 (github.com/RolandFaure/HairSplitter). Last update: 2023-08-21

	******************
	*                *
	*  Hairsplitter  *
	*    Welcome!    *
	*                *
	******************


===== STAGE 1: Cleaning graph of small contigs that are unconnected parts of haplotypes   [ 2023-08-21 14:44:56.992370 ]


 When the assemblers manage to locally phase the haplotypes, they sometimes assemble the alternative haplotype as a separate contig, unconnected in the gfa graph. This affects negatively the performance of Hairsplitter. Let's delete these contigs

 - Mapping the assembly against itself
 Running:  /users/PAS1802/woodruff207/Hairsplitter/src/build/clean_graph assembly_graph.gfa ../8_hairsplitter-multiploid/tmp/cleaned_assembly.gfa ../8_hairsplitter-multiploid ../8_hairsplitter-multiploid/hairsplitter.log 28 minimap2
 - Eliminated small unconnected contigs that align on other contigs

===== STAGE 2: Aligning reads on the reference   [ 2023-08-21 14:44:58.965748 ]

 - Cutting the contigs in chunks of 300000bp to avoid memory issues
 - Converting the assembly in fasta format
 - Aligning the reads on the assembly
 - Running minimap with command line:
      minimap2 ../8_hairsplitter-multiploid/tmp/cleaned_assembly.fasta ../1_demul_adtrim/BC15-25kbmin.fastq -x map-ont -a --secondary=no -t 28 > ../8_hairsplitter-multiploid/tmp/reads_on_asm.sam 2> ../8_hairsplitter-multiploid/tmp/logminimap.txt 
   The log of minimap2 can be found at ../8_hairsplitter-multiploid/tmp/logminimap.txt

===== STAGE 3: Calling variants   [ 2023-08-21 14:46:15.601267 ]

 Running:  /users/PAS1802/woodruff207/Hairsplitter/src/build/call_variants ../8_hairsplitter-multiploid/tmp/cut_assembly.gfa ../1_demul_adtrim/BC15-25kbmin.fastq ../8_hairsplitter-multiploid/tmp/reads_on_asm.sam 28 ../8_hairsplitter-multiploid/tmp ../8_hairsplitter-multiploid/tmp/error_rate.txt 0 ../8_hairsplitter-multiploid/tmp/variants.col ../8_hairsplitter-multiploid/tmp/variants.vcf

===== STAGE 4: Filtering variants   [ 2023-08-21 14:51:39.988027 ]

 - Filtering variants
 Running:  /users/PAS1802/woodruff207/Hairsplitter/src/build/filter_variants ../8_hairsplitter-multiploid/tmp/variants.col 0.0121198 28 0 ../8_hairsplitter-multiploid/tmp/filtered_variants.col ../8_hairsplitter-multiploid/tmp/variants.vcf ../8_hairsplitter-multiploid/tmp/variants_filtered.vcf

===== STAGE 5: Separating reads by haplotype of origin   [ 2023-08-21 14:52:10.334915 ]

 - Separating reads by haplotype of origin
 Running:  /users/PAS1802/woodruff207/Hairsplitter/src/build/separate_reads ../8_hairsplitter-multiploid/tmp/filtered_variants.col 28 0.0121198 0 ../8_hairsplitter-multiploid/tmp/reads_haplo.gro

===== STAGE 6: Creating all the new contigs   [ 2023-08-21 16:14:06.974230 ]

 This can take time, as we need to polish every new contig using Racon
 Running :  /users/PAS1802/woodruff207/Hairsplitter/src/build/create_new_contigs ../8_hairsplitter-multiploid/tmp/cut_assembly.gfa ../1_demul_adtrim/BC15-25kbmin.fastq 0.0121198 ../8_hairsplitter-multiploid/tmp/reads_haplo.gro ../8_hairsplitter-multiploid/tmp 28 ont ../8_hairsplitter-multiploid/tmp/zipped_assembly.gfa ../8_hairsplitter-multiploid/tmp/reads_on_new_contig.gaf 0 minimap2 racon 0
ERROR: create_new_contigs failed. Was trying to run: /users/PAS1802/woodruff207/Hairsplitter/src/build/create_new_contigs ../8_hairsplitter-multiploid/tmp/cut_assembly.gfa ../1_demul_adtrim/BC15-25kbmin.fastq 0.0121198 ../8_hairsplitter-multiploid/tmp/reads_haplo.gro ../8_hairsplitter-multiploid/tmp 28 ont ../8_hairsplitter-multiploid/tmp/zipped_assembly.gfa ../8_hairsplitter-multiploid/tmp/reads_on_new_contig.gaf 0 minimap2 racon 0

Both appear to be the same error, so I don't think the multiploid argument is responsible. It's definitely not something I encountered previously, and this is the same dataset I ran last week; the only difference is multithreading (though I'm not certain multithreading is at fault here). I did look at the commits and noticed that src/tools.cpp was changed to allow an exit() during polishing, but given that the Minimap2 step (which was also given an exit()) ran without a problem, I don't know why racon would end up having an issue.

An oddity I just noticed during this - the variants_filtered.vcf file never seems to have much added to it, even in my successful runs of Hairsplitter. All it seems to have is:

##fileformat=VCFv4.2
##source=call_variants
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
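
A quick way to confirm that a VCF like this holds only header lines is to count the lines that do not start with `#`. The sketch below is a minimal, hypothetical helper (the name `count_vcf_records` is not part of Hairsplitter) and assumes the file is plain, uncompressed text.

```python
def count_vcf_records(vcf_path):
    """Return (header_lines, data_records) for a plain-text VCF file."""
    header_lines = 0
    data_records = 0
    with open(vcf_path, "r", encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            if line.startswith("#"):
                header_lines += 1  # meta-information or column header
            else:
                data_records += 1  # one variant record per line
    return header_lines, data_records
```

Calling `count_vcf_records("../8_hairsplitter/tmp/variants_filtered.vcf")` on a file containing only the four lines above would give zero data records; comparing that against `variants.vcf` from the same run would show whether the filtering step or the writing step is where records go missing.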

Lastly, I don't know if it's helpful, but it looks like Hairsplitter tried to make a /tmp/ directory and a trash.txt file in my assembly folder (i.e. in the /fs/ess/PAS1802/ALW/2023_06_15-MAY1376_TLOKOs_LongRead/1376/2_flye_assembly-keephap-25kb-2/ directory). I don't know if it has done that every time and simply deleted it later, or if this is a new bug and it simply happened to leave the files there because Hairsplitter died before it could remove them. There are also a lot more files in the proper /8_hairsplitter/tmp/tmp/ directory than there were previously, like it wasn't deleting the files as it was running:
[screenshot of the tmp directory listing]
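
To compare how many intermediate files pile up between runs without relying on a screenshot, the files can be grouped by name pattern. This is only a sketch: the helper name `summarize_tmp_files` is hypothetical, and it assumes the numbered intermediates follow names like the `reads_11.fasta` and `mapped_11.paf` seen in the racon error line.

```python
import re
from collections import Counter
from pathlib import Path


def summarize_tmp_files(tmp_dir):
    """Count files in tmp_dir, grouping names that differ only by digits."""
    counts = Counter()
    for entry in Path(tmp_dir).iterdir():
        if entry.is_file():
            # reads_11.fasta and reads_3.fasta both become reads_N.fasta
            pattern = re.sub(r"\d+", "N", entry.name)
            counts[pattern] += 1
    return counts
```

Running it on the tmp directory of a successful run and of a failed run would show which file families are left behind when the pipeline stops early.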

Hello,
The bug occurred when a haplotype was very divergent from the assembly: HairSplitter tried to polish the assembly with the reads, but failed to map the reads on the assembly. I added a reassembly module for cases where the reads are too divergent. This should also improve the phasing of large insertions, deletions, and inversions.
A new version has been pushed and released :-)
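
For anyone hitting the same error on an older version, the explanation above suggests inspecting the three files named in the failing racon command (reads, PAF overlaps, unpolished target). The sketch below is a hypothetical diagnostic, not part of HairSplitter; it assumes that a mapping failure shows up as a PAF file with no alignment lines, which the thread does not confirm explicitly.

```python
import os


def summarize_racon_inputs(reads_fasta, overlaps_paf, target_fasta):
    """Count records in the three racon inputs; None marks a missing file."""
    inputs = (
        ("reads", reads_fasta),
        ("overlaps", overlaps_paf),
        ("target", target_fasta),
    )
    report = {}
    for label, path in inputs:
        if not os.path.exists(path):
            report[label] = None
            continue
        count = 0
        with open(path, "r", encoding="utf-8") as handle:
            for line in handle:
                if not line.strip():
                    continue
                if label == "overlaps":
                    count += 1  # PAF: one alignment per non-empty line
                elif line.startswith(">"):
                    count += 1  # FASTA: one sequence per header line
        report[label] = count
    return report
```

For the run above, the call would be `summarize_racon_inputs("../8_hairsplitter/tmp/reads_11.fasta", "../8_hairsplitter/tmp/mapped_11.paf", "../8_hairsplitter/tmp/unpolished_11.fasta")`. A result with reads present but zero overlaps would be consistent with the reads failing to map on the contig.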

Wonderful - thank you so much! I'll give it another try today (again one with multiploid and one without) and see how it goes.