phasegenomics/FALCON-Phase

{sample}.phased.txt empty

aboyher opened this issue · 6 comments

Not sure what to make of this. I ran the pipeline and everything looked good until it failed at the emit-haplotigs step. I traced that back to an empty {sample}.phased.txt file created by the phase rule. Running the command manually (falcon-phase phase -f mince/salsa_phased.minced.fasta -b salsa_phased.binmat -m GATC -i salsa_phased.ov_index.txt -p tmp -s 10000000) also comes up with nothing. It seems to run normally; the output looks like this:

INFO: RUNNING: /home/aboyher/src/FALCON-Phase/bin/falcon-phase -f mince/salsa_phased.minced.fasta -b salsa_phased.binmat -m GATC -i salsa_phased.ov_index.txt -p tmp -s 10000000 phase
INFO: parsing index file salsa_phased.ov_index.txt
INFO: loading sequence information
INFO: added motif: GATC
INFO: loaded sequence information
WARNING: seq: 004631F:0-23824 and seq: 004631F:0-23824 have zero cutsites, setting hi-c counts to 0
INFO: working on group 000001F
INFO: working on group 000002F
INFO: working on group 000003F
INFO: working on group 000004F
INFO: working on group 000005F
INFO: working on group 000006F
INFO: working on group 000007F
INFO: working on group 000008F

Any idea what the cause is? Should I use a higher number of iterations?

zeeev commented

Hi @aboyher, that is odd. Can you run head on all the tmp* files and upload a few lines?
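For anyone who would rather collect those lines from Python than from the shell, here is a minimal sketch. `head_files` is a hypothetical helper (not part of FALCON-Phase), and the `tmp*` pattern assumes the `-p tmp` prefix used in the command above.

```python
import glob
import os
from itertools import islice


def head_files(pattern="tmp*", n=10):
    """Return {path: first n lines} for every regular file matching pattern."""
    heads = {}
    for path in sorted(glob.glob(pattern)):
        if not os.path.isfile(path):
            continue  # skip directories that happen to match the pattern
        with open(path, errors="replace") as handle:
            heads[path] = [line.rstrip("\n") for line in islice(handle, n)]
    return heads


if __name__ == "__main__":
    # Print in the same "==> name <==" layout that `head` uses for several files.
    for path, lines in head_files().items():
        print(f"==> {path} <==")
        print("\n".join(lines))
        print()
```

An empty file shows up with an empty list, which makes it easy to spot which output was never written.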

aboyher commented

==> tmp.results.txt <==
000001F 000001F:96029-101765 000001F_001:0-96233 1.000000 15.1667 6.5520 164 0
000001F 000001F:1388649-1394711 000001F_007:0-193102 0.999839 1.5909 21.3506 165 1
000002F 000002F:348778-356827 000002F_001:0-83357 1.000000 31.8085 1.8969 166 2
000003F 000003F_006:0-193077 000003F:1511618-1517823 1.000000 16.5687 58.4000 3 167
000004F 000004F_005:0-315603 000004F:1636357-1641397 1.000000 4.9961 26.1905 4 168
000005F 000005F_004:0-794955 000005F:25325-35520 1.000000 20.3268 0.4359 5 169
000005F 000005F_007:0-493655 000005F:1845036-1852700 0.999875 29.8720 0.0513 6 170
000006F 000006F_004:0-1072880 000006F:1506562-1513566 1.000000 13.4233 27.3810 7 171
000007F 000007F_002:0-519483 000007F:528412-536003 1.000000 8.7955 6.2500 8 172
000007F 000007F_004:0-119677 000007F:2530794-2536665 0.500388 1.1941 23.3448 9 173

==> tmp.seqs.txt <==
#sequence length cutsites
000001F_001:0-96233 96233 GATC:279
000001F_007:0-193102 193102 GATC:539
000002F_001:0-83357 83357 GATC:359
000003F_006:0-193077 193077 GATC:626
000004F_005:0-315603 315603 GATC:1034
000005F_004:0-794955 794955 GATC:2240
000005F_007:0-493655 493655 GATC:1242
000006F_004:0-1072880 1072880 GATC:2981
000007F_002:0-519483 519483 GATC:1438
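The run log earlier in the thread warns about a sequence with zero cutsites. As a rough sketch, a tmp.seqs.txt-style file could be scanned for such sequences like this. The parser assumes the three-column layout visible in the header above (name, length, then one or more whitespace-separated MOTIF:count fields). The helper names are hypothetical, and the last sample row is constructed from the WARNING line for illustration, not copied from a real file.

```python
def parse_seqs(lines):
    """Yield (name, length, {motif: count}) from tmp.seqs.txt-style lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "#sequence length cutsites" header
        name, length, *fields = line.split()
        counts = {}
        for field in fields:
            motif, count = field.rsplit(":", 1)
            counts[motif] = int(count)
        yield name, int(length), counts


def zero_cutsite_seqs(lines):
    """Return the names of sequences whose motif counts sum to zero."""
    return [name for name, _, counts in parse_seqs(lines)
            if sum(counts.values()) == 0]


SAMPLE = [
    "#sequence length cutsites",
    "000001F_001:0-96233 96233 GATC:279",
    "000001F_007:0-193102 193102 GATC:539",
    "004631F:0-23824 23824 GATC:0",  # constructed row, mirrors the WARNING above
]

if __name__ == "__main__":
    print(zero_cutsite_seqs(SAMPLE))
```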

zeeev commented

@aboyher I have a hunch. I updated the source code a while back and changed the output of falcon-phase phase, and I echoed those changes into the snakemake file. Is it possible you're running the latest code with an older version of the snakemake file? The output of falcon-phase phase looks really good.

Did snakemake not report an error? If you run snakemake -p it will print the commands it's trying to run. Can you manually run the emit stage? You can recreate it from this block:

rule emit_haplotigs :
     message    : "[info] emitting phased haplotigs"
     input      : EH=config['emit'], PH="phasing/{sample}.results.txt", BC="mince/{sample}.BC.bed", FA="{sample}.p_h_ctg.fa", BED=config['bedtools']
     output     : F0="output/{sample}.phased.0.fasta", F1="output/{sample}.phased.1.fasta", B0="output/{sample}.phased.0.bed", B1="output/{sample}.phased.1.bed"
     params     : FMT=config['sample']['output_format']
     shell      : """
            {input.EH} {input.PH} {input.BC} {input.FA} {input.BED} {params.FMT}

aboyher commented

I'm not sure I understand. Should I just change the emit_haplotigs rule to what you posted above, or should I go ahead and try the newer snakefile?

This is the error from snakemake:

[Wed May 15 20:58:55 2019]
Job 1: [info] emitting phased haplotigs

        /home/aboyher/src/FALCON-Phase/bin/emit_haplotigs.pl salsa_phased.phased.txt mince/salsa_phased.BC.bed salsa_phased.p_h_ctg.fa /home/aboyher/local/bin/bedtools > salsa_phased.diploid_phased.fasta

        rm tmp_phase1.txt
        rm tmp_phase0.txt

/usr/bin/bash: line 0: source: filename argument required
source: usage: source filename [arguments]
emit_haplotigs.pl phased.txt BC.bed clean_unzip_asm_p_h.fa path_to_bedtools output_format
Full Traceback (most recent call last):
  File "/home/aboyher/.pyenv/versions/3.6.0/envs/py360/lib/python3.6/site-packages/snakemake/executors.py", line 1428, in run_wrapper
    passed_shadow_dir)
  File "/shares/bbart_share/aboyher/projects/cassava_assem/tme7_v0.4p/tme7_v0.4p_0/salsa/SALSA_output/falcon_phase/snakefile", line 71, in __rule_emit_haplotigs
    rule index_pair :
  File "/home/aboyher/.pyenv/versions/3.6.0/envs/py360/lib/python3.6/site-packages/snakemake/shell.py", line 149, in new
    raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'source ; set -eo pipefail ; /home/aboyher/src/FALCON-Phase/bin/emit_haplotigs.pl salsa_phased.phased.txt mince/salsa_phased.BC.bed salsa_phased.p_h_ctg.fa /home/aboyher/local/bin/bedtools > salsa_phased.diploid_phased.fasta

        rm tmp_phase1.txt
        rm tmp_phase0.txt' returned non-zero exit status 255.

[Wed May 15 20:58:55 2019]
Error in rule emit_haplotigs:
jobid: 1
output: salsa_phased.diploid_phased.fasta
shell:

        /home/aboyher/src/FALCON-Phase/bin/emit_haplotigs.pl salsa_phased.phased.txt mince/salsa_phased.BC.bed salsa_phased.p_h_ctg.fa /home/aboyher/local/bin/bedtools > salsa_phased.diploid_phased.fasta

        rm tmp_phase1.txt
        rm tmp_phase0.txt

Removing output files of failed job emit_haplotigs since they might be corrupted:
salsa_phased.diploid_phased.fasta
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /shares/bbart_share/aboyher/projects/cassava_assem/tme7_v0.4p/tme7_v0.4p_0/salsa/SALSA_output/falcon_phase/.snakemake/log/2019-05-15T205755.980115.snakemake.log
unlocking
removing lock
removing lock
removed all locks
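
One reading of the log above, offered as a hypothesis rather than a confirmed diagnosis: the usage line printed by emit_haplotigs.pl lists five positional arguments, while the command the older snakefile ran supplies only four before the redirect. That would fit a mismatch between the snakefile and the installed code. The sketch below compares the two strings copied from the log. `missing_args` is a hypothetical helper, not part of FALCON-Phase or snakemake.

```python
import shlex

# Both strings are copied from the snakemake log above.
USAGE = ("emit_haplotigs.pl phased.txt BC.bed clean_unzip_asm_p_h.fa "
         "path_to_bedtools output_format")
COMMAND = ("/home/aboyher/src/FALCON-Phase/bin/emit_haplotigs.pl "
           "salsa_phased.phased.txt mince/salsa_phased.BC.bed "
           "salsa_phased.p_h_ctg.fa /home/aboyher/local/bin/bedtools "
           "> salsa_phased.diploid_phased.fasta")


def missing_args(usage, command):
    """Return the usage placeholders that have no matching argument in command."""
    expected = shlex.split(usage)[1:]      # drop the script name
    tokens = shlex.split(command)
    if ">" in tokens:                      # ignore the redirect and its target
        tokens = tokens[:tokens.index(">")]
    given = tokens[1:]                     # drop the script path
    return expected[len(given):]


if __name__ == "__main__":
    print(missing_args(USAGE, COMMAND))
```

The check only counts positional arguments. It says nothing about the separate "source: filename argument required" message in the same log.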

zeeev commented

What happens if you manually run:

/home/aboyher/src/FALCON-Phase/bin/emit_haplotigs.pl salsa_phased.phased.txt mince/salsa_phased.BC.bed salsa_phased.p_h_ctg.fa /home/aboyher/local/bin/bedtools > salsa_phased.diploid_phased.fasta

I'm still looking for the error.

aboyher commented

No worries. I "fixed" it. I updated my config.json and snakefile to the newer versions, then moved the {sample}.filtered.bam and {sample}.unfiltered.bam files to the hic_mapping folder so snakemake could skip directly past those steps. Reran it and it just finished. Phasing looks good. Thanks again Zev!!!
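
For readers in the same situation, the file move described above could be sketched in Python as follows. `stage_bams` is a hypothetical helper. The two file suffixes and the hic_mapping folder name come from this comment, and everything else (source directory, return value) is an assumption.

```python
import shutil
from pathlib import Path


def stage_bams(sample, src_dir=".", dest_dir="hic_mapping"):
    """Move {sample}.filtered.bam and {sample}.unfiltered.bam into dest_dir.

    Returns the new paths of the files that were actually moved.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    for suffix in ("filtered.bam", "unfiltered.bam"):
        src = Path(src_dir) / f"{sample}.{suffix}"
        if src.exists():
            target = dest / src.name
            shutil.move(str(src), str(target))
            moved.append(target)
    return moved


if __name__ == "__main__":
    print(stage_bams("salsa_phased"))
```

Whether snakemake then treats the mapping steps as done depends on the output paths in the snakefile in use and on file timestamps. A dry run (snakemake -n) lists the jobs that would still execute, without running them.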