SchulzLab/Aeron

Error while calling fusions

nadiadavidson opened this issue · 4 comments

Hi, I'm trying to run Aeron for fusion finding and I'm running into an issue I can't resolve. The command:
snakemake --cores 4 all -s Snakefile
finished successfully, but I get the following error when I run:
snakemake --cores 4 all -s Snakefile_fusion

Building DAG of jobs...
MissingInputException in line 129 of /home/unimelb.edu.au/nadiamd/work_area/ideas_grant/Aeron/Snakefile_fusion:
Missing input files for rule sam_to_bam:
fusiontmp/reads_tofusions_onlyfusion_x50_Homo-sapiens_hg38.sam

My config.yaml is pasted below.

I'm also interested to know if the data from your fusion simulation (in your paper) is available somewhere for downloading?

Many thanks,
Nadia.

config.yaml:

#input files at top: check them!

# all input files must be in the folder ./input/
# use the full file name, including file ending

# input splice graph
# Should be in the input folder
# format must be .vg
graph: hg38.gfa

# reference transcripts
# format can be either fasta/fastq, gzipped or not
# Should be in the input folder

transcripts: Homo-sapiens.GRCh38.cdna.all.fa

# sequenced reads
# Should be in the input folder
# format can be either fasta/fastq, gzipped or not
# for more files, add them in new lines starting with "- "
# NOTE: the file names without ending must be unique! You cannot have eg. reads.fq and reads.fa
reads:
- x100.fixed.fastq.gz
- x50.fixed.fastq.gz
- x10.fixed.fastq.gz
- x2.fixed.fastq.gz
- x1.fixed.fastq.gz

# Needed for expression quantification
# Should be in the input folder
gtffile: Homo-sapiens.GRCh38.100.gtf

# needed to convert between alignment formats
# https://github.com/vgteam/vg
vgpath: /home/unimelb.edu.au/nadiamd/work_area/ideas_grant/Aeron/vg


#optional parameters below: default values will probably work

fusion_max_error_rate: 0.2
fusion_min_score_difference: 200

#size of the seed hits. Fewer means more accurate but slower alignments.
seedsize: 17
#max number of seeds. Fewer means faster but more inaccurate alignment
maxseeds: 20

# No need to change these

aligner_bandwidth: 35
alignment_selection: --greedy-length
alignment_E_cutoff: 1

scripts: AeronScripts
binaries: Binaries
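
The NOTE in the config above says that read file names must be unique once the ending is removed. Judging only by the file names in this thread (x50.fixed.fastq.gz appears as x50, Homo-sapiens.GRCh38.cdna.all.fa as Homo-sapiens), the pipeline seems to keep the part before the first dot. That rule is an inference, not something confirmed from the Snakefile, and the helper names below are hypothetical. A minimal sketch of a uniqueness check under that assumption:

```python
from collections import defaultdict


def sample_name(filename: str) -> str:
    """Assumed naming rule: keep everything before the first dot."""
    return filename.split(".", 1)[0]


def find_name_clashes(read_files):
    """Group read files by derived name and return only the groups that clash."""
    groups = defaultdict(list)
    for f in read_files:
        groups[sample_name(f)].append(f)
    return {name: files for name, files in groups.items() if len(files) > 1}


# The read files listed in the config above.
reads = [
    "x100.fixed.fastq.gz",
    "x50.fixed.fastq.gz",
    "x10.fixed.fastq.gz",
    "x2.fixed.fastq.gz",
    "x1.fixed.fastq.gz",
]

print(find_name_clashes(reads))                     # no clashes for this list
print(find_name_clashes(["reads.fq", "reads.fa"]))  # the case the NOTE warns about
```

Under this assumed rule the five read files in the config do not clash with each other.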

Hi Nadia,
Sorry for overlooking your question.
It looks like minimap2 did not produce any output. Could you please check whether the following file was created: fusiontmp/minimap2_stderr_x50_Homo-sapiens_hg38.txt? If so, please paste its contents here.

Thanks,
Marcel

It looks like I don't even have a fusiontmp directory. I've pasted the files in some of the other output directories below in case that's helpful. The data is a small simulation of just a subset of genes. Would Aeron process that okay?

Thanks for your help.

Cheers,
Nadia

`output:
total 88K
-rw-rw-r-- 1 nadiamd nadiamd 976 Sep 30 15:48 alignmentstats_Homo-sapiens_hg38.txt
-rw-rw-r-- 1 nadiamd nadiamd 951 Sep 30 15:48 alignmentstats_x100_hg38.txt
-rw-rw-r-- 1 nadiamd nadiamd 946 Sep 30 15:48 alignmentstats_x10_hg38.txt
-rw-rw-r-- 1 nadiamd nadiamd 941 Sep 30 15:48 alignmentstats_x1_hg38.txt
-rw-rw-r-- 1 nadiamd nadiamd 943 Sep 30 15:48 alignmentstats_x2_hg38.txt
-rw-rw-r-- 1 nadiamd nadiamd 949 Sep 30 15:48 alignmentstats_x50_hg38.txt
-rw-rw-r-- 1 nadiamd nadiamd 21 Sep 30 15:48 aln_Homo-sapiens_hg38_all.gam
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 aln_Homo-sapiens_hg38_full_length.gam
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 aln_Homo-sapiens_hg38_selected.gam
-rw-rw-r-- 1 nadiamd nadiamd 21 Sep 30 15:48 aln_x100_hg38_all.gam
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 aln_x100_hg38_full_length.gam
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 aln_x100_hg38_selected.gam
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 aln_x100_hg38_selected.json
-rw-rw-r-- 1 nadiamd nadiamd 21 Sep 30 15:48 aln_x10_hg38_all.gam
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 aln_x10_hg38_full_length.gam
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 aln_x10_hg38_selected.gam
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 aln_x10_hg38_selected.json
-rw-rw-r-- 1 nadiamd nadiamd 21 Sep 30 15:48 aln_x1_hg38_all.gam
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 aln_x1_hg38_full_length.gam
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 aln_x1_hg38_selected.gam
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 aln_x1_hg38_selected.json
-rw-rw-r-- 1 nadiamd nadiamd 21 Sep 30 15:48 aln_x2_hg38_all.gam
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 aln_x2_hg38_full_length.gam
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 aln_x2_hg38_selected.gam
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 aln_x2_hg38_selected.json
-rw-rw-r-- 1 nadiamd nadiamd 21 Sep 30 15:48 aln_x50_hg38_all.gam
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 aln_x50_hg38_full_length.gam
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 aln_x50_hg38_selected.gam
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 aln_x50_hg38_selected.json
-rw-rw-r-- 1 nadiamd nadiamd 17 Oct 15 14:18 CountMatrix_x100_Homo-sapiens_hg38.txt
-rw-rw-r-- 1 nadiamd nadiamd 17 Oct 15 14:18 CountMatrix_x10_Homo-sapiens_hg38.txt
-rw-rw-r-- 1 nadiamd nadiamd 17 Oct 15 14:18 CountMatrix_x1_Homo-sapiens_hg38.txt
-rw-rw-r-- 1 nadiamd nadiamd 17 Oct 15 14:18 CountMatrix_x2_Homo-sapiens_hg38.txt
-rw-rw-r-- 1 nadiamd nadiamd 17 Oct 15 14:19 CountMatrix_x50_Homo-sapiens_hg38.txt
-rw-rw-r-- 1 nadiamd nadiamd 294 Sep 30 15:48 matrixstats_x100_Homo-sapiens_hg38.txt
-rw-rw-r-- 1 nadiamd nadiamd 294 Sep 30 15:48 matrixstats_x10_Homo-sapiens_hg38.txt
-rw-rw-r-- 1 nadiamd nadiamd 294 Sep 30 16:16 matrixstats_x1_Homo-sapiens_hg38.txt
-rw-rw-r-- 1 nadiamd nadiamd 294 Sep 30 16:16 matrixstats_x2_Homo-sapiens_hg38.txt
-rw-rw-r-- 1 nadiamd nadiamd 294 Sep 30 15:48 matrixstats_x50_Homo-sapiens_hg38.txt
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 matrix_x100_Homo-sapiens_hg38_all.txt
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 matrix_x100_Homo-sapiens_hg38_bestmatch.txt
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 matrix_x100_Homo-sapiens_hg38_unambiguous.txt
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 matrix_x10_Homo-sapiens_hg38_all.txt
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 matrix_x10_Homo-sapiens_hg38_bestmatch.txt
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 matrix_x10_Homo-sapiens_hg38_unambiguous.txt
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 matrix_x1_Homo-sapiens_hg38_all.txt
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 matrix_x1_Homo-sapiens_hg38_bestmatch.txt
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 matrix_x1_Homo-sapiens_hg38_unambiguous.txt
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 matrix_x2_Homo-sapiens_hg38_all.txt
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 matrix_x2_Homo-sapiens_hg38_bestmatch.txt
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 matrix_x2_Homo-sapiens_hg38_unambiguous.txt
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 matrix_x50_Homo-sapiens_hg38_all.txt
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 matrix_x50_Homo-sapiens_hg38_bestmatch.txt
-rw-rw-r-- 1 nadiamd nadiamd 0 Sep 30 15:48 matrix_x50_Homo-sapiens_hg38_unambiguous.txt

tmp:
total 92K
-rw-rw-r-- 1 nadiamd nadiamd 1.1K Sep 30 15:48 aligner_stderr_Homo-sapiens_hg38.txt
-rw-rw-r-- 1 nadiamd nadiamd 1.1K Sep 30 15:48 aligner_stderr_x100_hg38.txt
-rw-rw-r-- 1 nadiamd nadiamd 1.1K Sep 30 15:48 aligner_stderr_x10_hg38.txt
-rw-rw-r-- 1 nadiamd nadiamd 1.1K Sep 30 15:48 aligner_stderr_x1_hg38.txt
-rw-rw-r-- 1 nadiamd nadiamd 1.1K Sep 30 15:48 aligner_stderr_x2_hg38.txt
-rw-rw-r-- 1 nadiamd nadiamd 1.1K Sep 30 15:48 aligner_stderr_x50_hg38.txt
-rw-rw-r-- 1 nadiamd nadiamd 557 Sep 30 15:48 aligner_stdout_Homo-sapiens_hg38.txt
-rw-rw-r-- 1 nadiamd nadiamd 553 Sep 30 15:48 aligner_stdout_x100_hg38.txt
-rw-rw-r-- 1 nadiamd nadiamd 551 Sep 30 15:48 aligner_stdout_x10_hg38.txt
-rw-rw-r-- 1 nadiamd nadiamd 549 Sep 30 15:48 aligner_stdout_x1_hg38.txt
-rw-rw-r-- 1 nadiamd nadiamd 550 Sep 30 15:48 aligner_stdout_x2_hg38.txt
-rw-rw-r-- 1 nadiamd nadiamd 553 Sep 30 15:48 aligner_stdout_x50_hg38.txt
-rw-rw-r-- 1 nadiamd nadiamd 197 Sep 30 15:48 run_Homo-sapiens_hg38_summary.txt
-rw-rw-r-- 1 nadiamd nadiamd 193 Sep 30 15:48 run_x100_hg38_summary.txt
-rw-rw-r-- 1 nadiamd nadiamd 191 Sep 30 15:48 run_x10_hg38_summary.txt
-rw-rw-r-- 1 nadiamd nadiamd 189 Sep 30 15:48 run_x1_hg38_summary.txt
-rw-rw-r-- 1 nadiamd nadiamd 190 Sep 30 15:48 run_x2_hg38_summary.txt
-rw-rw-r-- 1 nadiamd nadiamd 193 Sep 30 15:48 run_x50_hg38_summary.txt
-rw-rw-r-- 1 nadiamd nadiamd 42 Sep 30 15:44 seedcache.aux
-rw-rw-r-- 1 nadiamd nadiamd 49 Sep 30 15:44 seedcache_index.aux
-rw-rw-r-- 1 nadiamd nadiamd 20 Sep 30 15:44 seedcache_index.isa
-rw-rw-r-- 1 nadiamd nadiamd 17 Sep 30 15:44 seedcache_index.lcp
-rw-rw-r-- 1 nadiamd nadiamd 20 Sep 30 15:44 seedcache_index.sa`
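
Several files in the listing above are 0 bytes (for example the aln_*_selected.gam and matrix_* files), which may be worth a look alongside the missing fusiontmp directory. The following is only a small diagnostic sketch, not part of Aeron: the function name is hypothetical, and the directory names are the ones mentioned in this thread. It reports whether each directory exists and which of its files are empty:

```python
from pathlib import Path


def empty_files(directory):
    """Return sorted names of zero-byte files directly inside `directory`.

    Returns None when the directory itself does not exist.
    """
    d = Path(directory)
    if not d.is_dir():
        return None
    return sorted(p.name for p in d.iterdir() if p.is_file() and p.stat().st_size == 0)


# Run from the Aeron folder; these directory names come from the thread.
for folder in ("output", "tmp", "fusiontmp"):
    names = empty_files(folder)
    if names is None:
        print(f"{folder}: directory not found")
    else:
        print(f"{folder}: {len(names)} empty file(s)")
        for name in names:
            print("   ", name)
```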

Hi Nadia,
It seems Snakemake is not able to create the fusiontmp directory in your case. We are looking into the problem. In the meantime, can you create a fusiontmp directory in the snakemake folder and try the pipeline again? I apologize for the inconvenience.

Hi,

I finally got this running by limiting the number of FASTQ files to 1, so it seems the fusion Snakefile doesn't work with multiple files listed in config.yaml.

Cheers,
Nadia.
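
For anyone hitting the same problem, the one-file-at-a-time workaround could be scripted roughly as below. This is an untested sketch resting on several assumptions: that Snakefile_fusion reads its settings through Snakemake's config mechanism, that passing an extra file with `--configfile` replaces the `reads` list from config.yaml, and that separate runs do not overwrite each other's intermediate files. The function names and the single_read_configs folder are hypothetical. Editing config.yaml by hand for each run is the simpler option if any of this does not hold.

```python
import shlex
from pathlib import Path


def write_single_read_configs(reads, outdir="single_read_configs"):
    """Write one small YAML override per read file and return the paths."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for read in reads:
        name = read.split(".", 1)[0]  # assumed naming: text before the first dot
        path = out / f"config_{name}.yaml"
        path.write_text(f"reads:\n- {read}\n")
        paths.append(path)
    return paths


def fusion_commands(config_paths, cores=4):
    """Build one snakemake command per override file (commands are not run here)."""
    return [
        ["snakemake", "--cores", str(cores), "all",
         "-s", "Snakefile_fusion", "--configfile", str(p)]
        for p in config_paths
    ]


if __name__ == "__main__":
    reads = ["x100.fixed.fastq.gz", "x50.fixed.fastq.gz"]
    for cmd in fusion_commands(write_single_read_configs(reads)):
        print(shlex.join(cmd))  # inspect first, then run each command in turn
```

The script only prints the commands, so they can be checked before anything is launched.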