giesselmann/nanopype

Basecalling job runs but produces no output

maurerv opened this issue · 1 comments

Hello,

I am using the Docker container version of your tool to analyze MinION data. After fixing a few small issues related to STRique and albacore not being found in $PATH, I was able to run the tests you provided in the test folder without any issues.

I used the nanopype_import script to pack my fast5 files into a tar archive. The corresponding folder structure looks like this:

  • /data/raw/
    • 20191204_FAK....._FLO-MIN106_SQK-LSK109/
      • reads/
        • 0.tar
      • fast5/
        • ...0.fast5
        • ...1.fast5
        • ...
      • reads.fofn

Inside /data/processing I ran:
snakemake --snakefile ~/path/to/nanopype/Snakefile -j 7 sequences/guppy/20191204_FAK....._FLO-MIN106_SQK-LSK109.fastq.gz

Which gives the following result:

[image: screenshot of the snakemake output]

The pipeline does create the file sequences/guppy/20191204_FAK....._FLO-MIN106_SQK-LSK109.fastq.gz, but it stays empty. Judging from htop, nothing is actually being computed; the job simply stays active.

I did not change anything in the nanopype.yaml and env.yaml files, except for replacing the reference chromosome in env.yaml like this:

[image: screenshot of the edited env.yaml entry]

I am currently running Docker version 19.03.1 on CentOS 7.

Please let me know if there is anything wrong with my folder structure or the way I invoke the pipeline, or if you need any additional information.

Thanks in advance!

Hi,
First: there is no need to use the import script on recent data. Bulk fast5 files work directly with the pipeline. In your case, remove the folder holding the tar archive (reads/ in your listing), rename fast5/ to reads/, delete reads.fofn, and re-run the indexing module.
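The clean-up described above could be sketched roughly as follows. This is only an illustration, assuming the layout from the listing in the question (a reads/ folder with the tar archive, a fast5/ folder with the raw files, and reads.fofn); the function name `restructure_run` and its path argument are hypothetical and not part of nanopype. It deletes data, so try it on a copy first.

```shell
# Hypothetical helper (not part of nanopype): turn an imported run folder
# back into the plain layout, where the raw fast5 files live in reads/.
restructure_run() {
    run_dir="$1"
    # Refuse to touch anything if there is no fast5/ folder to rename.
    [ -d "$run_dir/fast5" ] || { echo "no fast5/ folder in $run_dir" >&2; return 1; }
    # Remove the folder holding the tar archive written by the import script.
    rm -rf "$run_dir/reads" || return 1
    # The raw fast5 files take over the reads/ name.
    mv "$run_dir/fast5" "$run_dir/reads" || return 1
    # Drop the old index; the indexing module has to be re-run afterwards.
    rm -f "$run_dir/reads.fofn"
}

# Usage, with RUN set to the full name of your run folder:
# restructure_run "/data/raw/$RUN"
```

The indexing step itself is not shown here; run it through the pipeline as usual once the folders are in place.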

As for your problem: you are requesting to basecall a 'tag' (rule basecaller_merge_tag). For this, nanopype needs to know all runs you want to merge, which it reads from a file 'runnames.txt' in your working directory. Please see:
https://nanopype.readthedocs.io/en/latest/usage/general/#workflow-config
for the difference between processing a single flow-cell and merging multiple ones into a tag.

There are two things you can try:
a) create a file 'runnames.txt' with a single line '20191204_FAK....' and start the pipeline with

snakemake --snakefile ~/path/to/nanopype/Snakefile sequences/guppy/sample_name.fastq.gz -n

b) basecall only one flowcell with

snakemake --snakefile ~/path/to/nanopype/Snakefile sequences/guppy/batches/sample_name/20191204_FAK..._FLO-MIN106_SQK-LSK109.fastq.gz -n

I would recommend sticking with a), as it scales better to multiple runs.
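For option a), the runnames.txt file can be written by hand or with a small helper like the one below. This is a minimal sketch, assuming one run name per line as described above; `write_runnames` is a hypothetical name, and the run name in the usage line is left elided exactly as in this thread, so substitute the full folder name.

```shell
# Hypothetical helper: write the given run names, one per line,
# into runnames.txt inside the working directory.
write_runnames() {
    workdir="$1"
    shift
    printf '%s\n' "$@" > "$workdir/runnames.txt"
}

# Usage (replace the elided name with your full run folder name):
# write_runnames /data/processing '20191204_FAK....'
```

Further runs that belong to the same tag can be passed as additional arguments.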

For testing, run snakemake with -n (dry run). If the rule 'basecaller_merge_tag' remains the only job listed, something is wrong; you should see a long list of jobs from the rule 'guppy'.
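To make that check less manual, the dry-run output can be piped through a small counter. This is a sketch under one assumption: that the dry run announces each planned job with a line starting with "rule <name>:", which may differ between Snakemake versions. `count_rule_jobs` is a hypothetical helper name.

```shell
# Hypothetical helper: count planned jobs of one rule in Snakemake
# dry-run output read from stdin.
# Assumption: each job is announced by a line starting with "rule <name>:".
count_rule_jobs() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so the exit status is neutralised with "|| true".
    grep -c "^rule $1:" || true
}

# Usage:
# snakemake --snakefile ~/path/to/nanopype/Snakefile sequences/guppy/sample_name.fastq.gz -n 2>&1 | count_rule_jobs guppy
```

A count of 0 for 'guppy' would correspond to the situation described above, where only 'basecaller_merge_tag' is scheduled.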

Pay