Basecalling job runs but produces no output
maurerv opened this issue · 1 comment
Hello,
I am using the Docker container version of your tool to analyze MinION data. After fixing a few small issues with STRique and albacore not being found in $PATH, I was able to run the tests you provide in the test folder without any issues.
I used the nanopype_import script to pack my fast5 files into a tar archive. The corresponding folder structure looks like this:
- /data/raw/
- 20191204_FAK....._FLO-MIN106_SQK-LSK109/
- reads/
- 0.tar
- fast5/
- ...0.fast5
- ...1.fast5
- ...
- reads.fofn
- reads/
- 20191204_FAK....._FLO-MIN106_SQK-LSK109/
Inside /data/processing I ran:
snakemake --snakefile ~/path/to/nanopype/Snakefile -j 7 sequences/guppy/20191204_FAK....._FLO-MIN106_SQK-LSK109.fastq.gz
This gives the following result:
The pipeline does create the file sequences/guppy/20191204_FAK....._FLO-MIN106_SQK-LSK109.fastq.gz; however, it stays empty. Judging from htop, nothing is being processed, yet the job is kept active.
I did not change anything in nanopype.yaml or env.yaml, except for replacing the reference chromosome in env.yaml like this:
Currently, I am running docker version 19.03.1 on CentOS 7.
Please let me know if there is anything wrong with my folder structure or the way I invoke the pipeline, or if you need any additional information.
Thanks in advance!
Hi,
First: there is no need to use the import script on recent data; the bulk fast5 files work directly with the pipeline. In your case, remove the tar/ folder, rename fast5/ to reads/, delete reads.fofn, and re-run the indexing module.
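The folder cleanup above can be sketched as a small bash function. This is a hypothetical helper (restructure_run is not part of nanopype) and it assumes the layout from the question: the run folder contains reads/ holding the tar archive (read here as the "tar/ folder" mentioned above), fast5/ holding the original files, and reads.fofn. It does not run the indexing module; use the command from the nanopype documentation for that step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: undo the nanopype_import packing for one run folder.
# Assumes <run>/reads/ holds the tar archive(s), <run>/fast5/ holds the
# original fast5 files, and <run>/reads.fofn was written by the import.
restructure_run() {
    local run_dir="$1"
    # Refuse to touch anything unless the expected fast5/ folder exists.
    if [ ! -d "$run_dir/fast5" ]; then
        echo "no fast5/ folder in $run_dir" >&2
        return 1
    fi
    # Remove the folder holding the tar archive first; otherwise the mv
    # below would move fast5/ *into* reads/ instead of renaming it.
    rm -rf "$run_dir/reads"
    # fast5/ becomes reads/
    mv "$run_dir/fast5" "$run_dir/reads"
    # Drop the file-of-file-names written by the import script.
    rm -f "$run_dir/reads.fofn"
}
```

For example, `restructure_run /data/raw/20191204_FAK....._FLO-MIN106_SQK-LSK109`, with the full run folder name substituted, followed by re-running the indexing module.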
As for your problem: you are requesting to basecall a 'tag' (rule basecaller_merge_tag). For this, nanopype needs to know all runs you want to merge, which it reads from a file 'runnames.txt' in your working directory. Please see:
https://nanopype.readthedocs.io/en/latest/usage/general/#workflow-config
for the difference between processing a single flow-cell and merging multiple ones into a tag.
There are two things you can try:
a) create a file 'runnames.txt' with a single line '20191204_FAK....' and start the pipeline with
snakemake --snakefile ~/path/to/nanopype/Snakefile sequences/guppy/sample_name.fastq.gz -n
b) basecall only one flowcell with
snakemake --snakefile ~/path/to/nanopype/Snakefile sequences/guppy/batches/sample_name/20191204_FAK..._FLO-MIN106_SQK-LSK109.fastq.gz -n
I would recommend sticking with a), as it scales better to multiple runs.
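For option a), creating runnames.txt in the working directory can be sketched as follows. The run name is kept exactly as truncated in this thread and is a placeholder for the full run folder name under /data/raw; listing one run name per line is assumed to be how further runs would be added for merging (see the linked workflow-config page).

```shell
# Run inside the working directory (here /data/processing).
# RUN_NAME is a placeholder: it is truncated in this thread and must be
# replaced with the full name of the run folder under /data/raw.
RUN_NAME='20191204_FAK....'
# Write exactly one line: the run name followed by a newline.
printf '%s\n' "$RUN_NAME" > runnames.txt
```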
To test, run snakemake with -n (dry run). If the rule 'basecaller_merge_tag' remains the only job listed, something is wrong; you would expect a long list of jobs from the rule 'guppy'.
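The dry-run check can be automated with a small filter. This is a sketch with hypothetical helper names (count_rule_jobs, check_dry_run); it assumes the dry-run output contains one "rule guppy:" header line per planned job. Because snakemake writes its job listing to stderr in at least some versions, the usage example redirects stderr into the pipe.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count "rule <name>:" headers in dry-run text on stdin.
count_rule_jobs() {
    local rule="$1"
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps the printed count and a zero exit status.
    grep -c "^rule ${rule}:" || true
}

# Hypothetical helper: fail if no 'guppy' jobs appear in the dry run.
check_dry_run() {
    local n
    n=$(count_rule_jobs guppy)
    if [ "$n" -eq 0 ]; then
        echo "no guppy jobs planned; check runnames.txt" >&2
        return 1
    fi
    echo "$n guppy jobs planned"
}
```

Usage: `snakemake --snakefile ~/path/to/nanopype/Snakefile sequences/guppy/sample_name.fastq.gz -n 2>&1 | check_dry_run`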
Pay