assemblerflow/flowcraft

Integrity coverage fails

TheSequencingCenter opened this issue · 15 comments

#52 and #56 may be related to this issue.

I'm building the simplest pipeline on AWS (Linux): spades + docker.
The FastQ files are real bacterial datasets, 5Mb, 100X coverage. When I run the pipeline as follows, it always gives integrity_coverage errors regardless of what parameters I supply, and generates no output. I don't know why it's failing or how to work around it. Note: our normal AWS Spades/Docker pipeline works fine with these datasets.
Email: richardcaseyhpc@protonmail.com

nextflow run spades_pipe.nf --fastq "fastq/sample_R{1,2}.*" -profile docker --genomeSize 5 --minCoverage 0
N E X T F L O W ~ version 19.07.0
Launching spades_pipe.nf [disturbed_mahavira] - revision: 6267a78dba

============================================================
F L O W C R A F T

Built using flowcraft v1.4.1

Input FastQ : 2
Input samples : 1
Reports are found in : ./reports
Results are found in : ./results
Profile : docker

Starting pipeline at Wed Aug 14 19:11:42 UTC 2019

executor > local (8)
[24/294084] process > integrity_coverage_1_1 (sample_R) [100%] 8 of 8, failed: 8 ✔
[- ] process > report_coverage_1_1 -
[- ] process > report_corrupt_1_1 -
[- ] process > trimmomatic_1_2 -
[- ] process > status -
[- ] process > compile_status_buffer -
[- ] process > compile_status -
[- ] process > report -
[- ] process > compile_reports -
Completed at: Wed Aug 14 19:11:51 UTC 2019
Duration : 8.9s
Success : true
Exit status : 0
WARN: The channel create method is deprecated -- it will be removed in a future release
WARN: The channel create method is deprecated -- it will be removed in a future release
WARN: The channel create method is deprecated -- it will be removed in a future release
WARN: The channel create method is deprecated -- it will be removed in a future release
WARN: The channel create method is deprecated -- it will be removed in a future release
WARN: Process configuration syntax $processName has been deprecated -- Replace process.$trimmomatic_1_2 = <value> with a process selector
[eb/6187b2] NOTE: Process integrity_coverage_1_1 (sample_R) terminated with an error exit status (1) -- Execution is retried (1)
[f3/c9c01d] NOTE: Process integrity_coverage_1_1 (sample_R) terminated with an error exit status (1) -- Execution is retried (2)
[e6/8a39ef] NOTE: Process integrity_coverage_1_1 (sample_R) terminated with an error exit status (1) -- Execution is retried (3)
[97/d32042] NOTE: Process integrity_coverage_1_1 (sample_R) terminated with an error exit status (1) -- Execution is retried (4)
[99/ddd7ec] NOTE: Process integrity_coverage_1_1 (sample_R) terminated with an error exit status (1) -- Execution is retried (5)
[d6/f2a129] NOTE: Process integrity_coverage_1_1 (sample_R) terminated with an error exit status (1) -- Execution is retried (6)
[b1/4849d6] NOTE: Process integrity_coverage_1_1 (sample_R) terminated with an error exit status (1) -- Execution is retried (7)
[24/294084] NOTE: Process integrity_coverage_1_1 (sample_R) terminated with an error exit status (1) -- Error is ignored

Also having this issue. And weirdly, after running the flowcraft-generated nextflow pipeline, my backspace key no longer works?

Commands used in a flowcraft conda env:

flowcraft build -t "trimmomatic skesa abricate" -o trim_skesa_abricate_flowcraft.nf -n "second-try-flowcraft"
nextflow run trim_skesa_abricate_flowcraft.nf --fastq "reads/*{R1,R2}*.fastq.gz" -profile sge_sing

I see a lot of warnings like this:

WARN: The channel `create` method is deprecated -- it will be removed in a future release

WARN: Process configuration syntax $processName has been deprecated -- Replace `process.$trimmomatic_1_2 = <value>` with a process selector

[85/f9e73a] NOTE: Process `integrity_coverage_1_1 (2015am-0046_CAM4563A161_CGGCTATG-TATAGCCT_L002)` terminated with an error exit status (1) -- Error is ignored

and in the log files, e.g. work/XX/XXXXXXXXXXXXXXXXXXXX/.command.err, I see (edited for brevity):

line 194: uname: command not found
line 146: grep: command not found
line 147: grep: command not found
line 208: date: command not found
Command 'ps' required by nextflow to collect task metrics cannot be found

Perhaps the PATHs are getting set incorrectly?
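One quick way to test that guess is to check, in the same environment the failing task runs in (for example inside the container selected by the profile), whether the utilities named in .command.err can be found on PATH. This is a minimal POSIX shell sketch. The function name missing_tools is hypothetical, and the tool list is just the four commands from the log above:

```shell
#!/bin/sh
# Report which of the utilities named in .command.err are not on PATH.
# Run this inside the same container/environment the task uses.
missing_tools() {
  for tool in uname grep date ps; do
    # command -v succeeds (and prints the path) only if the tool is found
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
    fi
  done
}

missing_tools
```

If this prints nothing on the host but reports missing tools inside the task's container, the problem is likely the container or PATH setup rather than the FastQ data.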

Hi @TheSequencingCenter and @kapsakcj! I haven't been able to fully look into your issues yet. Can you start by telling me which version of nextflow you have? @kapsakcj I'm fairly certain your problem is related to issue #217, which is high priority but still without any development. For now, my suggested workaround is to downgrade to nextflow version 18.10.1. Sorry about this!

Best,

Inês
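For reference, here is one way to try the downgrade without uninstalling anything. The self-installing Nextflow launcher reads the NXF_VER environment variable and runs that release, downloading it first if needed. This is a minimal sketch, assuming the nextflow on your PATH is that launcher script. The wrapper name run_flowcraft_pipeline is hypothetical:

```shell
#!/bin/sh
# Run a FlowCraft-generated pipeline with Nextflow pinned to 18.10.1,
# the release suggested as a workaround in this thread.
run_flowcraft_pipeline() {
  # NXF_VER is read by the Nextflow launcher; the assignment applies
  # only to this invocation, not to the rest of the shell session.
  NXF_VER=18.10.1 nextflow run "$@"
}

# Example, using the pipeline and inputs from the original report:
# run_flowcraft_pipeline spades_pipe.nf --fastq "fastq/sample_R{1,2}.*" -profile docker
```

Setting the version per command keeps the pin local to FlowCraft runs, so other pipelines on the same machine can keep using a newer Nextflow.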

It was with the latest version of nextflow 19.07. I'll try downgrading and see if that fixes it. Thanks for the quick response!

Downgrading to nf 18.10.1 made the pipeline run properly with the -profile standard option. It seems -profile sge_sing won't work for me, but that's likely due to my compute environment.

Hello, I am also stuck with this issue after running any pipeline, including spades, trimmomatic, etc. The problem seems to lie with the integrity_coverage.py template file, used by a process that is automatically added in front of all these workflows. I can't understand what is wrong, as I've followed the instructions precisely. Note: I am running these in a conda environment.

This is the output of running: nextflow run trimmomatic.nf --fastq "/data/x_R{1,2}x"

executor > local (8)
[22/d4554d] process > integrity_coverage_1_1 (X_genome1) [100%] 8 of 8, failed: 8 ✔
[- ] process > report_coverage_1_1 -
[- ] process > report_corrupt_1_1 -
[- ] process > fastqc_1_2 -
[- ] process > fastqc_report_1_2 -
[- ] process > trim_report_1_2 -
[- ] process > compile_fastqc_status_1_2 -
[- ] process > trimmomatic_1_2 -
[- ] process > status -
[- ] process > compile_status_buffer -
[- ] process > compile_status -
[- ] process > report -
[- ] process > compile_reports -
Completed at: Thu Oct 17 11:39:08 CEST 2019
Duration : 50.6s
Success : true
Exit status : 0
[55/dfd963] NOTE: Missing output file(s) *_encoding expected by process integrity_coverage_1_1 (X_genome1) -- Execution is retried (3)
[7a/e59573] NOTE: Missing output file(s) *_encoding expected by process integrity_coverage_1_1 (X_genome1) -- Execution is retried (4)
[9f/5ce485] NOTE: Missing output file(s) *_encoding expected by process integrity_coverage_1_1 (X_genome1) -- Execution is retried (5)
[b5/0f5b00] NOTE: Missing output file(s) *_encoding expected by process integrity_coverage_1_1 (X_genome1) -- Execution is retried (6)
[91/b68c08] NOTE: Missing output file(s) *_encoding expected by process integrity_coverage_1_1 (X_genome1) -- Execution is retried (7)
[22/d4554d] NOTE: Missing output file(s) *_encoding expected by process integrity_coverage_1_1 (X_genome1) -- Error is ignored

Does anyone have any suggestion on what might be going on? Is it a problem with the UTF-8 encoding?
Thanks!

Hello! Can you please check one of the working directories for the integrity_coverage_1_1 process, for example work/55/dfd963* and paste here the contents of the .command.log file?
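For anyone checking the same thing: every task directory under work/ keeps the script Nextflow ran, its output, and its exit code. This is a small sketch that prints those files for a task given the short hash from the log (e.g. 55/dfd963). show_task is a hypothetical helper name, and the trailing * in the glob completes the truncated hash:

```shell
#!/bin/sh
# Print the diagnostic files Nextflow leaves in a task's work directory.
# $1 is the short hash shown in the log, e.g. 55/dfd963.
show_task() {
  # The unquoted * expands to the full task directory name(s)
  for dir in work/"$1"*; do
    # Skip if the glob matched nothing (it stays a literal string)
    [ -d "$dir" ] || continue
    for f in .command.sh .command.out .command.err .command.log .exitcode; do
      if [ -f "$dir/$f" ]; then
        echo "== $dir/$f"
        cat "$dir/$f"
      fi
    done
  done
}
```

Run it from the directory where nextflow run was launched, e.g. show_task 55/dfd963.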

Hello, thanks for writing. This file is empty. Also, I want to clarify that I have the same issue after building the workflow with FlowCraft as well as with assemblerflow, and also after downgrading Nextflow to version 18.10.1 as suggested above.
I am following the instructions provided online for these tools (which are supposed to work out of the box).

Can you share the .nextflow.log file?

Hello, your log file says you're running nextflow version 19.07.0, but you also tried with version 18.10.1? Can you send me the log of that run? By default, a flowcraft pipeline uses a singularity container; do you have singularity installed? You can switch to docker by adding -profile docker to your nextflow command.

As a side note, the bug causing flowcraft pipelines to break on the latest nextflow version has been addressed here, but I haven't had the opportunity to test it yet. (cc @kapsakcj)

Hello, after some more tests I can report that with v18.10.1 the process completed successfully with -profile docker but failed with -profile standard. With version 19.07.0 it failed with both profile options, although the error is different:

With -profile standard:
[83/474dab] NOTE: Missing output file(s) *_encoding expected by process integrity_coverage_1_1 (Salm_genome2) -- Execution is retried (3)

With -profile docker:
[4d/24d4a6] NOTE: Process integrity_coverage_1_1 (Salm_genome2) terminated with an error exit status (1) -- Execution is retried (1)

Note:

  • I have both docker and singularity installed system-wide.

  • I installed nextflow using curl

@stefcardinale are you running it on a cluster or on a single server? From your post I suspect it is a single server. I am asking to see how this issue might be related to @kapsakcj's issue with -profile sge_sing. All these issues seem to stem from Nextflow versions rather than FlowCraft, but we will look into them.

Hello, yes, I am using just my own machine, no cluster.

Hi!

(this is not an issue)

It appears to be a nextflow compatibility problem. I just tried a fresh install (first from conda): the flowcraft build was from 2018-something and the nf version was 20-something. It appeared as if integrity coverage (IC) finished, but any module attached after it would fail (I tried with trimmomatic, spades, and skesa). To me it seemed like an integrity coverage issue, because the module after IC would fail.
I then tried the same using the dev install (git clone .. && cd flowcraft && python setup.py install), only installing the conda version of nf (still 20-something). I got the same result.
Then I downgraded nf to 18.10.1, as suggested in this thread, and it worked. Changing the -profile parameter to standard, as suggested by @jacarrico, did not seem to have an effect in nf 20-something (I didn't try with "docker").

Also note that nf 20-something (the conda default today, 20201117) accepts the --fastq parameter without the quotes, while nf 18-something complains and fails to interpret the input unless the FastQ pattern is quoted: --fastq "data/{R1,R2}.fastq.gz" vs --fastq data/{R1,R2}.fastq.gz
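Part of this difference comes from the shell, not Nextflow. An unquoted pattern that matches files is expanded by the shell before Nextflow runs, so Nextflow receives several separate file names instead of one pattern. Quoting passes the pattern through unchanged for Nextflow to resolve. This is a small sketch showing what the command actually receives, using a hypothetical count_args helper and throwaway files. Brace patterns such as {R1,R2} are expanded by bash in the same way; the sketch uses * so it also runs under a plain POSIX sh:

```shell
#!/bin/sh
# Print how many arguments a command actually receives.
count_args() {
  echo "$#"
}

demo_dir=$(mktemp -d)
touch "$demo_dir/sample_R1.fastq.gz" "$demo_dir/sample_R2.fastq.gz"

# Unquoted: the shell expands the glob into two file names
count_args $demo_dir/sample_R*.fastq.gz    # prints 2
# Quoted: the pattern is passed through as one literal argument
count_args "$demo_dir/sample_R*.fastq.gz"  # prints 1
```

Quoting the --fastq value, as in the original report, is therefore the safer habit regardless of Nextflow version.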

If the conda package were pinned to depend on nf 18-something instead of the most recent one, that should solve the problem (or at least work around it for a while).

Hello! Nextflow has changed quite a bit in its latest releases, and so far we've decided not to address those changes, as doing so would break backwards compatibility. But now it seems we've reached the point where flowcraft-generated workflows are no longer compatible with the latest stable release of nextflow. Indeed, limiting the version in the conda recipe would be a quick fix, but the issue wouldn't really be addressed. A bit of an overhaul is necessary to address all of the changes.