epi2me-labs/wf-single-cell

Error in pipeline:process_bams:tag_bam

Closed this issue · 6 comments

Operating System

CentOS 7

Other Linux

No response

Workflow Version

wf-single-cell v2.0.2-ge9dac45

Workflow Execution

Command line (Cluster)

Other workflow execution

No response

EPI2ME Version

No response

CLI command run

nextflow run epi2me-labs/wf-single-cell \
  -profile singularity \
  --expected_cells 50000 \
  --fastq '/pipeline/Runs/Nanopore/20240514_1009_MN22007_FAY30355_68a76471/no_sample/20240514_1009_MN22007_FAY30355_68a76471/fastq_pass/FAY30355_pass_68a76471_936db429_24.fastq.gz' \
  --kit_name '5prime' \
  --kit_version 'v1' \
  --ref_genome_dir '/data/reference/dawson_labs/genomes/cellranger_reference_GRCh38-2020-A/refdata-gex-GRCh38-2020-A' \
  -w '/scratch/teams/dawson_genomics/Projects/PRC2_BE_screen/results/MF01_nanopore/epi2me_output/work/' \
  --out_dir '/scratch/teams/dawson_genomics/Projects/PRC2_BE_screen/results/MF01_nanopore/epi2me_output/' \
  --threads 16

Workflow Execution - CLI Execution Profile

None

What happened?

I got an error in the pipeline:process_bams:tag_bam process when running the pipeline on a test fastq file.
I checked the work directory of the failed process. Since I'm running with one small test fastq file, some of the files in the tags/ directory contain only the header line.
I feel like this line

chrom = getattr(next(iter(d.values())), "chrom")
will break if there is an empty tags file.

I was able to run the demo data, which only has 1 chromosome, so the problem seems to occur when reads come from more than one but not all chromosomes.
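To illustrate the suspected failure mode, here is a minimal sketch in plain Python. It assumes that `d` is a dict of per-read tag records built from one tags file; the `Tag` record type and the `first_chrom` helper are hypothetical names used only for this example and are not taken from the workflow code. Calling `next()` on an iterator over an empty dict raises `StopIteration`, while passing a default to `next()` avoids the exception.

```python
from collections import namedtuple

# Hypothetical record type; the real workflow's tag records may look different.
Tag = namedtuple("Tag", ["chrom", "barcode"])


def first_chrom(d):
    """Return the chrom of the first record in d, or None if d is empty."""
    # next() with a default does not raise StopIteration on an empty iterator.
    first = next(iter(d.values()), None)
    return None if first is None else first.chrom


# A header-only tags file would produce an empty dict.
empty = {}
try:
    next(iter(empty.values()))
    raised = False
except StopIteration:
    raised = True

populated = {"read1": Tag("chr18", "AAAC")}
chrom_empty = first_chrom(empty)
chrom_populated = first_chrom(populated)
```

With a guard like this, the caller can decide explicitly what to do with an empty tags file (for example, skip it) instead of crashing.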

Relevant log output

ERROR ~ Error executing process > 'pipeline:process_bams:tag_bam (1)'

Caused by:
  Process `pipeline:process_bams:tag_bam (1)` terminated with an error exit status (1)

Command executed:

  workflow-glue tag_bam         align.bam tagged.bam tags         --threads 4
  samtools index -@ 4 "tagged.bam"

Command exit status:
  1

Command output:
  (empty)

Command error:
  [23:38:29 - workflow_glue] Bootstrapping CLI.
  [23:38:29 - workflow_glue] Starting entrypoint.
  [23:38:29 - workflow_glue.TagBAMs   ] tags/tag_8.tsv contains tags for reference: chr18.
  [23:38:29 - workflow_glue.TagBAMs   ] tags/tag_17.tsv contains tags for reference: chr7.
  Traceback (most recent call last):
    File "/home/hholze/.nextflow/assets/epi2me-labs/wf-single-cell/bin/workflow-glue", line 7, in <module>
      cli()
    File "/home/hholze/.nextflow/assets/epi2me-labs/wf-single-cell/bin/workflow_glue/__init__.py", line 82, in cli
      args.func(args)
    File "/home/hholze/.nextflow/assets/epi2me-labs/wf-single-cell/bin/workflow_glue/tag_bam.py", line 188, in main
      add_tags(args.tags, args.in_bam, args.out_bam, args.threads)
    File "/home/hholze/.nextflow/assets/epi2me-labs/wf-single-cell/bin/workflow_glue/tag_bam.py", line 151, in add_tags
      store = TagStore(tags, bam=in_bam)
    File "/home/hholze/.nextflow/assets/epi2me-labs/wf-single-cell/bin/workflow_glue/tag_bam.py", line 95, in __init__
      chrom = getattr(next(iter(d.values())), "chrom")
  StopIteration

Work dir:
  /scratch/teams/dawson_genomics/Projects/PRC2_BE_screen/results/MF01_nanopore/epi2me_output/work/fb/1c4b67308c1ddec9efceb6584854af

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details
WARN: Killing running tasks (2)

Application activity log entry

No response

Were you able to successfully run the latest version of the workflow with the demo data?

yes

Other demo data information

No response

The same error (during tag_bam) happened with Docker on Ubuntu as well when running either v2.0.2 or v2.0.0.

nextflow run epi2me-labs/wf-single-cell \
  -profile standard \
  -r v2.0.2 \
  --ref_genome_dir /mnt/c373681d-f151-47ce-9133-962128da5732/danson/epi2me/refdata-gex-GRCh38-2020-A \
  --threads 32 \
  --fastq /mnt/c373681d-f151-47ce-9133-962128da5732/danson/epi2me/4un/PBMC_Sample4_unstim_R1.fastq.gz \
  --kit_name 3prime \
  --kit_version v3 \
  --expected_cells 1500 \
  --full_length_only True \
  --out_dir /mnt/c373681d-f151-47ce-9133-962128da5732/danson/epi2me/4un/output

Hi @HenriettaHolze and @lscdanson

Looks like you found a bug. We will get a fix out ASAP. Thanks.

Hi all

Glad that this will be fixed.
I solved it temporarily by changing the line to:
for value in iter(d.values()): chrom = getattr(value, "chrom")

Kind regards
Koen

@KoenDeserranno Many thanks for the suggestion! Managed to complete my run after adopting the change.

The change:

for value in iter(d.values()): chrom = getattr(value, "chrom")

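on its own will potentially drop information and exclude reads from the analysis that would otherwise be kept (due to variable scoping in Python). If a tags file is empty, the loop body never runs, so chrom keeps whatever value it had from an earlier file.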
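The scoping problem can be reproduced with a small, self-contained sketch. It assumes, purely for illustration, that the store builds a chrom-to-file index while looping over the tags files; the `files` data, the `Tag` record, and both function names are hypothetical and do not reflect the actual `TagStore` implementation. In the workaround version, an empty file reuses the `chrom` left over from the previous file and overwrites that chromosome's entry, so the reads tagged in the earlier file are no longer reachable.

```python
from collections import namedtuple

# Hypothetical record type and input data for illustration only.
Tag = namedtuple("Tag", ["chrom"])

files = {
    "tag_8.tsv": {"read1": Tag("chr18")},
    "tag_9.tsv": {},  # header-only file: no records
    "tag_17.tsv": {"read2": Tag("chr7")},
}


def index_with_workaround(files):
    """Build a chrom -> file index using the for-loop workaround."""
    index = {}
    for fname, d in files.items():
        for value in iter(d.values()):
            chrom = getattr(value, "chrom")
        # If d is empty, chrom still holds the value from the previous file
        # (or is undefined if the very first file is empty).
        index[chrom] = fname
    return index


def index_with_skip(files):
    """Build the same index but skip empty files explicitly."""
    index = {}
    for fname, d in files.items():
        first = next(iter(d.values()), None)
        if first is None:
            continue  # nothing to index for a header-only file
        index[first.chrom] = fname
    return index


workaround = index_with_workaround(files)
skipped = index_with_skip(files)
```

In this sketch the workaround maps chr18 to the empty tag_9.tsv, while the explicit skip keeps chr18 pointing at tag_8.tsv. If the first file in the loop were empty, the workaround would instead fail with an UnboundLocalError because `chrom` was never assigned.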

v2.0.3 will be available shortly and will fix the immediate error occurring here.