epi2me-labs/wf-single-cell

All integer `UY` column breaks tag_bam.py

Closed this issue · 3 comments

Operating System

Other Linux (please specify below)

Other Linux

18.04.4

Workflow Version

v1.0.1-ga6a1b69

Workflow Execution

Command line

EPI2ME Version

No response

CLI command run

~/nextflow-23.12.0-edge-all run epi2me-labs/wf-single-cell \
    --fastq fastq/ \
    --kit_name multiome \
    --kit_version v1 \
    --expected_cells 5000 \
    --ref_genome_dir /home/ubuntu/cellranger/GRCh38-2020-A/ \
    --sample $SAMPLE \
    -c openstack.cfg \
    -profile standard

Workflow Execution - CLI Execution Profile

None

What happened?

I ran into a super unlikely situation: my GL000213.1 had a single read in the tags.tsv, and that read's UY consisted entirely of digits. As a result, pandas infers the column as an integer type when loading the data frame, which then conflicts with pysam's set_tag, since it expects a string when value_type="Z".

I've implemented an ad-hoc patch to .astype(str) the UY column for that specific chrom only to get on with the execution.

Would probably be worth it to make sure the columns are typed appropriately?
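
For illustration, here is a minimal sketch of the behaviour and of one way to pin the column type at load time. The table contents, the column names other than `UY`, and the `load_tags` helper are hypothetical and not taken from tag_bam.py; the only assumption about the real file is that it is tab-separated.

```python
import io

import pandas as pd

# Hypothetical one-row tags table whose UY value happens to be all digits.
TSV = "read_id\tCB\tUY\nread1\tACGT\t55555555\n"


def load_tags(handle, force_str=True):
    """Load a tags table, optionally forcing the UY column to be a string.

    Hypothetical helper for illustration; not the workflow's own loader.
    """
    dtype = {"UY": str} if force_str else None
    return pd.read_csv(handle, sep="\t", dtype=dtype)


# Default inference: the all-digit UY column is parsed as an integer column.
inferred = load_tags(io.StringIO(TSV), force_str=False)

# Explicit dtype: the UY value stays a string, which is what a "Z" tag needs.
typed = load_tags(io.StringIO(TSV))

print(inferred["UY"].dtype, type(typed["UY"].iloc[0]))
```

Setting the dtype when reading, rather than calling `.astype(str)` afterwards, also avoids losing leading zeros from values that were already parsed as numbers.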

Relevant log output

Traceback (most recent call last):
    File "/home/ubuntu/.nextflow/assets/epi2me-labs/wf-single-cell/bin/workflow-glue", line 7, in <module>
      cli()
    File "/home/ubuntu/.nextflow/assets/epi2me-labs/wf-single-cell/bin/workflow_glue/__init__.py", line 72, in cli
      args.func(args)
    File "/home/ubuntu/.nextflow/assets/epi2me-labs/wf-single-cell/bin/workflow_glue/tag_bam.py", line 88, in main
      add_tags(args.tags, args.in_bam, args.out_bam, args.chrom, args.flip)
    File "/home/ubuntu/.nextflow/assets/epi2me-labs/wf-single-cell/bin/workflow_glue/tag_bam.py", line 69, in add_tags
      align.set_tag('UY', row['UY'], value_type="Z")
    File "pysam/libcalignedsegment.pyx", line 2316, in pysam.libcalignedsegment.AlignedSegment.set_tag
    File "pysam/libcalignedsegment.pyx", line 2400, in pysam.libcalignedsegment.AlignedSegment.set_tag
    File "pysam/libcutils.pyx", line 134, in pysam.libcutils.force_bytes
  TypeError: Argument must be string, bytes or unicode.

Application activity log entry

No response

Hi @ktpolanski
It's not as unlikely as you might think; you are the third to report this 😅. It has been fixed on the prerelease branch. Please try it out by adding -r prerelease to your command.

Great to hear that it's on the way out! I don't think I can try the -r prerelease right now, as I'm running a tweaked version of the workflow in relation to #18 (added a couple more filter codes).

Hi @ktpolanski

A fix for this issue is out in release v1.0.3. Closing this issue now, but please let me know if it's not fixed for you.