Lioscro/ngs-tools

ngs_tools.gtf.Segment.SegmentError: Invalid segment

Closed this issue · 2 comments

zhewa commented

Hi,

I am trying to run nf-core/scrnaseq with the kallisto aligner. The following error occurred while generating the reference index. It seems to be related to a zero-length segment. I am using the GRCh38.p14 FASTA and GTF files from NCBI with the ERCC transcripts appended. Do you know how to fix this?

Thank you

Workflow execution completed unsuccessfully


Caused by:
  Missing output file(s) `kb_ref_out.idx` expected by process `NFCORE_SCRNASEQ:SCRNASEQ:KALLISTO_BUSTOOLS:KALLISTOBUSTOOLS_REF (GCF_000001405.40_GRCh38.p14_genomic_ERCC92.fna.gz)`

Command executed:

  kb \
      ref \
      -i kb_ref_out.idx \
      -g t2g.txt \
      -f1 cdna.fa \
      --workflow standard \
      GCF_000001405.40_GRCh38.p14_genomic_ERCC92.fna.gz \
      GCF_000001405.40_GRCh38.p14_genomic_ERCC92.gtf.gz
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SCRNASEQ:SCRNASEQ:KALLISTO_BUSTOOLS:KALLISTOBUSTOOLS_REF":
      kallistobustools: $(echo $(kb --version 2>&1) | sed 's/^.*kb_python //;s/positional arguments.*$//')
  END_VERSIONS

Command exit status:
  0

Command output:
  (empty)

Command error:
  [2022-06-28 15:54:36,355] WARNING [main] Gene `RNU6-222P_21` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `RNU6-222P_21`.
  [2022-06-28 15:54:36,355] WARNING [main] Gene `KIR2DP1_29` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `KIR2DP1_29`.
  [2022-06-28 15:54:36,355] WARNING [main] Gene `KIR3DP1_32` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `KIR3DP1_32`.
  [2022-06-28 15:54:36,355] WARNING [main] Gene `RNU6-222P_22` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `RNU6-222P_22`.
  [2022-06-28 15:54:36,355] WARNING [main] Gene `KIR3DP1_33` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `KIR3DP1_33`.
  [2022-06-28 15:54:36,355] WARNING [main] Gene `KIR2DP1_30` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `KIR2DP1_30`.
  [2022-06-28 15:54:36,355] WARNING [main] Gene `KIR2DP1_31` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `KIR2DP1_31`.
  [2022-06-28 15:54:36,355] WARNING [main] Gene `KIR3DP1_34` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `KIR3DP1_34`.
  [2022-06-28 15:54:36,355] WARNING [main] Gene `RNU6-222P_23` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `RNU6-222P_23`.
  [2022-06-28 15:54:36,355] WARNING [main] Gene `KIR2DP1_32` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `KIR2DP1_32`.
  [2022-06-28 15:54:36,356] WARNING [main] Gene `KIR3DP1_35` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `KIR3DP1_35`.
  [2022-06-28 15:54:36,356] WARNING [main] Gene `RNU6-222P_24` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `RNU6-222P_24`.
  [2022-06-28 15:54:36,356] WARNING [main] Gene `KIR2DP1_33` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `KIR2DP1_33`.
  [2022-06-28 15:54:36,356] WARNING [main] Gene `KIR3DP1_36` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `KIR3DP1_36`.
  [2022-06-28 15:54:36,356] WARNING [main] Gene `RNU6-222P_25` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `RNU6-222P_25`.
  [2022-06-28 15:54:36,356] WARNING [main] Gene `KIR3DP1_37` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `KIR3DP1_37`.
  [2022-06-28 15:54:36,356] WARNING [main] Gene `RNU6-222P_26` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `RNU6-222P_26`.
  [2022-06-28 15:54:36,356] WARNING [main] Gene `KIR2DP1_34` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `KIR2DP1_34`.
  [2022-06-28 15:54:36,356] WARNING [main] Gene `KIR3DP1_38` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `KIR3DP1_38`.
  [2022-06-28 15:54:36,356] WARNING [main] Gene `RNU6-222P_27` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `RNU6-222P_27`.
  [2022-06-28 15:54:36,356] WARNING [main] Gene `RNU6-222P_28` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `RNU6-222P_28`.
  [2022-06-28 15:54:36,357] WARNING [main] Gene `KIR3DP1_39` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `KIR3DP1_39`.
  [2022-06-28 15:54:36,357] WARNING [main] Gene `KIR2DP1_35` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `KIR2DP1_35`.
  [2022-06-28 15:54:36,357] WARNING [main] Gene `RNU6-222P_29` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `RNU6-222P_29`.
  [2022-06-28 15:54:36,357] WARNING [main] Gene `KIR3DP1_40` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `KIR3DP1_40`.
  [2022-06-28 15:54:36,357] WARNING [main] Gene `KIR2DP1_36` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `KIR2DP1_36`.
  [2022-06-28 15:54:36,357] WARNING [main] Gene `RNU6-222P_30` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `RNU6-222P_30`.
  [2022-06-28 15:54:36,357] WARNING [main] Gene `KIR3DP1_41` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `KIR3DP1_41`.
  [2022-06-28 15:54:36,357] WARNING [main] Gene `KIR2DP1_37` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `KIR2DP1_37`.
  [2022-06-28 15:54:36,358] WARNING [main] Gene `RNU6-222P_31` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `RNU6-222P_31`.
  [2022-06-28 15:54:36,358] WARNING [main] Gene `KIR3DP1_42` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `KIR3DP1_42`.
  [2022-06-28 15:54:36,358] WARNING [main] Gene `KIR2DP1_38` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `KIR2DP1_38`.
  [2022-06-28 15:54:36,358] WARNING [main] Gene `KIR3DP1_43` has no transcripts. The entire gene will be marked as a transcript and an exon with ID `KIR3DP1_43`.
  [2022-06-28 15:54:41,332]   ERROR [main] An exception occurred
  Traceback (most recent call last):
    File "/usr/local/lib/python3.9/site-packages/kb_python/main.py", line 856, in main
      COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
    File "/usr/local/lib/python3.9/site-packages/kb_python/main.py", line 168, in parse_ref
      ref(
    File "/usr/local/lib/python3.9/site-packages/ngs_tools/logging.py", line 62, in inner
      return func(*args, **kwargs)
    File "/usr/local/lib/python3.9/site-packages/kb_python/ref.py", line 393, in ref
      gene_infos, transcript_infos = ngs.gtf.genes_and_transcripts_from_gtf(
    File "/usr/local/lib/python3.9/site-packages/ngs_tools/gtf/__init__.py", line 190, in genes_and_transcripts_from_gtf
      introns = exons.invert(transcript_interval)
    File "/usr/local/lib/python3.9/site-packages/ngs_tools/gtf/SegmentCollection.py", line 108, in invert
      Segment(self._segments[i].end, self._segments[i + 1].start)
    File "/usr/local/lib/python3.9/site-packages/ngs_tools/gtf/Segment.py", line 27, in __init__
      raise SegmentError(f'Invalid segment [{start}:{end})')
  ngs_tools.gtf.Segment.SegmentError: Invalid segment [1095094:1095094)

Work dir:
  s3://***

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
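The traceback shows the failure while building the segment between the end of one exon and the start of the next, so one plausible trigger is a pair of exons in the same transcript that sit back to back with no gap. Here is a rough sketch for listing such pairs in a GTF. The helper name `find_adjacent_exons` is made up for illustration, and it assumes standard tab-separated GTF with 1-based inclusive coordinates and a `transcript_id "..."` attribute. It is not part of ngs-tools or kb-python.

```python
import re
from collections import defaultdict


def find_adjacent_exons(gtf_lines):
    """Return (transcript_id, chrom, prev_end, next_start) for every pair of
    exons in the same transcript that touch with no gap between them.

    Hypothetical diagnostic helper. Coordinates are read as 1-based and
    inclusive, so two exons touch when next_start == prev_end + 1.
    """
    exons = defaultdict(list)
    for line in gtf_lines:
        if line.startswith('#'):
            continue  # header or comment line
        fields = line.rstrip('\n').split('\t')
        if len(fields) < 9 or fields[2] != 'exon':
            continue  # malformed line or not an exon record
        match = re.search(r'transcript_id "([^"]*)"', fields[8])
        if match is None:
            continue  # exon without a transcript_id attribute
        key = (fields[0], match.group(1))
        exons[key].append((int(fields[3]), int(fields[4])))

    hits = []
    for (chrom, transcript_id), intervals in exons.items():
        intervals.sort()
        # Compare each exon with the next one in coordinate order.
        for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:]):
            if next_start == prev_end + 1:
                hits.append((transcript_id, chrom, prev_end, next_start))
    return hits
```

For a gzipped annotation, the lines can be fed in with `gzip.open(path, 'rt')`. Any hit whose `prev_end` matches the coordinate in the `Invalid segment [...)` message would be a candidate for the offending record.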

Hi, @zhewa,
Zero-length segments have been supported since version 1.5.13.
Could you try updating the package?
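To illustrate what a zero-length segment means here: the traceback builds `Segment(prev.end, next.start)` for each pair of consecutive exons, so two exons that touch produce a segment whose start equals its end. The sketch below is a simplified stand-in, not the actual ngs-tools code. The `Segment` class, the `allow_empty` flag and `gaps_between` are hypothetical, and the exact validity check used in either version is an assumption.

```python
class SegmentError(Exception):
    """Raised for an interval that fails validation."""


class Segment:
    """Half-open interval [start, end). Hypothetical stand-in class."""

    def __init__(self, start, end, allow_empty=False):
        # Assumed strict rule: start must be below end.
        # Assumed lenient rule: start may equal end (empty segment).
        too_small = start > end if allow_empty else start >= end
        if start < 0 or too_small:
            raise SegmentError(f'Invalid segment [{start}:{end})')
        self.start = start
        self.end = end

    def __len__(self):
        return self.end - self.start


def gaps_between(exons, allow_empty=False):
    """Return the gaps (candidate introns) between consecutive exons."""
    ordered = sorted(exons, key=lambda s: (s.start, s.end))
    return [
        Segment(a.end, b.start, allow_empty)
        for a, b in zip(ordered, ordered[1:])
    ]


# Two exons that touch at 1095094, the coordinate from the error message.
# The outer exon boundaries are made up for the example.
exons = [Segment(1095000, 1095094), Segment(1095094, 1095200)]

try:
    gaps_between(exons)  # strict rule rejects the empty gap
except SegmentError as error:
    print(error)

print(len(gaps_between(exons, allow_empty=True)[0]))  # lenient rule accepts it
```

Under the strict rule the empty gap is rejected with the same message format as in the log; under the lenient rule it is kept as an intron of length zero.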

zhewa commented

Hi,

Yes. After updating the package, it ran successfully. Thank you.