yachielab/SPADE

List index out of range error

Closed this issue · 6 comments

Hi.

I'm running SPADE within a Singularity container. This is the Dockerfile:

FROM continuumio/miniconda3:4.8.2

SHELL ["/bin/bash", "-c"]

LABEL description="Spade (Search for Patterned DNA Elements) container" \
      version="1.0.0"

RUN apt-get update --fix-missing && \
    apt-get install -y procps

RUN conda update -n base conda && \
    conda install -c conda-forge -c bioconda \
        python=3.6 mafft=7.455 blast=2.9.0 openssl=1.1.1e && \
    conda clean --all -f -y

WORKDIR /opt
RUN git clone --depth 1 https://github.com/yachielab/SPADE && \
    cd SPADE && chmod u+x *.py && \
    pip install matplotlib==2.2.3 && \
    pip install seaborn==0.8.1 && \
    pip install weblogo==3.6.0 && \
    pip install biopython

ENV PATH="/opt/SPADE:${PATH}"

CMD [ "SPADE.py" ]

When running SPADE (via the workflow manager Nextflow):

Command executed:

  SPADE.py -i DA34821_pseudomolecule.fasta -f fasta -t nucl -d -n 4

I get the following error:

Command output:
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_3328933_3329044
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_3375436_3377156
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_3481197_3481657
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_3785859_3785893
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_3956278_3956881
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_4010576_4011560
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_4018737_4019448
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_4020959_4021133
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_4023623_4024253
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_4093472_4094076
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_4100467_4105660
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_4271700_4271796
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_4313171_4313782
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_4344856_4345464
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_4370863_4371272
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_4414478_4415085
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_4550143_4556044
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_4562903_4563392
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_4563693_4563864
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_4566812_4566973
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_4615514_4615606
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_4834118_4834398
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_4857100_4857156
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_4857637_4857655
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_4894389_4894563
  list index out of range

Command error:
  Command line argument error: Argument "query". File is not accessible:  `./nucl_4313171_4313782/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_4344856_4345464/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_4370863_4371272/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_1663958_1669493/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_1892097_1894787/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_2177089_2177696/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_2219687_2220166/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_2496611_2502210/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_2535334_2535963/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_2540192_2541435/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_2541821_2545685/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_2695205_2700564/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_2791127_2796412/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_2877855_2883756/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_3213117_3213377/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_3260582_3261345/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_3297030_3297896/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_3323972_3324117/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_3324408_3326112/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_3326614_3327904/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_3328933_3329044/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_3375436_3377156/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_3481197_3481657/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_3785859_3785893/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_3956278_3956881/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_4010576_4011560/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_4018737_4019448/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_4020959_4021133/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_4023623_4024253/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_4093472_4094076/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_4100467_4105660/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_4271700_4271796/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_4313171_4313782/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_4344856_4345464/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_4370863_4371272/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_4414478_4415085/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_4550143_4556044/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_4562903_4563392/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_4563693_4563864/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_4566812_4566973/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_4615514_4615606/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_4834118_4834398/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_4857100_4857156/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_4857637_4857655/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_4894389_4894563/query.fasta'
  /opt/conda/lib/python3.6/site-packages/matplotlib/font_manager.py:1331: UserWarning: findfont: Font family ['Arial'] not found. Falling back to DejaVu Sans
    (prop.get_family(), self.defaultFamily[fontext]))
  /opt/SPADE/visualisation.py:330: UserWarning: Ghostscript was not found. If you want to generate LOGO in pdf format, please install Ghostscript.
   For the present, LOGO was output in EPS format
    warnings.warn("Ghostscript was not found. If you want to generate LOGO in pdf format, please install Ghostscript.\n For the present, LOGO was output in EPS format")

Could you help me solve the index out of range error?
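For anyone triaging a long log like this one, the failing regions can be pulled out mechanically. The following is a minimal sketch and not part of SPADE: the `failing_regions` helper and its regular expression are assumptions based only on the shape of the `Error in ...` lines shown in this post.

```python
import re
from collections import defaultdict

# Matches log lines of the form:
#   Error in <step>. dir <sequence id> nucl_<start>_<end>
ERROR_LINE = re.compile(r"Error in (\w+)\. dir (\S+) (nucl_(\d+)_(\d+))")


def failing_regions(log_text):
    """Map each failing step name to a sorted list of (start, end) regions."""
    regions = defaultdict(set)
    for match in ERROR_LINE.finditer(log_text):
        step, _seq_id, _region, start, end = match.groups()
        regions[step].add((int(start), int(end)))
    return {step: sorted(spans) for step, spans in regions.items()}


# Two lines copied from the command output above.
sample = """\
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_3328933_3329044
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_3375436_3377156
  list index out of range
"""

result = failing_regions(sample)
for step, spans in result.items():
    print(step, spans)
```

Running the same helper on the logs of two different runs and comparing the resulting sets would show which regions stopped failing after a code change.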

I'm sorry for the inconvenience and the late response.
To reproduce the error above, could you share the sequence you used?

Hi, I haven't received permission from my collaborators to share the sequence, so I've created a toy sequence that generates the same error.

seq.fasta:

>random sequence
ccgccaagagaatgtatgaataggttgagtataagacagccctatgttcaccgtctgcag
actccggacatggggtcacggagtaaagtcgtgctgagcacataaagtttagccatcagc
ctgaagtactgcagtatgtagagcgtctataggacgcagctctcacatatgtggatcccg
aagctcaactatactgaggagaggtcagtgtccgtagattcgaccgcaccaccgacaact
cccaactgctcggaggtaatcggcgcggggtcgcgagatcttggttgtcgggtgaatttg
attccactctttcgctgtaggcaaattccatttagttgcctccggggatagcctcactaa
tttattttgaaagtgtctgggtacacattagttggctacgccaccacgagcgttaaaccc
aacctcttagcctgggtaccctgactcgctcctccatatttttgattcagctccggcggg
tcaataacccgtggctggcgagactaacggattcccgtgccccagcctcaggcttaaagc
ctctttagaattaggcggtccgatactgagtcagtctaaaaaaatagtttagagtggtcc
atatgacgacctcgcggcattcatggtatttcgctaccgatcccctcctcaggggcacct
atacggcctacgagactaggctactcgggctctcctaagtgccggagaacttaaggcctg
gcccatacaatcctttccccaataagttggaccactatatatgccgtgagacgtcttcta
tccggagaaccaaatgttgggggccaaacaggctaagagttcgagttccgagccttggca
attagctcgatcgggtgaaaaaaagagaagagtccattaagagttacatgttgttttcag
gcgatctggtccacactggtactgtccgtatgcactagctagagttcagtacagttttac
atgcttcttccgtctagagcctatcgcggtctcatcgaagctacagagtaaatagggccg
gactaggatgtgtcccgtccttgtggtttattgagcactccaggcgcttgggtacctgta
ctcagagtaacgaactcgtccagtagcgttctagagttcctaatatatgagtcgaggtcc
tgcgcctaaggaagaaatggtcattatgcgataagtgacccgcttagaaggggtaggagg
taactcaccttggtcaattttaccatctgtacctaaaagtctcctgattagtgtcccgat
atcccacggttaagtcagcannnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnngtactcgagaatgcactcccgacgtgatgatggttcatcg
gaaaacgtacggctaccacggaatactttccactccttaccctacgccggcacgacccga
gaattagacactagcccgaacggcatggagtgcccagaacgtggggacgtaggatgttag
tagaacgtcaataggcgcgggagtctaggtaccgcactgatcatactctatagggtgtcg
caaagcggatattcagaaatgactagtcagcatgcacgtctgatggcgaactcttgggcg
cttggttcttggcgataatccgacatgggatgttcgatacggtcgcggtctgtcccgccg
agatagtgcgggttcgcctcactctgctaaagtctccggtacttgccagactccacatac
tgggtgtgccgaccgaggtggtgcaggaggggtcatacaggacccttgtaccctcctgtt
ccgctcgctgcaaaatggtgtggtaagttcgaggctgttgaggcaatggtcccataaatc
agtgactgaccaccaactactagcggggtgcgggacgataccgactcggccatctgctcc
aggcgttatgtccctagtttcacggtgattagtaaccatttggccaaacagcgaaatact
acaataggaaataccggctattcaaacggctatcgtggatcggccaccgagggtttacgc
ataaagatttcggcattagttctcacctgtgtgccccacgaacgaacaggcttccggtgc
tacgacgctgccagctttctttcgttacataattcttggtgaaaataacattgacctgcg
gctcatcacgtattatcctacgtgaaattcgcctggaagttgtgaaccattaatatagtc
agtgcatttttatcgcagggccagtaacacgcgcgtcgtccattacgctttcatgggtat
ttagacttcaagtattagggttagaaactgagcaagctggctcgtgtggcatccgccacg
cctaacgctctctatagacatcattagaaaggcatctgctatcacagtatgagtgctttc
ctttcctttatacccgtgtgttgactccgttttcaggctgggcccacctcgataacgccc
tatcaacacacagcatttcaatataacccggtagcgaatgcgccttgcggctctcccggg
ccgccgctcacaacaggaagtaacatacgtccggaatccgctgctgggtaattcagttca
taaaggtcactgaagcggaaagggctctcgacgcgagtatcgtctacagctggttccagc
ctaccactccgccgtgtacggactgataccgcttagtagggattgagatgagcggaccat
ggtaatgggttacatattgagctgaattttactcagggcggtattattactatgtttagt
tatccgataagtgttataccgcgtccgaaggtcagtaagattcactacgcaccggtagtg
cagggtcccttttgctctcaggtgacactgcaattctaggcgaggcccattccccgttgg
attaaaacagaactcggcgtatggtttagtttgagcccatatcagcaagatgactgcccc
actgtgtgacccatcaccctaccattgagcctaggtgatcgttgtcaattgctgttcaga
taggagcgcagagcccgaaaacagagtaatatgtcatgtaaccagtcgcgtggtgacatc
tggggtcttcatggtctggatgcccataccaacgcgtagtcgctggtccctacatgcctt
agacacgttatcggccactcnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnggtatttgcgtggatatccatagtcgatccttgcggcaag
tcctacactgacatccccgaactccgtccattatgctcaaactatatcaccttggggcgc
caaggcatccataatgacctaagattttgtagccccccgccgcaacttgacgctcgaacg
acctatgcccggctctcagcttaacacgggtcggttatcataaaacgataaagagaaccg
tgttttgggcacgtgaggagcgccacgctccgacgcctatctaaccaatctatacgtaca
ggaagggcccaatgagtgtgcatagcgatgctgcgtttccagtacagtccatctccgcct
ctaatagggcacgccaccgcaccttgagctacggtgcccgcgaaaccgtagatgcggttt
gaacttggcagcgttattgccgaaccagcacgtgcgttgggtttatcatacgatggctgt
ttagggacatagtgggatcactagtattagcccttcacgtcttttcaggtgtagtaaaaa
actttgagcaacgttcttcgttaaacatacatgctgaaactcacgtggccgacatcccct
ctcacaagcgagtctttgtacattgatacgtgaagttgccctcctaactttaaagaagca
tgccacgatttcggtaagctttgcaaaaacaacgggcaaccttcagctgcctctcataag
atagcgcatgcttggcccccgcgctttcatagtcagaaacaacccctttcacccacatat
gtagactaaggtctgtccgagtgacagacagttttacggcacagattgtacgacgcctcc
gtctatcttacgtggttccccgatattcatggggtgcggtgctgacacaagctcttttat
aaggcatgatgaatgggtcctagacgctctgtggttccaactggtttttgcgtagtcgtt
tttagataagatatacccgccccttcctcccgaggcaactaggactgtaagcatgtacac
cggtactaccccttaggaatgtcacttcttgtcttggatatcgtctagcgcttgactcca
ctgcgcagttttgtgtagggttaccgatctaacaaaactgacggccctttgacgcccgca
aggggctccactcttacgagcctcagggaaagagattcagcacctgcgcactatggaggt
acaatttccgattcagtatcccatttacagataagcttggaaaccccagtcgctcattag
tggcaacgtgaagaccgaattcattcaatgccgatatccgccgtccgtgaattggtttca
acgtcgcccagtgtccgtcaggctcgaatggataacgcacgcctacgctatatgttgaca
aataaactactacacctctgaccatttactccaaattaacccatttgctatcgatcattt
ggagaaggcctttcacagataaagcacaccatttattctaccccgaaggcagcgttctac
gactgccccgctcccgatattggcatagcttcgaggtcagaataagtgagattacaccac
gcgcggggccaaacaccagggagtctcgcattttcttgtattgctgtgtttagagagaca
attgagtttccgagtgcggagttgtgagttttatcttgacttaatactcctctcgagcta
acgcagtgagcgtctcacaaccagcttaattcgcatcatcggacattccttgtgaacaat
cgatcgcccacaacgagtcccgctaggggcttgtgtagatatctggaaccagttacaaat
cagatatacagcactgtgttggcttactactgaagaacagatggcaccgtaagtcgcggc
agcgttttcgaaactagctgtgagaccaagcgatctaagtggcagcatagcttggacggg
ctatgcacgaagccacgctc

This is the command line instruction:

SPADE.py -i seq.fasta -f fasta -t nucl -d -n 4

Error:

Error in decide_query_all. dir random nucl_1280_1340
No counts.
Error in decide_query_all. dir random nucl_3140_3380
No counts.
Command line argument error: Argument "query". File is not accessible:  `./nucl_1280_1340/query.fasta'
Command line argument error: Argument "query". File is not accessible:  `./nucl_3140_3380/query.fasta'
Error in make_se_sets_all. dir random nucl_1280_1340
list index out of range
Error in make_se_sets_all. dir random nucl_3140_3380
list index out of range

I ran SPADE on the given sequence and reproduced the error above. There was a problem in the handling of N nucleotides, so I fixed it and updated the repository. Could you re-clone the git repository and run it again?
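The failing region names in the toy run (nucl_1280_1340 and nucl_3140_3380) appear to line up with the two runs of n in seq.fasta. A minimal sketch for checking this kind of overlap on other inputs is given below. It is not SPADE's implementation: the helper names are hypothetical, and the 1-based inclusive coordinates are an assumption about how the region names should be read.

```python
import re


def read_fasta_sequence(lines):
    """Concatenate the sequence lines of a single-record FASTA, skipping the header."""
    return "".join(
        line.strip() for line in lines if line.strip() and not line.startswith(">")
    )


def n_runs(sequence):
    """Return 1-based inclusive (start, end) coordinates of each run of N or n."""
    return [(m.start() + 1, m.end()) for m in re.finditer(r"[Nn]+", sequence)]


def overlaps(region, run):
    """True if two inclusive (start, end) intervals share at least one position."""
    return region[0] <= run[1] and run[0] <= region[1]


# Tiny made-up record: four bases, six n, four bases, split over two lines.
toy = ">toy\nacgtnnnn\nnnacgt\n"
seq = read_fasta_sequence(toy.splitlines())
runs = n_runs(seq)
print(runs)
print(overlaps((3, 6), runs[0]))
```

For a real file, the lines can be read with `open("seq.fasta")` and passed to `read_fasta_sequence`; each failing region from the log can then be tested against every run with `overlaps`.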

Hi, I'm still facing similar errors with my real dataset, after pulling the updated code.

The errors are reduced, but still present.

Command executed:

  SPADE.py -i DA34821_pseudomolecule.fasta -f fasta -t nucl -d -n 4

Command exit status:
  0

Command output:
  Error in decide_query_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_192268_193122
  No counts.
  Error in decide_query_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_513395_513988
  No counts.
  Error in decide_query_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_1137245_1138409
  No counts.
  Error in decide_query_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_2219687_2220166
  No counts.
  Error in decide_query_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_3481197_3481657
  No counts.
  Error in decide_query_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_3956278_3956881
  No counts.
  Error in decide_query_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_4370863_4371272
  No counts.
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_192268_193122
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_513395_513988
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_1137245_1138409
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_2219687_2220166
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_3481197_3481657
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_3956278_3956881
  list index out of range
  Error in make_se_sets_all. dir pseudo_used_ENA_CP029567_CP029567.1 nucl_4370863_4371272
  list index out of range
 
Command error:
  Command line argument error: Argument "query". File is not accessible:  `./nucl_192268_193122/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_513395_513988/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_1137245_1138409/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_2219687_2220166/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_3481197_3481657/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_3956278_3956881/query.fasta'
  Command line argument error: Argument "query". File is not accessible:  `./nucl_4370863_4371272/query.fasta'
  /opt/conda/lib/python3.6/site-packages/matplotlib/font_manager.py:1331: UserWarning: findfont: Font family ['Arial'] not found. Falling back to DejaVu Sans
    (prop.get_family(), self.defaultFamily[fontext]))

Do lines 93, 108, and 189 also need to be changed from 'DNA' to 'nucl'?
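A quick way to list any remaining comparisons against the old label is to scan the source for the quoted literal. This is only a sketch: the `lines_with_dna_literal` helper is hypothetical, and the example snippet is made up for illustration and not taken from the SPADE source.

```python
import re

# Matches the quoted literal 'DNA' or "DNA" anywhere on a line.
DNA_LITERAL = re.compile(r"""(['"])DNA\1""")


def lines_with_dna_literal(source_text):
    """Return 1-based line numbers that still contain the quoted literal DNA."""
    return [
        number
        for number, line in enumerate(source_text.splitlines(), start=1)
        if DNA_LITERAL.search(line)
    ]


# Hypothetical snippet, not taken from the SPADE source.
example_source = '''\
if seq_type == "nucl":
    pass
if seq_type == 'DNA':
    pass
label = "DNA sequence"
'''

print(lines_with_dna_literal(example_source))
```

Only exact quoted literals are reported, so a longer string such as "DNA sequence" is not flagged.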

Yes, they do. I'm embarrassed that I missed them. I've updated the repository, so could you try again? I'm sorry for the inconvenience.

Thank you for working on it so quickly. It seems to work now.