Weeks-UNC/shapemapper2

Unclear error message about primers

Closed this issue · 4 comments

Hello,
I ran an analysis for three zones, so I have three pairs of primers in the file "prim", but I got the following message:

Started ShapeMapper v2.1.5 at 2022-06-23 02:38:19
Output will be logged to test_Long_shapemapper_log.txt
Running from directory: /home/irina/SHapeMap/ShapeMap_2022_06_13
args: --name test_Long --target file.fasta --amplicon --primers prim --overwrite --min-depth 1000 --modified --folder plus --untreated --folder minus --denatured --folder den --preserve-order --correct-seq --folder minus --output-processed-reads --output-aligned-reads --output-parsed-mutations --output-counted-mutations
Warning: no random primer length was specified, but at least one RNA is longer than a typical directed-primer amplicon. Use --random-primer-len to exclude mutations within primer binding regions.
Created pipeline at 2022-06-23 02:38:19
Running PrimerLocator at 2022-06-23 02:38:19 . . .
. . . done at 2022-06-23 02:38:19
Running FastaFormatChecker at 2022-06-23 02:38:19 . . .
. . . done at 2022-06-23 02:38:19
Running BowtieIndexBuilder at 2022-06-23 02:38:19 . . .
. . . done at 2022-06-23 02:38:20
Running process group 4 at 2022-06-23 02:38:20 . . .
Including these components:
Appender1 . . . started at 2022-06-23 02:38:20
Appender2 . . . started at 2022-06-23 02:38:20
ProgressMonitor . . . started at 2022-06-23 02:38:20
QualityTrimmer1 . . . started at 2022-06-23 02:38:20
QualityTrimmer2 . . . started at 2022-06-23 02:38:20
Interleaver . . . started at 2022-06-23 02:38:20
Merger . . . started at 2022-06-23 02:38:20
Tab6Interleaver . . . started at 2022-06-23 02:38:20
BowtieAligner . . . started at 2022-06-23 02:38:20
MutationParser . . .Traceback (most recent call last):
File "/home/irina/bin/shapemapper-2.1.5/internals/python/pyshapemap/component.py", line 350, in format_command
formatted = command.format(**values)
KeyError: 'primers'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/irina/bin/shapemapper-2.1.5/internals/python/cli.py", line 141, in <module>
run(sys.argv)
File "/home/irina/bin/shapemapper-2.1.5/internals/python/cli.py", line 70, in run
success = pipeline.run()
File "/home/irina/bin/shapemapper-2.1.5/internals/python/pyshapemap/pipeline.py", line 717, in run
component.start_process(verbose=self.verbose)
File "/home/irina/bin/shapemapper-2.1.5/internals/python/pyshapemap/component.py", line 666, in start_process
formatted_cmd = self.format_command(self.cmd())
File "/home/irina/bin/shapemapper-2.1.5/internals/python/pyshapemap/component.py", line 356, in format_command
raise KeyError(msg)
KeyError: "Error: for component MutationParser, 'primers' node not linked to a filename or parameter, or that node name does not exist."

Could you please explain what this means?
Best regards,
Irina.

This looks like a bug. I cannot get --amplicon to work with --correct-seq. However, I don't think the sequence correction component needs to know the primer sequences. I'll look into this further. For now I think you can get around this by making two separate calls to shapemapper. Let me know if this solves your issue.

  1. Only perform sequence correction.
shapemapper --name test_Long --target file.fasta --out correct-sequence --correct-seq --folder minus
  2. Locate the corrected fasta file and run the full shapemapper pipeline.
shapemapper --name test_Long --target {path to new fasta file} --amplicon --primers prim --overwrite --min-depth 1000 --modified --folder plus --untreated --folder minus --denatured --folder den --preserve-order --output-processed-reads --output-aligned-reads --output-parsed-mutations --output-counted-mutations
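
The two-call workaround above can be sketched as a small Python helper that builds both argument lists. This is only an illustration: the function name `build_workaround_commands` and the `PATH_TO_NEW_FASTA` placeholder are hypothetical, the flags are copied from the commands in this thread, and the location of the corrected fasta file is not stated here, so it has to be filled in by hand.

```python
def build_workaround_commands(name, target, corrected_fasta):
    """Return the two shapemapper argument lists for the workaround.

    corrected_fasta is a placeholder: the thread does not say where
    ShapeMapper writes the corrected sequence, so locate it manually.
    """
    # Call 1: sequence correction only, using the untreated ("minus") reads.
    correct_only = [
        "shapemapper", "--name", name, "--target", target,
        "--out", "correct-sequence",
        "--correct-seq", "--folder", "minus",
    ]
    # Call 2: the full amplicon pipeline on the corrected sequence,
    # without --correct-seq, so the two options never meet in one run.
    full_run = [
        "shapemapper", "--name", name, "--target", corrected_fasta,
        "--amplicon", "--primers", "prim", "--overwrite",
        "--min-depth", "1000",
        "--modified", "--folder", "plus",
        "--untreated", "--folder", "minus",
        "--denatured", "--folder", "den",
    ]
    return correct_only, full_run


first, second = build_workaround_commands(
    "test_Long", "file.fasta", "PATH_TO_NEW_FASTA"  # placeholder path
)
print(" ".join(first))
print(" ".join(second))
```

Once the placeholder path has been replaced, each list can be passed to `subprocess.run`. The optional --preserve-order and --output-* flags from the original command are left out for brevity and can be appended to the second list.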

Yes, thank you very much. This definitely solved the problem. It seems --correct-seq doesn't change my reference sequence, but excluding --correct-seq from the shapemapper command line helps. I had thought that adding --correct-seq to the command line would run the sequence correction first, and that the pipeline would then automatically run on the corrected sequence.
Is there also a tutorial available that explains when to use which options?

I'm glad this worked for you. You are correct about the expected behavior when using --correct-seq. However, there was a bug in this component of ShapeMapper. It will be fixed in the next version.
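
For readers puzzled by the original traceback: the first `KeyError` comes from Python's `str.format` being given a template with a placeholder that has no matching value, and the second is the same problem re-raised with a longer message from inside the `except` block, which is why Python prints "During handling of the above exception, another exception occurred". The sketch below reproduces that pattern in isolation. It is not ShapeMapper's code; the template string, the `values` dictionary, and the function body are made up for illustration.

```python
def format_command(command, values, component="MutationParser"):
    """Fill a command template, re-raising missing keys with context."""
    try:
        return command.format(**values)
    except KeyError as err:
        # err.args[0] is the placeholder name that had no value.
        msg = (f"Error: for component {component}, '{err.args[0]}' node "
               "not linked to a filename or parameter")
        raise KeyError(msg)


# Hypothetical template: it expects a 'primers' value that is never supplied.
template = "parser --in {input} --primers {primers}"
try:
    format_command(template, {"input": "reads.sam"})
except KeyError as err:
    message = err.args[0]
    print(message)
```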

There's no tutorial covering all of the options beyond what is in the documentation. Take a look at the usage examples in the README. There's also a lot of information in /docs.

Here's a brief explanation of some of your flags in case you are unsure:

  • --preserve-order, --output-processed-reads, and --output-aligned-reads are useful for in-depth debugging. You probably don't need these.
  • --output-parsed-mutations produces a parsed.mut file required for RingMapper, PairMapper, and DanceMapper.
  • --output-counted-mutations produces a table with mutation rates broken down by mutation type, e.g. A->C, deletion, multinucleotide deletion, etc. This is mostly for technology development purposes or if you are looking for evidence of a specific RNA modification such as A->I.

I often use --per-read-histograms for quality control when using RingMapper, PairMapper, or DanceMapper. It will produce a table in the log.txt file containing the read length distribution and the mutations per molecule distribution. For these analyses, mutations per molecule should be high (>=5), and reads should be long.
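
As a rough illustration of the quality check described above, the sketch below computes the mean number of mutations per molecule from a histogram and compares it with the threshold of 5. The histogram values are invented, the function names are hypothetical, and applying the threshold to the mean is an assumption. The actual layout of the table that --per-read-histograms writes to the log is not reproduced here.

```python
def mean_mutations_per_molecule(histogram):
    """histogram maps mutation count -> number of reads with that count."""
    total_reads = sum(histogram.values())
    if total_reads == 0:
        raise ValueError("histogram contains no reads")
    total_mutations = sum(count * reads for count, reads in histogram.items())
    return total_mutations / total_reads


def passes_qc(histogram, min_mean=5.0):
    # Assumption: the ">=5" guideline is applied to the mean.
    return mean_mutations_per_molecule(histogram) >= min_mean


# Invented example data, not real ShapeMapper output.
example = {0: 10, 4: 30, 6: 40, 10: 20}
print(mean_mutations_per_molecule(example), passes_qc(example))
```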

Thanks a lot for your help and for explaining the options I used. Actually, I used --preserve-order, --output-processed-reads, and --output-aligned-reads so that we could parse the BAM/SAM file in case we had any doubts.