UC-Davis-molecular-computing/nuad

RNAplex output missing lines with sufficiently long input

Opened this issue · 0 comments

Run examples/many_strands_no_common_domains.py with num_strands set to 26 or higher and just the constraint named domain_pairs_rna_plex_constraint chosen, with random_seed=1. It gives this error:

Traceback (most recent call last):
  File "/mnt/c/Dropbox/git/nuad/examples/many_strands_no_common_domains.py", line 185, in <module>
    main()
  File "/mnt/c/Dropbox/git/nuad/examples/many_strands_no_common_domains.py", line 181, in main
    ns.search_for_sequences(design, params)
  File "/mnt/c/Dropbox/git/nuad/nuad/search.py", line 933, in search_for_sequences
    eval_set.evaluate_all(design)
  File "/mnt/c/Dropbox/git/nuad/nuad/search.py", line 1612, in evaluate_all
    self.evaluate_constraint(constraint, design, None, None)
  File "/mnt/c/Dropbox/git/nuad/nuad/search.py", line 1732, in evaluate_constraint
    results = constraint.call_evaluate_bulk(parts)  # type: ignore
  File "/mnt/c/Dropbox/git/nuad/nuad/constraints.py", line 4482, in call_evaluate_bulk
    results: List[Result[DesignPart]] = (self.evaluate_bulk)(parts)  # noqa
  File "/mnt/c/Dropbox/git/nuad/nuad/constraints.py", line 5488, in evaluate_bulk
    energies = nv.rna_plex_multiple(sequence_pairs, logger, temperature, parameters_filename)
  File "/mnt/c/Dropbox/git/nuad/nuad/vienna_nupack.py", line 484, in rna_plex_multiple
    raise ValueError(f'lengths do not match: #lines:{len(lines) - 1} #seqpairs:{len(pairs)}')
ValueError: lengths do not match: #lines:21630 #seqpairs:21632

Inspecting the (very large) output, it looks like some lines are missing; note how there is an extra newline inserted:

image

This issue is difficult to pin down since this only appears to happen when there are tens of thousands of pairs of domains, so hard to get a minimal reproducible example.

It is perplexing to me that the output only has lines like above:

target upper bound 11: query lower bound 3 ( 1.19)

target upper bound 3: query lower bound 7 ( 1.21)

target upper bound 8: query lower bound 6 ( 1.50)

whereas if you run it manually from the command line with the same parameters, you see more lines of output (giving sequence lengths, and prompting for the next input):

$ RNAplex -T 37 -f 1

Input two sequences (one line each); @ to quit
....,....1....,....2....,....3....,....4....,....5....,....6....,....7....,....8
AACTTTACAGG
CAACAGGTTAA
lengths = 31,31
target upper bound 5: query lower bound 5 (-2.10)


Input two sequences (one line each); @ to quit
....,....1....,....2....,....3....,....4....,....5....,....6....,....7....,....8
ACGT
ACGT
lengths = 24,24
target upper bound 4: query lower bound 1 (-2.70)


Input two sequences (one line each); @ to quit
....,....1....,....2....,....3....,....4....,....5....,....6....,....7....,....8
@

Though I suppose it is similar to RNAduplex in that regard; its output through the subprocess module also contains less text than what is written to the screen when running manually from the command line.

Yet another reason it would be great if the ViennaRNA package would implement this feature (and its equivalent for RNAplex) so that we can talk to it cleanly from Python instead of going through pipes and the subprocess module: ViennaRNA/ViennaRNA#154