oparkins/Ayia

The JSON parser doesn't handle verse ranges

Closed this issue · 2 comments

In several versions (e.g. CEV, GNT, MSG, NLT, NRSV), multiple verses are combined into a single "verse".

The JSON parser does not currently handle this: combined verses are skipped during processing.

The verse markup looks like:

<div class="li1">
  <span class="verse v1 v2 v3 v4" data-usfm="1CH.1.1+1CH.1.2+1CH.1.3+1CH.1.4">
    <span class="content">Seth</span>
  </span>
</div>

In most versions, the data-usfm attribute holds a single value (e.g. data-usfm='GEN.1.1'). In the combined case it holds several references joined with '+'.
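A minimal sketch of how the parser could split a data-usfm value into individual references, assuming the stored key layout is BOOK.CCC.VVV with zero-padded chapter and verse (as in the candidate keys in this issue). The names `normalize_ref` and `split_usfm` are hypothetical and not taken from the existing parser.

```python
import re
from typing import List

# Assumption: a single reference looks like BOOK.CHAPTER.VERSE, e.g. "1CH.1.1".
_USFM_PART = re.compile(r"^([0-9A-Z]+)\.(\d+)\.(\d+)$")


def normalize_ref(part: str) -> str:
    """Convert '1CH.1.1' into the zero-padded key '1CH.001.001'."""
    match = _USFM_PART.match(part.strip())
    if match is None:
        raise ValueError(f"unrecognized USFM reference: {part!r}")
    book, chapter, verse = match.groups()
    return f"{book}.{int(chapter):03d}.{int(verse):03d}"


def split_usfm(data_usfm: str) -> List[str]:
    """Split a data-usfm value on '+' and normalize every part.

    A single-verse value yields a one-element list, so the same code
    path covers both the common case and the combined case.
    """
    return [normalize_ref(part) for part in data_usfm.split("+")]
```

With this, a value such as 'GEN.1.1' and a combined value such as '1CH.1.1+1CH.1.2+1CH.1.3+1CH.1.4' are both reduced to a list of keys, and the parser only needs to check whether the list has more than one entry.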

It's unclear how this should be handled with our current data format and reference scheme.

Which key should we use for a combined verse? Some candidates:

  • 1CH.001.001-004
  • 1CH.001.001+002+003+004
  • 1CH.001.001+1CH.001.002+1CH.001.003+1CH.001.004
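To make the trade-off between the three candidates concrete, here is a rough sketch that builds each form from a list of already normalized keys. The function names are made up for illustration. Note that the range form only works if the verses are contiguous and in the same chapter, which is an assumption this sketch does not verify.

```python
from typing import List


def _verse_part(ref: str) -> str:
    """Return the trailing verse component, e.g. '004' from '1CH.001.004'."""
    return ref.rsplit(".", 1)[1]


def as_range(refs: List[str]) -> str:
    """First key plus the last verse number: '1CH.001.001-004'.

    Assumes the verses are contiguous and in one chapter (not checked).
    """
    return f"{refs[0]}-{_verse_part(refs[-1])}"


def as_short_list(refs: List[str]) -> str:
    """First key plus the remaining verse numbers: '1CH.001.001+002+003+004'."""
    return "+".join([refs[0]] + [_verse_part(r) for r in refs[1:]])


def as_full_list(refs: List[str]) -> str:
    """Every key spelled out in full, joined with '+'."""
    return "+".join(refs)
```

The full list is the only form that can be split back into keys without knowing the book and chapter of the first entry. The range form is the shortest, but it loses information if a combined verse ever skips a number.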

Note: You can search in the cache for downloaded versions that use this multi-verse format via:

grep -lE '[0-9]+\.[0-9]+\+' cache/*.json

My current thoughts:

  • fold each multi-verse into its first verse (e.g. the content of 1CH.1.1+1CH.1.2+1CH.1.3+1CH.1.4 is stored under 1CH.001.001);
  • store every verse after the first as a "reference" document with content of { $ref: '1CH.001.001' };
  • when looking up verses, walk through the results of the first query looking for $ref entries, collect their targets into a secondary query, and mix the results of that query into the original results.
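The three steps above can be sketched as follows. This is an illustration only: it uses a plain dict in place of the real data store, and the names `fold_verses` and `lookup` as well as the `text` field are assumptions rather than part of the current schema. It also assumes that the pointer documents themselves are dropped from the final result, which the proposal leaves open.

```python
from typing import Dict, List


def fold_verses(refs: List[str], text: str) -> Dict[str, dict]:
    """Store the content under the first key; the rest become $ref pointers."""
    first = refs[0]
    docs: Dict[str, dict] = {first: {"text": text}}
    for ref in refs[1:]:
        docs[ref] = {"$ref": first}
    return docs


def lookup(store: Dict[str, dict], refs: List[str]) -> Dict[str, dict]:
    """Two-pass lookup that resolves $ref pointers."""
    # First query: fetch whatever exists for the requested keys.
    results = {ref: store[ref] for ref in refs if ref in store}

    # Collect pointer targets that the first query did not already return.
    targets = {doc["$ref"] for doc in results.values() if "$ref" in doc}
    missing = sorted(targets - set(results))

    # Secondary query: fetch the missing targets and mix them in.
    for target in missing:
        if target in store:
            results[target] = store[target]

    # Assumption: callers want content only, so pointer documents are dropped.
    return {key: doc for key, doc in results.items() if "$ref" not in doc}
```

For example, after `store = fold_verses(["1CH.001.001", "1CH.001.002", "1CH.001.003", "1CH.001.004"], "Seth")`, a request for only 1CH.001.003 comes back as the single document stored under 1CH.001.001. One consequence of this design is that the returned key can differ from the requested key, so callers would need to be prepared for that.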

Resolved via: