The json parse doesn't handle verse ranges

Question

Closed this issue 8 months ago · 2 comments

In several versions (e.g. CEV, GNT, MSG, NLT, NRSV), multiple verses are combined into a single "verse".

The JSON parser does not currently handle this: combined verses are skipped during processing.

The verse markup looks like:

<div class="li1">
  <span class="verse v1 v2 v3 v4" data-usfm="1CH.1.1+1CH.1.2+1CH.1.3+1CH.1.4">
    <span class="content">Seth</span>
  </span>
</div>

In most versions, the data-usfm has a single value (e.g. data-usfm='GEN.1.1').
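Since the combined value looks like a "+"-separated list of ordinary single-verse values, one way to read it is to split on "+" and parse each part. This is only a sketch, not the project's parser: the names VerseRef and parse_data_usfm are made up for illustration, and it assumes every part has exactly the BOOK.chapter.verse shape shown above.

```python
from typing import NamedTuple


class VerseRef(NamedTuple):
    """Hypothetical container for one verse reference."""
    book: str
    chapter: int
    verse: int


def parse_data_usfm(value: str) -> list[VerseRef]:
    """Split a data-usfm attribute value into one VerseRef per verse.

    Assumes each "+"-separated part looks like BOOK.chapter.verse.
    """
    refs = []
    for part in value.split("+"):
        book, chapter, verse = part.strip().split(".")
        refs.append(VerseRef(book, int(chapter), int(verse)))
    return refs


# A single-verse value yields one reference, a combined value yields several.
single = parse_data_usfm("GEN.1.1")
combined = parse_data_usfm("1CH.1.1+1CH.1.2+1CH.1.3+1CH.1.4")
print(len(single), len(combined))
```

With this shape, the single-verse case is just a list of length one, so the rest of the processing would not need a separate code path.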

It's unclear how this should be handled with our current data format and reference scheme.

Which reference format should we use? Some options:

1CH.001.001-004
1CH.001.001+002+003+004
1CH.001.001+1CH.001.002+1CH.001.003+1CH.001.004
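To compare the three candidates side by side, here is a rough sketch that builds each one from the same book, chapter and verse list. The function names are hypothetical, and the three-digit zero padding is inferred from the examples above. Note that the range form only makes sense if the verses are contiguous and in one chapter, which is an assumption, not something verified against the cached versions.

```python
def pad(n: int) -> str:
    """Zero-pad to three digits, matching the 001-style references above."""
    return f"{n:03d}"


def format_range(book: str, chapter: int, verses: list[int]) -> str:
    """Option 1: first and last verse only (assumes a contiguous run)."""
    return f"{book}.{pad(chapter)}.{pad(verses[0])}-{pad(verses[-1])}"


def format_short_list(book: str, chapter: int, verses: list[int]) -> str:
    """Option 2: book and chapter once, then every verse number."""
    return f"{book}.{pad(chapter)}." + "+".join(pad(v) for v in verses)


def format_full_list(book: str, chapter: int, verses: list[int]) -> str:
    """Option 3: every verse as a full reference."""
    return "+".join(f"{book}.{pad(chapter)}.{pad(v)}" for v in verses)


verses = [1, 2, 3, 4]
print(format_range("1CH", 1, verses))       # 1CH.001.001-004
print(format_short_list("1CH", 1, verses))  # 1CH.001.001+002+003+004
print(format_full_list("1CH", 1, verses))   # 1CH.001.001+1CH.001.002+1CH.001.003+1CH.001.004
```

The third form is the only one where each part is already a valid single-verse reference, which may matter if other code splits on "+" and looks each part up directly.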

Note: You can search in the cache for downloaded versions that use this multi-verse format via:

grep -lE '[0-9]+\.[0-9]+\+' cache/*.json

Answer 1 · 2024-02-16T13:52:15.000Z

My current thoughts:

fold all multi-verses into the first (e.g. the content of 1CH.1.1+1CH.1.2+1CH.1.3+1CH.1.4 is stored in 1CH.001.001);
store every verse but the first as a "reference" document with content of { $ref: '1CH.001.001' };
when looking up verses, walk through the results of the first query looking for any $ref entries, and add their targets to a secondary query whose results are then mixed in with the first.
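The three steps above could look roughly like this. It is a sketch only: a plain dict stands in for whatever document store is actually used, and normalize, store_verse and lookup are hypothetical names, not functions from the project.

```python
def normalize(usfm: str) -> str:
    """Turn e.g. 1CH.1.1 into the padded key 1CH.001.001."""
    book, chapter, verse = usfm.split(".")
    return f"{book}.{int(chapter):03d}.{int(verse):03d}"


def store_verse(store: dict, data_usfm: str, content: str) -> None:
    """Fold a (possibly combined) verse into its first reference.

    The first key holds the content; every other key holds a $ref
    document pointing back at the first.
    """
    keys = [normalize(part) for part in data_usfm.split("+")]
    first, rest = keys[0], keys[1:]
    store[first] = {"content": content}
    for key in rest:
        store[key] = {"$ref": first}


def lookup(store: dict, keys: list[str]) -> dict:
    """Fetch the requested keys, then follow any $ref entries once."""
    # First query: whatever was asked for.
    results = {k: store[k] for k in keys if k in store}
    # Collect $ref targets that the first query did not already return.
    targets = {doc["$ref"] for doc in results.values() if "$ref" in doc}
    missing = targets.difference(results)
    # Secondary query: mix the referenced documents into the results.
    results.update({k: store[k] for k in missing if k in store})
    return results


store: dict = {}
store_verse(store, "1CH.1.1+1CH.1.2+1CH.1.3+1CH.1.4", "Seth")
found = lookup(store, ["1CH.001.003"])
print(sorted(found))
```

Looking up 1CH.001.003 here returns both the $ref document and the 1CH.001.001 document it points to, so the caller can decide whether to show the referenced text or drop the placeholder.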

Answer 2 · 2024-02-18T17:05:06.000Z

Resolved via: