The JSON parser doesn't handle verse ranges
Closed this issue · 2 comments
depeele commented
In several versions (e.g. CEV, GNT, MSG, NLT, NRSV), multiple verses are combined into a single "verse".
The JSON parser does not currently handle this: combined verses are skipped during processing.
The verse markup looks like:
<div class="li1">
<span class="verse v1 v2 v3 v4" data-usfm="1CH.1.1+1CH.1.2+1CH.1.3+1CH.1.4">
<span class="content">Seth</span>
</span>
</div>
In most versions, the data-usfm attribute has a single value (e.g. data-usfm='GEN.1.1').
It's unclear how this should be handled with our current data format and reference scheme.
What do we use?
- 1CH.001.001-004
- 1CH.001.001+002+003+004
- 1CH.001.001+1CH.001.002+1CH.001.003+1CH.001.004
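To compare the three options side by side, here is a rough sketch that builds each candidate from a data-usfm value. The names padRef and candidateRefs are hypothetical and not part of the existing parser. The three-digit zero padding is assumed from the examples above, and the sketch assumes every part of a combined value shares the same book and chapter and is listed in order.

```javascript
// Hypothetical helpers for illustration only; not part of the existing parser.

// Zero-pad chapter and verse to three digits, e.g. '1CH.1.1' -> '1CH.001.001'.
function padRef(usfm) {
  const [book, chapter, verse] = usfm.split('.');
  return [book, chapter.padStart(3, '0'), verse.padStart(3, '0')].join('.');
}

// Build the three candidate references for a (possibly combined) data-usfm value.
// Assumes all parts share one book and chapter and appear in verse order.
function candidateRefs(dataUsfm) {
  const refs = dataUsfm.split('+').map(padRef);
  const first = refs[0];

  // A single-verse value looks the same under all three schemes.
  if (refs.length === 1) {
    return { range: first, verseList: first, full: first };
  }

  const verses = refs.map((ref) => ref.split('.')[2]);
  return {
    // Option 1: first reference plus the last verse number.
    range: `${first}-${verses[verses.length - 1]}`,
    // Option 2: first reference plus each following verse number.
    verseList: [first, ...verses.slice(1)].join('+'),
    // Option 3: every reference written out in full.
    full: refs.join('+'),
  };
}

console.log(candidateRefs('1CH.1.1+1CH.1.2+1CH.1.3+1CH.1.4'));
```

The range form is the shortest but only works if combined verses are always contiguous; the other two forms can also describe gaps.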
Note: You can search in the cache for downloaded versions that use this multi-verse format via:
grep -lE '[0-9]+\.[0-9]+\+' cache/*.json
depeele commented
My current thoughts:
- fold all multi-verses into the first (e.g. 1CH.1.1+1CH.1.2+1CH.1.3+1CH.1.4 are all stored in 1CH.001.001);
- store all but the first verse as a "reference" document with content of { $ref: '1CH.001.001' };
- when looking up verses, after the first query, walk through the results looking for any $ref entries. Add these to a secondary query whose results would then be mixed in to the results.
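A minimal in-memory sketch of this idea follows. It is an illustration only: a Map stands in for whatever datastore is actually used, and normalizeRef, storeVerse, and lookupVerses are hypothetical names. The three-digit padding is assumed from the reference examples in this issue.

```javascript
// Hypothetical sketch; a Map stands in for the real datastore.

// Zero-pad chapter and verse to three digits, e.g. '1CH.1.1' -> '1CH.001.001'.
function normalizeRef(usfm) {
  const [book, chapter, verse] = usfm.split('.');
  return [book, chapter.padStart(3, '0'), verse.padStart(3, '0')].join('.');
}

// Store a verse: the text is folded into the first reference, and every
// other reference becomes a { $ref } document pointing at the first.
function storeVerse(store, dataUsfm, text) {
  const [first, ...rest] = dataUsfm.split('+').map(normalizeRef);
  store.set(first, { text });
  for (const ref of rest) {
    store.set(ref, { $ref: first });
  }
}

// Look up verses: run the first query, collect any $ref targets,
// then run a secondary query and mix its results in.
function lookupVerses(store, refs) {
  const results = new Map();
  const pending = new Set();

  // First query.
  for (const ref of refs) {
    const doc = store.get(ref);
    if (doc === undefined) continue;
    if (doc.$ref !== undefined) {
      pending.add(doc.$ref);
    } else {
      results.set(ref, doc);
    }
  }

  // Secondary query for referenced verses not already in the results.
  for (const ref of pending) {
    if (!results.has(ref) && store.has(ref)) {
      results.set(ref, store.get(ref));
    }
  }
  return results;
}

const store = new Map();
storeVerse(store, '1CH.1.1+1CH.1.2+1CH.1.3+1CH.1.4', 'Seth');
console.log(lookupVerses(store, ['1CH.001.003']));
```

In this sketch the results are keyed by the verse that holds the content, so a lookup for 1CH.001.003 comes back under 1CH.001.001. Whether callers should instead see the reference they asked for is left open.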