jgclark/BibleGateway-to-Markdown

Incomplete cross-reference (with possible workaround)

omichaelo opened this issue · 1 comment

When running bg2md.rb -i -v ESV john1, cross-reference labels beyond Z (AA, AB, etc.) keep only their second letter.

For example, this is the expected result:

And [^Z]the Word [^AA]became flesh and [^AB]dwelt among us

This is the actual result:

And [^Z]the Word [^A]became flesh and [^B]dwelt among us

After some debugging, I found a workaround. Disclaimer: I know neither Ruby nor regex, and I am not a programmer.

passage.gsub!(%r{<sup class='crossreference'.*?See cross-reference (\w+)+.*?</sup>}, '[^\1]')

This is on line 388. Notice the + added inside the parentheses, at (\w+).

This script is amazing, I hope this is helpful!

Thanks for spotting this and suggesting the solution.
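It is actually a one-character change: moving the + inside the capture group, from (\w)+ to (\w+), so that the group captures the whole label rather than only its last character. Such is the precision needed in regular expressions, though they remain a fantastic tool.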
Fixed, so closing.
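
For anyone landing here later, a minimal sketch of the two patterns side by side. The HTML string is a made-up, simplified stand-in for BibleGateway's markup (an assumption, not copied from the site), and the variable names are only illustrative. The two regexes are the before and after forms as described in this thread.

```ruby
# Hypothetical, simplified sample of a cross-reference marker.
# The real BibleGateway HTML is likely more elaborate.
html = "flesh and <sup class='crossreference'>(<a href='x' title='See cross-reference AB'>AB</a>)</sup>dwelt among us"

# Before: the + repeats the one-character group, so \1 holds only
# the last character matched by the final repetition.
buggy = %r{<sup class='crossreference'.*?See cross-reference (\w)+.*?</sup>}

# After: the + sits inside the group, so \1 holds the whole label.
fixed = %r{<sup class='crossreference'.*?See cross-reference (\w+).*?</sup>}

buggy_out = html.gsub(buggy, '[^\1]')
fixed_out = html.gsub(fixed, '[^\1]')

puts buggy_out  # flesh and [^B]dwelt among us
puts fixed_out  # flesh and [^AB]dwelt among us
```

The variant posted in the issue, (\w+)+, should give the same result as (\w+) on this input, since the first greedy repetition already consumes the whole label; the outer + is simply redundant.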