jgm/djot.js

Should "a" be represented by dj.substring(0, 1) or by (0, 0)?

FrankFischer opened this issue · 3 comments

If you parse the Djot string "a"
the following events will be produced:

[{ startpos: 0, endpos: 0, annot: "+para" }
,{ startpos: 0, endpos: 1, annot: "str" }
,{ startpos: 2, endpos: 2, annot: "-para" }]
  • "+para" 'is' the first char of "ab"
    (same startpos and endpos as 'a')

  • "-para" 'is' the char after "ab"
    (the char at offset 2) - but this
    char does not exist!
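
To make the current convention concrete, here is a minimal sketch. It assumes the input is exactly "ab" with no trailing newline, and the events are copied from the listing above rather than produced by running djot.js. The helper textOfInclusive is hypothetical and not part of the library.

```javascript
// Events copied from the issue text (not generated by running the parser here).
const src = "ab";
const events = [
  { startpos: 0, endpos: 0, annot: "+para" },
  { startpos: 0, endpos: 1, annot: "str" },
  { startpos: 2, endpos: 2, annot: "-para" },
];

// Hypothetical helper: endpos is the offset of the last character (inclusive),
// so the exclusive end expected by substring is endpos + 1.
function textOfInclusive(source, ev) {
  return source.substring(ev.startpos, ev.endpos + 1);
}

const texts = events.map((ev) => [ev.annot, textOfInclusive(src, ev)]);
for (const [annot, text] of texts) {
  console.log(annot, JSON.stringify(text));
}
// "+para" yields "a", "str" yields "ab", and "-para" yields ""
// because offset 2 lies past the end of the two-character string.
```

Under this reading every event covers at least one character position, so a marker such as "+para" cannot be told apart from a one-character span by its offsets alone.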


Even if this 'works' in an implementation - a more
concise and clearer concept should be considered:

[{ startpos: 0, endpos: 0, annot: "+para" }
,{ startpos: 0, endpos: 2, annot: "str" }
,{ startpos: 2, endpos: 2, annot: "-para" }]
  • startpos would be 'at' the start of the
    first char (the point before "ab")
  • and endpos at the start of the char following
    the last char that should be included
    (the point after "ab")
  • and "str" would be the chars between these two
    points
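
A minimal sketch of this proposal, again assuming the input "ab" and using the events exactly as proposed above. The helper textOfHalfOpen is a hypothetical name chosen for illustration.

```javascript
const src = "ab";
// Proposed events: endpos is the offset just past the last included character.
const events = [
  { startpos: 0, endpos: 0, annot: "+para" },
  { startpos: 0, endpos: 2, annot: "str" },
  { startpos: 2, endpos: 2, annot: "-para" },
];

// Hypothetical helper: the offsets map directly onto substring(start, end).
function textOfHalfOpen(source, ev) {
  return source.substring(ev.startpos, ev.endpos);
}

const spans = events.map((ev) => ({
  annot: ev.annot,
  text: textOfHalfOpen(src, ev),
  length: ev.endpos - ev.startpos, // the length falls out as a plain difference
}));
console.log(spans);
// "+para" and "-para" become zero-width spans (text "", length 0);
// "str" covers "ab" with length 2.
```

With this convention startpos === endpos means an empty span, so the paragraph markers no longer claim any character of the input.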

As far as I know, Java, JavaScript, Scala and
many other programming languages use this
concept.


In my opinion this might be the better
way in the long run.

Frank

Correction: If you parse the Djot string "ab" ...
is what I wanted to say.

jgm commented

> (the char at offset 2) - but this
> char does not exist!

Well it does: it's a \n (newline) character.
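
A small sketch of the difference, assuming the parsed input is "ab" followed by a line ending. The thread does not say whether the newline must already be in the input or is supplied by the parser, so both cases are shown with plain strings.

```javascript
const withNewline = "ab\n";
const withoutNewline = "ab";

// Offset 2 is the newline when the input ends with one ...
const atOffset2 = withNewline.charAt(2);
// ... and charAt returns the empty string when the offset is out of range.
const pastTheEnd = withoutNewline.charAt(2);

console.log(JSON.stringify(atOffset2), JSON.stringify(pastTheEnd));
```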

jgm commented

I'm not really sure what is best. In fact, there are three ways we could go:

  1. Current system with the offset of the first character and the offset of the last character
  2. Your way, with the offset of the first character and the offset of the character after the last character
  3. Offset of first character plus length

Since all the code currently implements 1, we'd need strong reasons to change from that.
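
For comparison, the three options differ only by simple arithmetic. The sketch below uses hypothetical helper names, not functions from djot.js, and converts a span from option 1 into the other two.

```javascript
// Option 1 -> option 2: the exclusive end is one past the last character.
function inclusiveToHalfOpen({ startpos, endpos }) {
  return { startpos, endpos: endpos + 1 };
}

// Option 1 -> option 3: an inclusive span always has length >= 1.
function inclusiveToLength({ startpos, endpos }) {
  return { startpos, length: endpos - startpos + 1 };
}

// Option 3 -> option 1: only defined for length >= 1, since an
// inclusive span cannot describe zero characters.
function lengthToInclusive({ startpos, length }) {
  if (length < 1) throw new RangeError("inclusive spans cannot be empty");
  return { startpos, endpos: startpos + length - 1 };
}

// The "str" event for the input "ab" under option 1.
const strInclusive = { startpos: 0, endpos: 1 };
const strHalfOpen = inclusiveToHalfOpen(strInclusive); // { startpos: 0, endpos: 2 }
const strLength = inclusiveToLength(strInclusive);     // { startpos: 0, length: 2 }
const roundTrip = lengthToInclusive(strLength);        // { startpos: 0, endpos: 1 }
console.log(strHalfOpen, strLength, roundTrip);
```

The practical difference is that options 2 and 3 can express an empty span, while option 1 cannot.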