jgm/djot

Clarification on tab indentation rules

herabit opened this issue · 3 comments

I was hoping for some clarification on the rules for parsing indentation that includes tabs. I am currently attempting (we'll see how far I get) to implement a parser for tree-sitter, and I am unsure whether I should follow what CommonMark does and "treat" a tab as four spaces when handling indentation levels, or whether djot differs somehow.

jgm commented

Probably we should add something about this. I am somewhat embarrassed to say that djot.js does something fairly crude:

  // move parser position to first nonspace, adjusting indent
  skipSpace(): void {
    const subject = this.subject;
    let newpos = this.pos;
    while (isSpaceOrTab(subject.codePointAt(newpos))) newpos++;
    this.indent = newpos - this.startline;
    this.pos = newpos;
  }

which amounts to a tab stop of 1!

I guess I'd be inclined to use a tab stop of 4 in computing this. (That's a bit different from treating a tab as four spaces, since SPACE + TAB and TAB might both put you in the same column.) The code would have to be modified a bit.
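As a rough illustration of the difference between a tab stop of 4 and "a tab is four spaces", here is a minimal sketch. `indentColumn` is a hypothetical standalone helper, not part of djot.js, and it only assumes that a tab advances to the next multiple of the tab stop:

```typescript
// Hypothetical helper (not djot.js code): returns the column reached after
// consuming the leading spaces and tabs of a line, using fixed tab stops.
function indentColumn(line: string, tabStop: number = 4): number {
  let col = 0;
  for (const ch of line) {
    if (ch === " ") {
      col += 1;
    } else if (ch === "\t") {
      // A tab advances to the next multiple of tabStop.
      col += tabStop - (col % tabStop);
    } else {
      break; // first non-space character ends the indentation
    }
  }
  return col;
}

console.log(indentColumn("\t- item"));    // TAB alone
console.log(indentColumn(" \t- item"));   // SPACE + TAB
console.log(indentColumn("\t- item", 1)); // tab stop of 1, like the current skipSpace
```

With a tab stop of 4, the first two calls land in the same column, whereas counting a tab as four spaces would give 4 and 5. Passing a tab stop of 1 reproduces the character-counting behaviour of the snippet above.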

Anyone have feedback on this?

I believe for djot it only matters in the following two cases, and even then only when the actual tab stop is other than what djot assumes AND there is inconsistent use of spaces vs tabs:

(→ represents a tab char)

  1. multiple levels of nesting

    - parent
      - child
        - grandchild
    →→which list am I nested within?
      →what about me?
    →and me?
    

    A tab stop of 4 would make the last three lines, respectively: great-grandchild, grandchild, grandchild.
    A tab stop of 2: grandchild, grandchild, child.
    A tab stop of 8: great-grandchildren all.

  2. markers with insignificant leading whitespace

      - marker with two leading spaces
    →- nested if tab stop is 4, not nested if 2
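The first case can be checked mechanically. The sketch below is only an illustration under a simplified assumption: a line belongs to every list item whose marker sits in a strictly smaller column. `leadingColumn`, `markerColumns`, and `nestingLevel` are hypothetical names, not djot.js APIs.

```typescript
// Column reached after leading spaces/tabs, assuming fixed tab stops.
function leadingColumn(line: string, tabStop: number): number {
  let col = 0;
  for (const ch of line) {
    if (ch === " ") col += 1;
    else if (ch === "\t") col += tabStop - (col % tabStop);
    else break;
  }
  return col;
}

// Marker columns of "- parent", "  - child", "    - grandchild".
const markerColumns = [0, 2, 4];
const levelNames = ["top level", "child", "grandchild", "great-grandchild"];

// Simplified rule (an assumption): a line is nested inside every list item
// whose marker column is strictly less than the line's own column.
function nestingLevel(line: string, tabStop: number): string {
  const col = leadingColumn(line, tabStop);
  const depth = markerColumns.filter((m) => m < col).length;
  return levelNames[depth];
}

const ambiguous = [
  "\t\twhich list am I nested within?",
  "  \twhat about me?",
  "\tand me?",
];

for (const tabStop of [2, 4, 8]) {
  const levels = ambiguous.map((l) => nestingLevel(l, tabStop));
  console.log(`tab stop ${tabStop}: ${levels.join(", ")}`);
}
```

Under that rule the three tab stops give the same classifications as listed in case 1. The second case follows from the same column arithmetic: a single tab reaches column 4 with a tab stop of 4 but only column 2 with a tab stop of 2, which is the column of the marker with two leading spaces.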
    

If tabs are used consistently, either because the writer is disciplined or the editor automatically converts spaces to tabs, it doesn't matter if the tab stop differs from djot's assumption:

- parent
→- child regardless of tab stop value
→→- grandchild regardless of tab stop value
→→→great-grandchild regardless of tab stop value
→→ great-grandchild regardless of tab stop value
→→grandchild regardless of tab stop value
→ grandchild regardless of tab stop value
→child regardless of tab stop value
 child regardless of tab stop value

The last observation suggests an out-of-the-box idea: interpret a tab to mean "take me to the next level of nesting". It may be a bad idea, but I'm throwing it out for consideration. The motivation is that it might be more resilient than a fixed value of 4? I need to sleep now :)

jgm commented

> interpret a tab to mean "take me to the next level of nesting".

That's more or less how it works now. And, as you say, it won't cause a problem if tabs are used consistently and people don't put spaces before tabs. But in practice people aren't consistent, so I think a tab stop would be less confusing.