Keep source code information in AST nodes such as lineno, start and end char position.

Question

ypaliy opened this issue 4 years ago · 5 comments

Hi,

I've noticed that only some of the nodes retain information from the parsing process, such as the start and end position of the token. I think it's important to have this link between source code locations and AST nodes. It would also be nice to have the line number. Are there any plans to add this in the future?

Thank you.

Answer 1 · 2021-01-14T13:40:08.000Z

Hi,

I've added a demo in #13

wdyt?

Answer 2 · 2021-01-14T14:50:58.000Z

Hey, thanks :)

I don't think it can work like this, because it will only work with nodes that are parsed from a single token.
Take this example:

import textwrap
from luaparser import ast

tree = ast.parse(textwrap.dedent(r'''
    local function sayHello()
        print('hello world !')
    end
'''))
print(ast.to_pretty_str(tree))

Which outputs:

Chunk: {} 5 keys
  start_char: 54
  stop_char: 56
  line: 4
  body: {} 5 keys
    Block: {} 5 keys
      start_char: 54
      stop_char: 56
      line: 4
      body: [] 1 item
        0: {} 1 key          
          LocalFunction: {} 7 keys
            start_char: 1
            stop_char: 56
            line: 4
            name: {} 5 keys
              Name: {} 5 keys
                start_char: 16
                stop_char: 23
                line: 2
                id: 'sayHello'
            args: [] 0 item
            body: {} 5 keys
              Block: {} 5 keys
                start_char: 52
                stop_char: 56
                line: 3
                body: [] 1 item
                  0: {} 1 key                    
                    Call: {} 6 keys
                      start_char: 52
                      stop_char: 52
                      line: 3
                      func: {} 5 keys
                        Name: {} 5 keys
                          start_char: 31
                          stop_char: 35
                          line: 3
                          id: 'print'
                      args: [] 1 item
                        0: {} 1 key                          
                          String: {} 6 keys
                            start_char: 37
                            stop_char: 51
                            line: 3
                            s: 'hello world !'
                            delimiter: SINGLE_QUOTE

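Single-token nodes like String and Name are fine, but multi-token nodes like Block are wrong :/ For instance, Chunk and the outer Block report start_char 54 and line 4, which is the closing 'end', not the start of the chunk.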
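The numbers above can be reproduced with a deliberately tiny, hypothetical tokenizer (it is not luaparser's ANTLR lexer and only handles this snippet). It suggests that the Chunk/outer Block values (54, 56, line 4) coincide exactly with the closing 'end' token, which is consistent with copying positions from the last token consumed, whereas taking start_char and line from the first token and stop_char from the last token covers the whole chunk. The helper names are illustrative only; offsets are 0-based and stop_char is inclusive, matching Name 'sayHello' at 16..23.

```python
import re

# Same string that textwrap.dedent(...) produces in the example above.
SOURCE = "\nlocal function sayHello()\n    print('hello world !')\nend\n"

# Hypothetical toy tokenizer for this snippet only: single-quoted strings,
# identifiers/keywords, and parentheses. Not a real Lua lexer.
TOKEN_RE = re.compile(r"'[^']*'|\w+|[()]")


def tokenize(source):
    """Return (text, start_char, stop_char, line) tuples; stop_char is inclusive."""
    tokens = []
    for m in TOKEN_RE.finditer(source):
        line = source.count("\n", 0, m.start()) + 1  # 1-based line number
        tokens.append((m.group(), m.start(), m.end() - 1, line))
    return tokens


def span_from_last_token(tokens):
    """Every position copied from the last token: reproduces the bad output."""
    _, start, stop, line = tokens[-1]
    return start, stop, line


def span_from_first_and_last(tokens):
    """start_char and line from the first token, stop_char from the last one."""
    _, start, _, line = tokens[0]
    _, _, stop, _ = tokens[-1]
    return start, stop, line


tokens = tokenize(SOURCE)
print(span_from_last_token(tokens))      # (54, 56, 4): the closing 'end'
print(span_from_first_and_last(tokens))  # (1, 56, 2): the whole chunk
```

Single-token nodes look correct either way because their first and last token are the same token.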

Answer 3 · 2021-10-17T12:00:05.000Z

Hi, I've fixed the problem for nodes that span several tokens. Can you please take a look at #16?

Answer 4 · 2021-10-22T07:10:22.000Z

I'll take some time to look at it soon.

Answer 5 · 2021-11-06T12:38:22.000Z

Thanks, it has been merged.
Note: 'lineno' has been renamed to 'line'. The base Node class now carries these fields:

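from __future__ import annotations  # Comments and Token are defined elsewhere in the library (not shown)

from typing import Optional


class Node:
    """Base class for AST node."""
    comments: Comments            # comments attached to the node
    first_token: Optional[Token]  # first token the node spans
    last_token: Optional[Token]   # last token the node spans
    start_char: Optional[int]     # 0-based offset of the node's first character
    stop_char: Optional[int]      # 0-based offset of the node's last character (inclusive)
    line: Optional[int]           # line number (formerly 'lineno')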
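As a usage sketch for the merged fields, the hypothetical helpers below (node_source and describe are illustrative names, not part of luaparser) slice the original source with start_char/stop_char and report the renamed line attribute. They assume 0-based offsets with an inclusive stop_char, which is what the Name ('sayHello' at 16..23) and String (37..51) entries in the output above suggest; check this against the version you use. SimpleNamespace stand-ins keep the sketch runnable without the library.

```python
from types import SimpleNamespace
from typing import Optional


def node_source(source: str, node) -> Optional[str]:
    """Return the slice of `source` covered by `node`, or None if positions are missing.

    Assumes start_char/stop_char are 0-based and stop_char is inclusive.
    """
    start: Optional[int] = getattr(node, "start_char", None)
    stop: Optional[int] = getattr(node, "stop_char", None)
    if start is None or stop is None:
        return None
    return source[start : stop + 1]


def describe(source: str, node) -> str:
    """Format 'line N: <text>' using the `line` attribute (formerly `lineno`)."""
    return f"line {node.line}: {node_source(source, node)}"


# Stand-ins with the same attribute names as Node; a real node obtained
# from ast.parse() would be used instead.
SOURCE = "\nlocal function sayHello()\n    print('hello world !')\nend\n"
name = SimpleNamespace(start_char=16, stop_char=23, line=2)
string = SimpleNamespace(start_char=37, stop_char=51, line=3)

print(describe(SOURCE, name))    # line 2: sayHello
print(describe(SOURCE, string))  # line 3: 'hello world !'
```

Returning None instead of raising keeps the helper usable on nodes whose positions are still Optional and unset.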