Makopo/lslint

Linter stops reporting errors after preprocessor definitions

Closed this issue · 10 comments

I believe that work has been done in the past to continue linting past preprocessor definitions anyway.
This reported all errors before I updated my LSL development environment on Sublime Text:

#define USE_LAZY_LISTS
integer myNumber = 3245"";
default
{
    state_entry() {
        llSay(PUBLIC_CHANNEL,"Hello, Avatar!" + (string)myNumber);
        llWhisper(abcde,something);asdas
    }
}

Now I get:
[screenshot of the linter output]

Lines 2 and 7 should throw errors as well.

Thanks. That testcase produces this output for me:

ERROR:: (  1,  9): syntax error, unexpected IDENTIFIER, expecting '('
TOTAL:: Errors: 1  Warnings: 0

I get the same result from commit 2f09e68, which is the original commit (after adding the missing string.h to the headers), so this is not a regression.

Passing the option -i (ignore preprocessor directives) to lslint at current master, I get:

ERROR:: (  2, 24): syntax error, unexpected STRING_CONSTANT
ERROR:: (  8,  5): syntax error, unexpected '}'
TOTAL:: Errors: 2  Warnings: 0

That option enables it to progress further, enough for it to find two syntax errors instead of one. The real issue here is the nature of the parser, and the difficulty that parsers have in recovering from syntax errors.

The last syntax error is particularly unfortunate, because it prevents the parser from finding the closing } and therefore it can't find a complete state. Without at least a complete default state there is no script, and without a script, no further errors can be analyzed. Changing the order of the lines inside state_entry and using -i allows the parser to sync on the final semicolon and complete the state.
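A sketch of what I mean (my reconstruction, with the two statement lines inside state_entry swapped):

```lsl
#define USE_LAZY_LISTS
integer myNumber = 3245"";
default
{
    state_entry() {
        llWhisper(abcde,something);asdas
        llSay(PUBLIC_CHANNEL,"Hello, Avatar!" + (string)myNumber);
    }
}
```

With -i, that version produces this output: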

ERROR:: (  2, 24): syntax error, unexpected STRING_CONSTANT
ERROR:: (  6, 19): `abcde' is undeclared.
ERROR:: (  6, 25): `something' is undeclared.
ERROR:: (  7,  9): syntax error, unexpected IDENTIFIER
TOTAL:: Errors: 4  Warnings: 0

Note that it hasn't declared myNumber, because it found a syntax error before the declaration was complete, yet it doesn't report the myNumber problem in the llSay line either, because it never parsed that line: it ignored everything up to the semicolon. Adding a semicolon after asdas causes that problem to be reported as well.
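For illustration, a sketch of that variant (still in the swapped order, again my reconstruction):

```lsl
    state_entry() {
        llWhisper(abcde,something);asdas;
        llSay(PUBLIC_CHANNEL,"Hello, Avatar!" + (string)myNumber);
    }
```

With the extra semicolon, the llSay line gets parsed, so the undeclared myNumber is reported as well.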

Without -i, it seems the parser has even more serious problems resuming after it finds the directive.

In short, I see no bug or regression here, just that the parser's error recovery could be improved. I'm not familiar with bison error recovery, so unfortunately I can't help improve the situation.

Ah, right. I must have added -i locally and forgotten to commit, then! 😨

Maybe appending a "Try running lslint -i ..." hint to the output would make sense?

@XenHat You can use the lsl marker to make GitHub highlight the code block as LSL.

```lsl
// ...
```

...which reminds me that I should send them an update...

#61 improves error recovery in the above test case. The parser can now recover from the syntax error at the end without losing the script.

EDIT:
The first error can now be recovered from even when option -i is not in effect. However, it's still an unfortunate situation. Without -i, the # is ignored and the newline counts as a space, so what the parser sees is:

IDENTIFIER IDENTIFIER integer IDENTIFIER '=' INTEGER_CONSTANT STRING_CONSTANT ';'

I don't see any way to force it to recognize the type after the identifiers; there's no clear boundary to mark where the error ends and parsing can be resumed. So I've added the rule:

globals: error ';' ;

which can at least resume after the ; and parse the next variable or function (if any - there's none in the example above). So I think that's the best that can be done. It still reports myNumber as undeclared, but at least it reports the rest of the errors.
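To illustrate with a hypothetical script (not from this thread), assuming my reading of the recovery rule is right, a broken global no longer hides everything that follows it:

```lsl
integer myNumber = 3245"";   // syntax error; this declaration is abandoned
integer other = 42;          // parsing resumes after the ';' above, so this global is seen
default
{
    state_entry() {
        llSay(PUBLIC_CHANNEL, (string)other);     // checked normally
        llSay(PUBLIC_CHANNEL, (string)myNumber);  // myNumber is still reported as undeclared
    }
}
```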

I have not looked at the code in a while, but would it make sense, and be helpful in the long run, to investigate the difference between the code path with -i and the one without, to try to merge them where it makes sense and untangle this kind of issue?

I'm not sure what you mean. -i is interpreted at the lexical analysis level; it just removes the lines with preprocessor directives so that they never reach the syntax analyzer (which makes sense, since those lines are intended for the preprocessor). People who don't use a preprocessor at all may want to use # characters as markers of some kind, as they are simply ignored by the parser, like $ or ? or ' or ` or \ or any character > 126. So I think the -i option is necessary.

Sorry for the lack of clarity earlier, I was on my phone.

I typically see this kind of behavioral difference when two blocks of code perform the same action in different ways, with duplicated code. I was wondering if that was the case here and if the code suffered from maintainability issues because of it.
For example:

// Version A
myFunction(integer input)
{
    if (OptionA)
    {
        doSomething(input);
        OptionAOnlyCall();
        doSomethingElse(OptionA);
    }
    else
    {
        doSomething(input);
        // Not doing anything here, only required for OptionA
        doSomethingElse(OptionA);
    }
}

as opposed to

// Version B
myFunction(integer input)
{
    doSomething(input);
    // Now the code above is the same for all cases, and specific behavior
    // for OptionA is handled below
    if (OptionA)
    {
        OptionAOnlyCall();
    }
    // Then we resume with global behavior code
    doSomethingElse(OptionA);
}

Version A is easier to read, but can cause maintainability issues after hundreds of changes.

I was wondering if handling -i properly was difficult because of that kind of issue, which could be simplified with an analogous design to Version B.

I'm still not sure I understand, so I'll explain how it works. Option -i is implemented in the lexical analyzer, by making C preprocessor commands behave the same as comments; see lslint/lslmini.l, lines 159 to 161 at commit c1a3baa:

^[ \t\f\v\r]*"#" { LLOC_STEP(); if (skip_preproc) { BEGIN PREPROC; } }
<PREPROC>[^\n]*\\\r?\n { LLOC_LINES(1); LLOC_STEP(); }
<PREPROC>[^\n]*\n { BEGIN 0; LLOC_LINES(1); LLOC_STEP(); }

Compare to lines 46, 52 and 53 (the block at lines 47-51 is an lslint-specific extra for handling $[Exxxxx] assertions). That makes the preprocessor lines generate no tokens at all for the syntax analyzer. That is, with -i active, this script:

#define x 3
default{timer(){}}

is sent to the syntax analyzer as this sequence of tokens:

STATE_DEFAULT '{' IDENTIFIER '(' ')' '{' '}' '}' (EOF)

but with -i- (or without -i), the "preprocessor-acts-like-a-comment" code is not activated, so the sequence of tokens becomes this instead:

IDENTIFIER IDENTIFIER INTEGER_CONSTANT STATE_DEFAULT '{' IDENTIFIER '(' ')' '{' '}' '}' (EOF)

The first identifier is the word "define"; the second identifier is the word "x"; the integer constant is 3.

I hope that answers your question.

I understand how this works now, I think.

I didn't mean that -i should be removed; I guess I simply misunderstood the difficulty of the code.

Bison doesn't even appear to support the kind of complexity that would cause the problem I mentioned, haha.