Makopo/lslint

Skip preprocessor directives?

Closed this issue · 20 comments

This is for discussion of a feature request related to Makopo/sublime-text-lsl#36, suggested in #6.

There isn't a lot that can be done about the preprocessor, without a preprocessor. Introducing its semantics into the grammar is extremely difficult, if feasible at all. A preprocessor is a non trivial piece of software, and its grammar has some differences with respect to that of C (and of LSL, of course). The mix is a strange beast that perhaps could be parsed as a unit (I believe GCC does that) but that is MUCH better parsed as it is intended to be, namely in two stages: first the preprocessor, then the LSL grammar.

Now, it would be possible to pipe the program to a preprocessor, and the output of the preprocessor to lslint. My optimizer does something similar, namely calling an external preprocessor before the parsing stage begins.

But then there's a problem. Typically, preprocessors insert #line directives or similar (FS's built-in one also does, but FS comments these out). So, either the output of the preprocessor is re-preprocessed to comment out any line that starts with # (after spaces, possibly), or lslint itself ignores the directives.

Long ago I wrote a patch to do the latter, that I reproduce below:

commit cd38666f09da700cbbcbbb56ee6e665255a0f2d1
Author: Sei Lisa <seilisasl@gmail.com>
Date:   Fri Mar 6 05:24:33 2015 +0100

    Skip preprocessor directives (normally line number trackers)

diff --git a/lslmini.l b/lslmini.l
index 905b8cd..ff0d7eb 100644
--- a/lslmini.l
+++ b/lslmini.l
@@ -35,6 +35,8 @@ FS			(f|F)
  /* exclusive state to eat comments of any length without overflowing any buffers */
 %x COMMENT
 %x C_COMMENT
+ /* exclusive state to skip preprocessor commands */
+%x PPMULTILINE
 
 %%
 %{
@@ -142,6 +144,10 @@ L\"(\\.|[^\\"])*\"	{
 "<<"				{ return(SHIFT_LEFT);  }
 ">>"				{ return(SHIFT_RIGHT); }
 
+^[ \t\f\v\r]*"#"[ \t\f\v\r]*[a-zA-Z0-9]+.*\\\r?\n	{ BEGIN PPMULTILINE; LLOC_LINES(1); LLOC_STEP(); }
+<PPMULTILINE>[^\n]*\\\r?\n	{ LLOC_LINES(1); LLOC_STEP(); }
+<PPMULTILINE>[^\n]*\n	{ BEGIN 0; LLOC_LINES(1); LLOC_STEP(); }
+^[ \t\f\v\r]*"#"[ \t\f\v\r]*[a-zA-Z0-9]+.*\n	{ LLOC_LINES(1); LLOC_STEP(); }
 
 \n                  { LLOC_LINES(1); LLOC_STEP(); } 
 .					{ LLOC_STEP(); /* ignore bad characters */ }

If that's an acceptable solution, feel free to include it in master. Note, however, that any external C preprocessor will have a problem with one "feature" of LSL, namely multi-line strings (FS "pre-preprocesses" the preprocessor input so that such strings are made single-line, and my optimizer does the same).

One problem with this approach is that lslint will only see the preprocessor's output, which is not immediately available to the user. FS makes it available in a separate tab in which errors are displayed, leaving it up to the user to relate them to a corresponding original source line. My optimizer has the option to display the preprocessor's output, for pretty much the same reason.

So, it's not so easy to allow a preprocessor run, from the point of view of how to display meaningful errors in editors like Sublime. Any thoughts on how this should be treated?

The main issue I am facing right now concerning the preprocessor directives (#include, #if/#elif/#else/#endif, #define, #warning and #error) is that they stop linting entirely due to not being actual LSL syntax(which is correct).

I do not desire to integrate with a preprocessor, I simply want lslint to be able to skip those directives.

The patch above seems to work as intended. Just need to keep in mind that preprocessor shenanigans such as "Lazy lists" won't validate.
ie:

string cmd = (string)cmdlist[0]; // LAZY_LIST for string cmd = llList2String(cmdlist,0);

This isn't a problem for me and I definitely cannot write BISON so I'm accepting that caveat.

Simply skipping them is only a half solution that helps in a limited number of situations. Most uses of the preprocessor will make lslint complain, even with the patch. Making use of a #define macro, for example:

#define CHANNEL 3
default
{
    touch_start(integer n)
    {
        llSay(CHANNEL, "TOGGLE LIGHT");
    }
}

will make lslint complain that CHANNEL is not defined.

LSL-PyOptimizer isn't a lint, in that it doesn't issue warnings for unused variables or dead code, it just removes them, but it checks syntax and handles preprocessor, lazy lists, switch and other syntax extensions. I use it even in simple projects as an offline syntax checker. Maybe that's closer to what you need?

I think the complete solution would be making a command-line LSL preprocessor, or request Phoenix project to do, and users run it (by command or editor plugins) to convert their scripts to standard LSL one before calling this lslint.

Meanwhile, I think that simply removing preprocessor command lines would be enough, that requires user tolerance against syntax errors or so caused by removing controls commands though.

The built-in preprocessor in Firestorm is Boost Wave, a generic C preprocessor. There are already command-line C preprocessors, so that's not an issue.

For instance, my optimizer doesn't preprocess, it calls an external program for that purpose. I normally recommend mcpp, because it's a stand-alone executable and is easy to install, unless you already have GCC installed.

Note, however, that the output of a preprocessor often includes #line <N> <filename> directives. That's the main reason I wrote the above patch: so that I could preprocess the file and have lslint skip over the #line directives.

[e-mail response failed again]
The full resolution would indeed to implement a standalone
preprocessor, but that isn't what I wanted. I simply wanted to continue
using lslint and have it lint what it could and ignore what is
obviously not LSL. This would already be a better behavior than right
now, which is to entirely stop. I'm already using the patch(es) above
daily and they satisfy me for now.

P.S. Porting the preprocessor to a standalone application could be done indeed, but that's not something I want to work on right now.

I think I'm not getting what you mean by standalone application that mcpp or gcc -E doesn't already cover.

@Sei-Lisa

Note, however, that the output of a preprocessor often includes #line directives.

If you use mcpp, I believe mcpp -P will not print out #line message.

Yeah, I meant "Lazy Lists" or so.

I meant the process of writing an implementation of the LSL preprocessor that doesn't require the viewer to run. it's not /only/ boost wave, although that's the library behind the heavy lifting. I believe it used to be based on BISON. Certainly isn't anymore.

EDIT: @Makopo is right, lazy lists and additional preprocessor features beyond the simple include/ifelse processing require boost and a few definitions.. I recently ported the preprocessor to my personal viewer project and I'll have to say that the code is ugly, hacky, and needs a lot of love. It has several security and memory management problems. I really don't want to port the current implementation to something standalone.

Ah, my optimizer processes lazy lists and switch statements. Optimization can be disabled completely, but constants are substituted (there's Sei-Lisa/LSL-PyOptimizer#4 requesting them to stay unchanged). Is that what you mean?

Regarding the patch, I would prefer if it was made optional, but I haven't yet figured out how.

I plan creating another mode for Firestorm, which enables enhanced syntax highlight for it. I wonder which way to behave when F7 build:

a) Full preprocess the file(s), then display it on another editor tab, and then run lslint to check it as normal LSL script.

b) Partial preprocess (only command directives which triggers any change in line number) the file(s), then display it on another editor tab, and then run lslint with Firestorm-check mode to check it as enhanced(including lazy list and switch) LSL script.

I haven't looked into current implementation of Firestorm's preprocessor. Thank you @XenHat for sharing information. I tried @Sei-Lisa provided LSL-PyOptimizer. When I ran it on the script with multiple errors, it only reported first error and stopped processing.

Regarding the patch suggestion, I confirmed the directive lines were successfully ignored. I prefer it is introduced as optional, just as "ignore directives" or so, not specifically related to Firestorm dialect.

I've managed to make this feature optional and will send a pull request soon. I've also made some progress towards a lslint implementation of switch statements and lazy list syntax.

While working on it, I realized that having a command line option to enable and another command line option to disable a feature can make us run out of letters soon. So I've thought about making the following change:

  • -m enables mono; -m- disables mono (instead of -M),
  • -i ignores preprocessor directives; -i- makes # non-special.
  • -z enables lazy lists; -z- disables lazy lists.
  • -w enables switch statements; -w- disables switch statements.

Questions:

  1. Is this change acceptable?
  2. Are there other preferred variations alternative to adding a - after the letter?
  3. (depending on 1) Should we keep -M in particular now that it is released, or do you think it can be ditched because it is a young feature and it will not be in wide use yet?
  4. Are these letters OK for the corresponding features, keeping in mind the switches that are already taken? (if it is implemented, we can also use upper-case letters)

Note: I believe that having separate options to enable and to disable a feature is necessary. That way, you can for example make an alias to adapt lslint to your needs, e.g. alias lslint='lslint -i -z -w' and you can override it later with e.g. lslint -z-, which will expand to lslint -i -z -w -z- and will result in disabling lazy lists only.

1: Sounds good to me.
2: that's fairly standard, - works.
3: I'd say, remove M since it's young. Maintainers gotta do their job.
4: The letter attribution looks good to me.

RE:Note: Good point. Overriding by last-occurence is a good standard and is intuitive.

  1. -m and -m-, -i and -i- is ok. For dialect expressions, how about adding -O (or -F) followed by each elements(LazyList, Switch, or so) like your optimizer?
  2. I have no idea about this. Some existing switch (-S) seems to comply to the IEEE rule, but some don't.
  3. I believe it would be now time to remove. Or, have LSO option rather than default mono mode instead?
  4. Attribution itself would be okay.

@Makopo

About 1: So far, the command line parser is very primitive. Python has a built-in getopt parsing library and powerful string functions; C fails in both respects. Parsing something like -O in the same way as the optimizer would need some work, which I'm not sure is worth the effort.

About 3: I much prefer Mono, for two reasons. First and most important, it matches the checkbox in the viewer. Second, people are not familiar with LSO and I often see people asking in groups what it is. Having a Mono option like the viewer does, holds their hand in that respect.

Thank you for explanations. It make sense to me.

Thanks for the merge. This is the current WIP for switch statement support: https://github.com/Sei-Lisa/lslint/tree/sei-syntax-extensions (rebased on current master). It currently supports the syntax just fine, but it doesn't introduce it into the tree yet, so at the moment it can't soon it will do further analysis to emit more warnings. I'd like to warn when there's no default, multiple defaults, identical case labels, or statements between break and the next case label. EDIT: And incomparable types between the switch expression and the case label. EDIT2: Updated: the tree is generated now.

Should we keep this issue open to discuss these features, or do we open a new one?

I personally tend to compartmentalize topics, up to @Makopo

@Sei-Lisa I'd go with open the new one. Thanks.

Ok, then I guess this one is already closed.