bijington/expressive

Backslash at the end of the string results in MissingTokenException

Closed this issue · 10 comments

I think I've run into a bug with string expressions. If a string ends with a backslash, then Evaluate() and ReferencedVariables throw a Missing Token Exception (Missing Closing Token)

For example:

string expr = @"""a\"" " ;

or

string expr = "\"a\" + [varname] +\"b\\\"";

Either of the following will throw the exception:

var evaluatedExp = exp.Evaluate(); var variables = exp.ReferencedVariables;

If I put a character after the \\ then it evaluates properly, so it really seems like an issue with a backslash ending the string.

Thanks,
Michel

Hey sorry for the delayed response I was in the middle of upgrading my laptop and then resulting development environments.

I am having a hard time working out a few things; firstly if this is a bug (I don't think it is but hopefully we can work through the detail and decide if it is or not) and secondly what your expressions are trying to produce as a result.

To hopefully explain why I don't think it is a bug; when the parser detects a " character it will then search for an ending " character but a \ character is an escape character which allows you to include quotes inside of quotes.

To cover your examples:

string expr = @"""a\"" " ; is actually the value "a\" as a string. This shows that there is no closing " character because the trailing one is escaped.

string expr = "\"a\" + [varname] +\"b\\\""; is actually the value "a" + [varname] +"b\" as a string. This matches the above.

Thanks

Shaun

Thanks and, no worries, I was off for three days. :)

Essentially I'm giving Expressive a string expression that contains another string expression, which comes from user input. The "string within the string" is used to generate some C# code that is then compiled on-the-fly with Roslyn and executed on input data.

So, for example, a user may want to input a condition that states that a name should not contain a backlash. Since we allow them to input "complex" expressions, they would indicate that by entering "\" as the value for their condition.

From a cursory look at the Expressive parser code yesterday, I noticed that it assumes that, when it hits a quote character, if the previous character is a backslash, then it is part of an escape sequence. However, if the expression is "\", the last backslash is not the escape character, but rather the "escaped" character.

Not sure if I'm making any sense here. :)

It sounds like an interesting problem you are trying to solve.

I think you are making sense but just to clarify; you are essentially wanting to escape the \ (backslash) character with "s (quotes). Now Expressive currently uses the \ (backslash) character to escape "s (quotes) so we do have a slight conflict there.

One very important question and I don't quite know the answer to this yet is can Expressive do things differently to cater for your scenario. I will think on this (and happy for suggestions for possible ways around it, or even happy to bounce ideas around :) ) to see if we can resolve your issue. It is in an area that I want to improve on (#32 is on the schedule as it has become a bit of a beast and is in need of a refactoring).

I do suspect there are improvements that could be made around the checking for quotes that will likely help us hear.

I've sort of got it to work using a technique I used for handling \ and " in our expression builder (an input field that enforces some "rules" so that expressions are inputted correctly). However, that broke one of the unit tests (ShouldEscapeCharacters) and I haven't figured this one out yet.

Basically what I'm doing is that when I first find a backslash, I have a flag (inEscapeSequence) that I set to true, and then I set it to false again once the next character is processed. This way, something like "\" is processed correctly since, unlike the current code, I don't do anything based on whether or not the character preceding the ending quote is a backslash. When I hit the quote character, if inEscapeSequence is true, then I know it's just an escaped quote that's part of the string. If inEscapeSequence is false, then I know that I found my closing quote.

In its current state, it doesn't like this test and a couple of others: Assert.AreEqual("'hello'", new Expression(@"'\'hello\''").Evaluate());

Since current rules at work make it difficult to contribute code back to open source projects (they're working on making that easier for us, but I'll be off this project before I get an OK), it's really something I need to play with on my own time at home. Unless, of course, what I said above makes sense and you can get it to work yourself. :)

Thanks,
Michel

I think I understand the approach that you have taken and also why it could potentially break those unit tests.

While it may be difficult to contribute would it be acceptable to share (in comments or email) the changes you have made in the Expressive code base? I’ll be happy to see if it can sorted using your approach

I was hoping to just redo it at home last weekend but... Too tired. :) Here's the gist of it.

In the GetString() method I have a bool named inEscapeSequence that I initially set to false.

In the while loop, if the character is '\\' and inEscapeSequence is false, then I set it to true.

If inEscapeSequence is true and the character is an escapable character (e.g., , quote, n), then I set inEscapeSequence is false.

In the if that sets foundEndingQuote to true, instead of looking at previousCharacter, I check that inEscapeSequence is false. If it is, then that means it is not an escaped quote character, so we know we've found our ending quote.

I think that's pretty much it in its current state. With this, my use case works but, as mentioned above, it kills a unit test.

Thanks,
Michel

@MichelR666 apologies for the delay but things have been rather hectic here. While I think there is an issue around the escape characters (multiple backslashes) I don't believe your exact examples are in fact defects. I will try to explain.

string expr = @"""a\"" " ;

Here you have the contents (inside the string):
"a\"
When Expressive tokenises these it checks for the \ character as an escape character. What you can see from the contents of your string above is that you have a starting " but the trailing " is escaped with the preceding \character. Therefore the correct expression here would be:
string expr = @"""a\\"" " ;

So while the above expression should be the correct option that is the issue that I am trying to solve. In fact I have some initial code that mostly solves the problem but it isn't clean and it doesn't truly escape the \ characters so \\ still comes out as \\ whereas it should result in a single \ character for each double sequence.

Does that make sense?

Can I ask how much of an issue is this right now for you? I only ask because I would ideally like to refactor the parsing code but that will take more effort and therefore longer however if it is a big issue then I am happy to proceed with a bug fix and a point release on the framework.

Hi,

No worries - I was on staycation for a few days. Yes, I think you are right. I think I just pasted the wrong thing originally. It's easy to get confused with what I call, "multiple levels of escapes". :) I don't see it as much as a defect as just a case that wasn't handled. Everything worked fine for us for a while until someone came up with a special case (a string ending with a backslash).

I'm no longer on the project (this is my first day on a new project), however the users and remaining developers are aware of the issue and, since it's a very rare edge case, they are not particularly worried about it for now. I think it can wait until the refactoring work is complete. I've also documented it in a ticket in case they don't run into the issue for a long time and don't bother updating the NuGet package in the solution, so they'll know what to look for.

This is a very cool library. No doubt I'll be using it again in the future!

Thanks!
Michel

@MichelR666 that is precisely the answer I was hoping for :).

All the best on the new project and thank you for using Expressive.

Just for the record I shall plan to resolve this issue as part of #32.

@magicops this is now resolved as part of your pull request (#50). Thank you for taking the effort to provide it.

I am closing this now as there will be a release of 1.4.2 happening shortly.