bijington/expressive

Separate compilation options from evaluation options

Closed this issue · 10 comments

Currently ExpressiveOptions.IgnoreCase has quite an effect on results of an expression:

  1. Allows for loose matching of functions/operators (If/if, or/Or, etc.)
  2. Allows for functions to use this setting when evaluating.

The critical one which I believe is completely wrong is being used when evaluating a result of a operator (e.g. ==)

Currently new Expression(" 'abc' == 'ABC' ") will give a different result based on whether ExpressiveOptions.IgnoreCase is set or not. I believe this is a fundamental mistake and needs to be corrected.

This will result in a breaking change to the framework.

Main areas to investigate appear to be:

Usages of Context.StringComparison

Helpers.Comparison lines 27 and 127
ContainsFunction, EndsWithFunction, StartsWithFunction

Use cases for separating IgnoreCase scope

Being able to choose 'ignore case on parse' is really useful so you can do loose matching for functions and operators:

  • fewer typo-driven parsing errors;
  • within-code expression strings can conform to an org's coding standards and conventions;
  • easier to use Expressive if application users are supplying the expressions and they conform to the application's grammar spec or convention, e.g. Excel-style ROUND() vs. Python- or R-style round().

Being able to choose 'ignore case on equality' is also important, and is application-dependent. For example:

  • Depending on the source plus how you retrieve them, GUIDs can have upper-case or lower-case hex digits, so IgnoreCase is good.
  • If you are comparing book titles or person names coming from different sources, IgnoreCase is good.
  • If you are comparing code symbols like variable names, IgnoreCase is bad.

And other string-based methods like Contains, StartsWith, and EndsWith could need a different IgnoreCase choice than the context-wide 'ignore case on equality':

  • Ignore case on equality so GUIDs match, but don't ignore case on Contains when looking for a model number in a product description string.

Regardless of how IgnoreCase is ultimately handled in the Context, I think having a boolean IgnoreCase parameter in those string functions will help match a variety of use cases. (discussed over in #86).

Separating IgnoreCase without breaking code

IgnoreCase is probably used heavily in people's code, so breaking changes are best avoided! I think I see a way to separate IgnoreCase into two different context options for new code while retaining the use and behaviour of IgnoreCase in legacy code, just by tweaking the ExpressiveOptions enum.

Currently, the IgnoreCase behaviour is IgnoreCaseForParsing + IgnoreCaseForEquality.

Currently the enum is structured this way:

  • None = 1
  • IgnoreCase = 2
  • NoCache = 4
  • RoundAwayFromZero = 8

I think you can thread the needle by revising the enum this way:

  • None = 1
  • NoCache = 4
  • RoundAwayFromZero = 8
  • IgnoreCaseForParsing = 16
  • InoreCaseForEquality = 32
  • IgnoreCase = 48

In the Context.cs code, for example, Options.HasFlag(ExpressiveOptions.IgnoreCaseForParsing) will return true for both ExpressiveOptions.IgnoreCaseForParsing and for ExpressiveOptions.IgnoreCase.

Effectively, ExpressiveOptions.IgnoreCase = ExpressiveOptions.IgnoreCaseForParsing & ExpressiveOptions.IgnoreCaseForEquality so no legacy code breaks.

Anyone who has followed sound development practice using the enum field names would see no change in their legacy code behaviour, and new code that takes advantage of the scope split can be blended with legacy code. The only people who would experience a breaking change would be those who bypassed the enum and played code tricks using the integer values instead of using the enum field names, so any break would be a result of deliberately waiving the design benefits of the enum.

Wow! Thank you for this, you have clearly put some effort in to this and clearly considered more use cases than I had.

Initially my plan was to simply remove the ability to have any form of equality operators making use of IgnoreCase. Then implement things like an Equals function that could provide that more explicit case insensitive equality checks however it would clearly be a breaking change.

I really like this approach to prevent the risk of a breaking change! I agree with your assessment regarding using the enum vs an int. The only possibly valid scenario I can imagine is if a user serialised out the state somehow in case they conditionally change that option. Although I can probably live with breaking that as I can't imagine it is very likely.

I have been thinking about this more and I am leaning towards leaving IgnoreCase in for the next release but mark it as 'Obsolete' with the message to explain that it will be removed in future releases.

Then I really like your idea of splitting 'IgnoreCase' in to 2 parts:

  • 'IgnoreCaseForParsing = 16;'
  • 'IgnoreCaseForEquality = 32;'
  • 'IgnoreCaseAll = IgnoreCaseForParsing | IgnoreCaseForEquality;'

That way it should hopefully make it clear what will be happening and why.

Thanks. Although I keep trying to think on a way to solve the potentially different behaviour between an operator and a function e.g. == being controlled by an option but Contains being controlled by a parameter in. Part of me did originally like the idea of the breaking change but then I hadn't considered case insensitive comparisons as a genuine use case.

Perhaps the difference is ok and I am just overthinking it...

I think your approach with the slight modification still holds a lot of merit and is probably the correct one.

I guess IgnoreCaseForEquality will ultimately provide the defaults for things like Contains, etc. but could be overridden like you suggested with the extra parameter.

Cool I think this is likely to be in a position to be ready for implementation. It would be nice to at least get the Obsolete change in to make it clear that future versions will drop the old IgnoreCase value.

Fixed and shipped in release v2.3.0