InseeFr/Trevas

Regex operations

@noahboerger you reported:

  • Different pattern syntax
  • Trevas is using the Java pattern syntax

Could you be more specific, please?

This note comes from one of the BdI test cases, the one under "string/pattern_replacement_3".

There, the replace function is called with the pattern [a-e-i-o-u], but the intent is to replace only the letters a, e, i, o, u.
This pattern looks odd from my point of view, so I changed it to [a|e|i|o|u] to get the expected result.
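
To make the difference concrete, here is a minimal sketch of how java.util.regex reads these character classes. The class name CharClassDemo, the helper mask, and the sample input are made up for illustration and are not taken from the BdI test case. As far as I can tell, Java parses [a-e-i-o-u] as the ranges a-e and i-o plus a literal "-" and "u", while [a|e|i|o|u] matches the five vowels plus a literal "|". A plain [aeiou] would match only the vowels.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Trevas or the BdI test suite.
final class CharClassDemo {

    // Replace every match of the regex in the input with "*".
    static String mask(String regex, String input) {
        return Pattern.compile(regex).matcher(input).replaceAll("*");
    }

    public static void main(String[] args) {
        String input = "a-b|i-k u"; // illustrative input only

        // Ranges a-e and i-o, plus literal '-' and 'u': b, k and '-' are hit too.
        System.out.println(mask("[a-e-i-o-u]", input)); // ***|*** *

        // '|' is a literal inside a character class, so it is replaced as well.
        System.out.println(mask("[a|e|i|o|u]", input)); // *-b**-k *

        // Only the five vowels.
        System.out.println(mask("[aeiou]", input));     // *-b|*-k *
    }
}
```

So [a|e|i|o|u] happens to give the expected result only as long as the input contains no "|" character.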

It was more a note on my side that the BdI engine and Trevas may be using different pattern syntaxes, or that something is wrong with the test case itself. Nothing needs to be adjusted in Trevas.

So I would propose closing this issue.

What does the spec say about the regexp syntax?

The reference manual entry for match_characters provides the following information (p. 116):

match_characters returns TRUE if op matches the regular expression regexp, FALSE otherwise. The string regexp is an Extended Regular Expression as described in the POSIX standard. Different implementations of VTL may implement different versions of the POSIX standard therefore it is possible that match_characters may behave in slightly different ways.

For replace, no explicit reference to a pattern standard seems to be made, and the examples only contain simple string values.

That's a problem.

I opened an issue in the TF repo