match_group_indexes blank array for plain text regex.
JernKunpittaya opened this issue · 4 comments
With the regex /merg(e)/, i.e. plain text, trying to match "merge" in any text just results in an empty match_group_indexes array, and hence an error. A quick fix is to write the regex as /merg(e|a)/, i.e. put the e in a group together with some other operation; this results in matching the "e" value.
Thanks for reporting.
This is indeed a valid issue. Currently, the regex compiler assumes that at least one alternate group is present in the regex string, and uses one of those groups to calculate the related data points for the output signals.
It is possible to support searches without alternate groups and provide only the whole-match data points, such as the start index of the match and the number of entire matches. This is a potential feature to support.
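To make the whole-match data points concrete, here is a minimal sketch using Python's standard `re` module rather than the regex compiler discussed in this issue. The helper name `whole_match_stats` is hypothetical and only illustrates what "start index of the match" and "how many entire matches" could look like for a pattern with no groups.

```python
import re


def whole_match_stats(pattern: str, text: str) -> tuple[int, list[int]]:
    """Return (number of non-overlapping matches, start index of each match).

    Hypothetical helper: it relies on Python's `re` engine, not on the
    circuit-generating compiler from this issue.
    """
    starts = [m.start() for m in re.finditer(pattern, text)]
    return len(starts), starts


# Plain-text pattern with no groups at all.
count, starts = whole_match_stats(r"merge", "merge now, merge later")
print(count, starts)  # 2 [0, 11]
```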
For now, I think we can do a check and output proper error messages to avoid confusion, as you have helped point out. @Divide-By-0 may have other thoughts on this.
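A rough sketch of what such a check might look like, written in Python for illustration only. It assumes "alternate group" means a parenthesised group containing a `|`, and both `has_alternation_group` and `check_pattern` are hypothetical names, not functions from the project. The scanner is deliberately simplified: it skips escaped characters and character classes but does not handle every regex syntax corner case.

```python
def has_alternation_group(pattern: str) -> bool:
    """Return True if a '|' appears inside at least one parenthesised group."""
    depth = 0          # current nesting depth of '(' ... ')'
    in_class = False   # True while inside a '[...]' character class
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2     # skip the escaped character, e.g. '\|' or '\('
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "|" and depth > 0:
            return True
        i += 1
    return False


def check_pattern(pattern: str) -> None:
    """Raise a descriptive error instead of silently producing empty indexes."""
    if not has_alternation_group(pattern):
        raise ValueError(
            f"regex {pattern!r} has no group containing an alternation; "
            "match_group_indexes would be empty"
        )


check_pattern("merg(e|a)")    # passes silently
try:
    check_pattern("merg(e)")  # the pattern from this issue
except ValueError as err:
    print(err)
```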
We could also potentially just make the final state right before the accept stage of the DAG be the group that is indexed over?
That would also be a way to go.
I think it is about structuring the code to allow opt-in for the features that match_group_indexes serves.
The current code, which checks whether there are >= two states within a match group via two incoming nodes, is brittle and seems to break on other use cases. It also doesn't quite seem like the thing we always want to match. @JernKunpittaya has some ideas on improving this and will try to set up a way for a user to specify this.