special chars \d \s \w in regex
garganti opened this issue · 11 comments
It would be very good if the brics regex subsystem natively supported the special character classes \d, \s and \w. I know that we can substitute the corresponding patterns (such as [0-9]), but it would be better if brics could build the automaton directly. Note that these are the most commonly used operators that brics does not support, according to:
Carl Chapman and Kathryn T. Stolee. 2016. Exploring regular expression usage and context in Python. In Proceedings of the 25th International Symposium on Software Testing and Analysis (ISSTA 2016). ACM, New York, NY, USA, 282-293.
We use brics for generating strings using mutation testing (http://cs.unibg.it/mutrex/) and it works great!
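The substitution workaround mentioned above can be sketched as a small preprocessing step that runs before the pattern is handed to brics. This is only an illustration: the class name `ShorthandRewriter` and its method are hypothetical, the mappings assume the ASCII meaning of each shorthand as in java.util.regex, and the negated forms (\D, \S, \W) as well as brics-specific operators are not handled.

```java
public class ShorthandRewriter {
    // Character-class bodies (without brackets) assumed to match the ASCII
    // meaning of each shorthand; returns null for any other escape.
    private static String body(char c) {
        switch (c) {
            case 'd': return "0-9";
            case 'w': return "a-zA-Z0-9_";
            case 's': return " \t\n\u000B\f\r";
            default:  return null;
        }
    }

    /** Rewrites \d, \w and \s; every other escape is copied unchanged. */
    public static String rewrite(String regex) {
        StringBuilder out = new StringBuilder();
        boolean inClass = false; // true while between '[' and ']'
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '\\' && i + 1 < regex.length()) {
                char next = regex.charAt(++i);
                String b = body(next);
                if (b == null) {
                    out.append('\\').append(next);       // unknown escape: keep as-is
                } else if (inClass) {
                    out.append(b);                       // already inside [...]
                } else {
                    out.append('[').append(b).append(']');
                }
            } else {
                if (c == '[') inClass = true;
                else if (c == ']') inClass = false;
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

The rewritten string would then be passed to `new RegExp(...)` as usual. Tracking whether the scanner is inside a bracket expression avoids producing nested brackets for input such as `[\w-]`.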
Please see the first and the fifth item on the FAQ: http://www.brics.dk/automaton/faq.html
The predefined automata listed on http://www.brics.dk/automaton/doc/dk/brics/automaton/Datatypes.html#get(java.lang.String) should cover most needs.
Is there a guide or example showing how to replace, for example, a '\d' match? The documentation in the links above is too hard to understand.
I'm not sure what you mean by "replace for example '\d' match", sorry. The regexp syntax is specified here: https://www.brics.dk/automaton/doc/dk/brics/automaton/RegExp.html
I meant how to implement matching of '\d' in code using automaton, other than by replacing the '\d' in the regular expression with '[0-9]'.
There is not a one-to-one correspondence between "standard" notation like \d, \w, etc. and the automata provided by default in this package. You can make a RegExp object like new RegExp("bla<integer>bla", RegExp.AUTOMATON).
Thanks. I got an exception with this code:
new RegExp("aaa<integer>bbb",RegExp.AUTOMATON).toAutomaton();
Exception
java.lang.IllegalArgumentException: 'integer' not found
at dk.brics.automaton.RegExp.toAutomaton(RegExp.java:395)
at dk.brics.automaton.RegExp.findLeaves(RegExp.java:412)
at dk.brics.automaton.RegExp.findLeaves(RegExp.java:409)
at dk.brics.automaton.RegExp.toAutomaton(RegExp.java:331)
at dk.brics.automaton.RegExp.toAutomatonAllowMutate(RegExp.java:308)
at dk.brics.automaton.RegExp.toAutomaton(RegExp.java:227)
What's wrong with that?
Looks like the automaton.jar file (which contains the automata files) is not included in your classpath?
Do I need an additional dependency in pom.xml other than this?
<dependency>
<groupId>dk.brics</groupId>
<artifactId>automaton</artifactId>
<version>1.12-4</version>
</dependency>
Use this instead:
new RegExp("aaa<integer>bbb").toAutomaton(new DatatypesAutomatonProvider())
Yes, it works. Thanks a lot!
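For readers who want to try the working call, here is a minimal sketch that builds the automaton through `DatatypesAutomatonProvider` and runs two inputs against it. It assumes the dk.brics automaton dependency shown earlier is on the classpath. The class name `NamedAutomatonDemo` and the sample inputs are made up for illustration.

```java
import dk.brics.automaton.Automaton;
import dk.brics.automaton.DatatypesAutomatonProvider;
import dk.brics.automaton.RegExp;

public class NamedAutomatonDemo {
    /** Builds the automaton for "aaa<integer>bbb"; <integer> is resolved by the provider. */
    public static Automaton build() {
        return new RegExp("aaa<integer>bbb").toAutomaton(new DatatypesAutomatonProvider());
    }

    public static void main(String[] args) {
        Automaton a = build();
        // run(String) reports whether the whole input is accepted.
        System.out.println(a.run("aaa42bbb"));
        System.out.println(a.run("aaabbb"));
    }
}
```

Passing the provider explicitly tells `toAutomaton` where to look up named automata such as `<integer>`, which is the lookup that failed with "'integer' not found" in the no-argument call.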
I still wish that dk.brics supported \d directly ...