cs-au-dk/dk.brics.automaton

Word boundary not working

vitalyli opened this issue · 2 comments

Hi!
I ran into a word-boundary issue. I have a long list of tokens, some of which I
need to match on word boundaries. The standard Pattern class handles this, but Automaton does not.
Is there any way this can be fixed?

// In Java source the backslash must be doubled: "\\b" (a single "\b" is a backspace character).
Automaton p0_AA = new RegExp(".*(something|\\b(blah|foo|goo)\\b)").toAutomaton();
RunAutomaton p0_RA = new RunAutomaton(p0_AA);
System.out.println(p0_RA.run("ba foo nery"));

-->false

The same pattern works with the standard java.util.regex classes:

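import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p0 = Pattern.compile(".*(something|\\b(blah|foo|goo)\\b)");
String s = "ba foo nery";
Matcher m = p0.matcher(s);
// find() looks for a matching substring anywhere in s
if (m.find()) {
    System.out.println("pattern found");
} else {
    System.out.println("not found");
}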

-->pattern found
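One difference between the two snippets is independent of `\b`: `Matcher.find()` succeeds if any substring matches, whereas `RunAutomaton.run` accepts only when the entire input matches, which corresponds to `Matcher.matches()`. The minimal sketch below uses only the standard library (the class and method names are illustrative) and shows that the pattern from this issue is found in "ba foo nery" but does not match the whole string, because " nery" follows "foo".

```java
import java.util.regex.Pattern;

// Illustrative class: compares substring search with whole-input matching
// for the pattern from this issue, using only java.util.regex.
public class FindVersusMatches {

    static final Pattern P0 = Pattern.compile(".*(something|\\b(blah|foo|goo)\\b)");

    // Same check as the issue's Matcher.find(): succeeds if some substring matches.
    static boolean found(String s) {
        return P0.matcher(s).find();
    }

    // Whole-input match: the question RunAutomaton.run answers.
    static boolean matchesWhole(String s) {
        return P0.matcher(s).matches();
    }

    public static void main(String[] args) {
        String s = "ba foo nery";
        System.out.println(found(s));        // true  (matched substring: "ba foo")
        System.out.println(matchesWhole(s)); // false (" nery" is left over)
    }
}
```

So with a whole-input matcher such as dk.brics.automaton, the pattern also has to account for any text after the token (for example with a trailing `.*` or an optional suffix) before this input can be accepted, even if `\b` were supported.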

Please see the FAQ (https://www.brics.dk/automaton/faq.html).
You can try something like new RegExp("(.*[\\ ])?[a-z]+([\\ ].*)?"), adjusted to the delimiters and word characters you are interested in.
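dk.brics `RegExp` has no `\b` assertion (a backslash just escapes the following character, so `\b` stands for a literal `b`), and `RunAutomaton.run` tests the whole input. The suggestion above can be generalized into the sketch below, which spells each boundary out as "start/end of input or a non-word character". It assumes ASCII word characters `[a-zA-Z0-9_]` (as in Java's `\w`) and tokens made only of such characters; the class and method names are hypothetical. The generated pattern uses only constructs that mean the same in both syntaxes (grouping, `|`, `.*`, `?`, a negated character class), so it is checked here with `java.util.regex` in whole-input mode, with `DOTALL` because `.` in dk.brics matches any character.

```java
import java.util.regex.Pattern;

// Hypothetical helper: emulates \b(tok1|tok2|...)\b for whole-input matchers
// such as dk.brics.automaton's RunAutomaton, whose RegExp syntax has no \b.
// Assumes tokens consist only of ASCII word characters [a-zA-Z0-9_].
public class WordBoundaryEmulation {

    // Complement of Java's ASCII \w, written in syntax both engines accept.
    static final String NON_WORD = "[^a-zA-Z0-9_]";

    // "Input contains one of the tokens as a whole word": an optional prefix
    // ending in a non-word character, the token, and an optional suffix
    // starting with a non-word character.
    static String wholeWordPattern(String... tokens) {
        return "(.*" + NON_WORD + ")?(" + String.join("|", tokens) + ")(" + NON_WORD + ".*)?";
    }

    // Whole-input check with java.util.regex; DOTALL lets '.' match any
    // character, as '.' does in dk.brics RegExp.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(input).matches();
    }

    public static void main(String[] args) {
        // The issue's intent: "something" anywhere, or blah/foo/goo as whole words.
        String regex = ".*something.*|" + wholeWordPattern("blah", "foo", "goo");
        System.out.println(regex);
        System.out.println(fullMatch(regex, "ba foo nery"));  // true
        System.out.println(fullMatch(regex, "ba food nery")); // false
    }
}
```

With dk.brics, the corresponding check would be `new RunAutomaton(new RegExp(regex).toAutomaton()).run("ba foo nery")`. Tokens containing characters that are special in either syntax would need escaping first; note that dk.brics also treats `"`, `&`, `~`, `#`, `@`, `<` and `>` as operators under its default flags.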