Unicode in character class sometimes not working
ukolovda opened this issue · 4 comments
ukolovda commented
I've hit a very strange bug with Unicode symbols in PCRE character classes.
I wrote a small test:
func TestUnicodeAndClass(t *testing.T) {
	// Simple unicode works
	re := MustCompile(`ййй`, 0)
	m := re.NewMatcherString(`ййй`, 0)
	if !m.Matches {
		t.Error("Failed to find any matches")
	}
	// But with char class not working...
	re = MustCompile(`й[й]й`, 0)
	m = re.NewMatcherString(`ййй`, 0)
	if !m.Matches {
		t.Error("Failed to find any matches")
	}
}
(see https://github.com/ukolovda/go-pcre/tree/unicode-class-bug )
When I remove the first or the last symbol from the pattern, it works.
ukolovda commented
If I set the UTF8 flag, it works:
func TestUnicodeAndClass(t *testing.T) {
	re := MustCompile(`ййй`, 0)
	m := re.NewMatcherString(`ййй`, 0)
	if !m.Matches {
		t.Error("Failed to find any matches")
	}
	const PCRE_CONFIG_UTF8 int = 0x800
	re = MustCompile(`й[й]й`, PCRE_CONFIG_UTF8)
	m = re.NewMatcherString(`ййй`, 0)
	if !m.Matches {
		t.Error("Failed to find any matches")
	}
}
I'll try to make a PR adding this constant to the library.
ukolovda commented
The alternative is to change the pattern and use the CompileParse function:
re = MustCompileParse(`(?u)й[й]й`)
ukolovda commented
The UTF8 flag already exists, sorry:
re = MustCompile(`й[й]й`, UTF8)
works too.
ukolovda commented
Closing