Large character classes combined with {m,n} is very slow and memory-hungry
neongreen opened this issue · 0 comments
ChrisKuklewicz/regex-tdfa#14, originally reported by @jaspervdj
module Main where

import qualified Text.Regex.TDFA as Tdfa

main :: IO ()
main = do
    let pattern = "^[\x0020-\xD7FF]{1,255}$"
        input = take 100 $ cycle "abcd"
        regex :: Tdfa.Regex
        regex = Tdfa.makeRegexOpts Tdfa.defaultCompOpt Tdfa.defaultExecOpt pattern
        matches :: [Tdfa.MatchArray]
        matches = Tdfa.match regex input
    print matches
This takes over 6 seconds on my machine and uses around 3 GB(!) of memory. A workaround is to remove the {m,n} part, use "^[\x0020-\xD7FF]+$" instead, and combine it with an explicit length check (in Haskell).
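
For reference, the workaround could look roughly like the sketch below. The helper name `matchesBounded` is made up for illustration and is not part of regex-tdfa. It also assumes the input is a single line with no newline characters, so that checking the length of the whole input is equivalent to checking the length of the match.

```haskell
module Main where

import qualified Text.Regex.TDFA as Tdfa

-- Hypothetical helper (not part of regex-tdfa): emulates
-- "^[\x0020-\xD7FF]{1,255}$" with an unbounded "+" pattern
-- plus an explicit length check done in Haskell.
-- Assumption: the input contains no newline characters.
matchesBounded :: String -> Bool
matchesBounded input = lengthOk && Tdfa.matchTest regex input
  where
    -- Between 1 and 255 characters, without forcing the whole list:
    -- non-empty, and nothing left after dropping 255 characters.
    lengthOk = not (null input) && null (drop 255 input)

    regex :: Tdfa.Regex
    regex =
        Tdfa.makeRegexOpts
            Tdfa.defaultCompOpt
            Tdfa.defaultExecOpt
            "^[\x0020-\xD7FF]+$"

main :: IO ()
main = do
    -- Same input as in the report above.
    print (matchesBounded (take 100 $ cycle "abcd"))
    -- One character over the upper bound; rejected by the length check.
    print (matchesBounded (replicate 256 'a'))
```

The length check runs first, so over-long or empty inputs are rejected before the regex is evaluated at all.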