haskell-hvr/regex-tdfa

Large character classes combined with {m,n} are very slow and memory-hungry

neongreen opened this issue · 0 comments

ChrisKuklewicz/regex-tdfa#14, originally reported by @jaspervdj


module Main where
import qualified Text.Regex.TDFA as Tdfa

main :: IO ()
main = do
    let pattern = "^[\x0020-\xD7FF]{1,255}$"
        input   = take 100 $ cycle "abcd"

        regex :: Tdfa.Regex
        regex = Tdfa.makeRegexOpts Tdfa.defaultCompOpt Tdfa.defaultExecOpt pattern

        matches :: [Tdfa.MatchArray]
        matches = Tdfa.match regex input

    print matches

This takes over 6 seconds on my machine and uses around 3GB(!) of memory. As a workaround, drop the {m,n} part, use "^[\x0020-\xD7FF]+$" instead, and check the length explicitly in Haskell.
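
For reference, here is a minimal sketch of that workaround. The helper name `matchesBounded` is hypothetical and not part of regex-tdfa. It also assumes the anchored pattern is meant to cover the whole input, so the bounds are checked against the length of the input string rather than the length of the match.

```haskell
module Main where

import qualified Text.Regex.TDFA as Tdfa

-- Same character class as in the report, but with an unbounded "+"
-- instead of the {1,255} repetition.
classOnly :: Tdfa.Regex
classOnly =
    Tdfa.makeRegexOpts Tdfa.defaultCompOpt Tdfa.defaultExecOpt
        "^[\x0020-\xD7FF]+$"

-- Hypothetical helper (not a regex-tdfa function): enforce the {lo,hi}
-- bounds in Haskell, then let the regex check only the character class.
-- Assumption: the bounds apply to the whole input string.
matchesBounded :: Int -> Int -> String -> Bool
matchesBounded lo hi s =
    let n = length (take (hi + 1) s)  -- never walks more than hi + 1 characters
    in  n >= lo && n <= hi && Tdfa.matchTest classOnly s

main :: IO ()
main = do
    let input = take 100 $ cycle "abcd"
    print (matchesBounded 1 255 input)
```

The length check runs first, so inputs outside the bounds are rejected without invoking the regex at all.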