Some questions about using Spell to parse HDFS logs
Cafitum opened this issue · 1 comments
Cafitum commented
In line 346,
splitter = re.sub(" +", "\s+", splitters[k])
should be changed to
splitter = re.sub(" +", "\\\s+", splitters[k])
that is, add two \ before \s.
I don't know why, but after this change it worked.
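A likely explanation, sketched here with the standard-library re module on Python 3.7 or later (the third-party regex package may behave differently depending on its version): the second argument of re.sub is a replacement template, and an unknown escape such as \s inside it is rejected. To emit a literal \s+, the backslash itself has to be escaped in the template, or a function can be used as the replacement. The sample splitter text below is made up for illustration.

```python
import re

# Hypothetical splitter text; the real values come from splitters[k].
splitter_source = "Received block  of size"

# In a normal (non-raw) literal, "\\s+" is just backslash, s, plus.
assert "\\s+" == r"\s+"

# Option 1: escape the backslash for the replacement template.
# r"\\s+" is the same string as the non-raw literal with two extra backslashes.
pattern_a = re.sub(r" +", r"\\s+", splitter_source)

# Option 2: a function replacement is inserted verbatim, no template parsing.
pattern_b = re.sub(r" +", lambda m: r"\s+", splitter_source)

# The unescaped template is rejected by the standard re module (Python 3.7+).
try:
    re.sub(r" +", r"\s+", splitter_source)
    raised = False
except re.error:
    raised = True
```

Both options yield a pattern in which every run of spaces has become \s+, so it can then be compiled and matched against log lines with variable whitespace.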
But the parameters still could not be extracted.
In line 264,
re.split(r"[\s=:,]", self.preprocess(line["Content"])),
I changed the pattern to r"[\s=,]"
and then it worked.
I think some HDFS logs should be split by :
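To make the difference between the two patterns concrete, here is a small sketch using the standard re module on a made-up fragment in the style of an HDFS message. It only shows how the token lists differ; it does not establish why parameter extraction fails in Spell.

```python
import re

# Illustrative fragment only, not taken from a real log file.
content = "src: /10.0.0.1:50010"

# Original pattern: ':' is a delimiter, so "host:port" is broken apart
# and an empty token appears between ':' and the following space.
with_colon = re.split(r"[\s=:,]", content)

# Modified pattern: ':' stays inside the tokens.
without_colon = re.split(r"[\s=,]", content)

print(with_colon)
print(without_colon)
```

Whether : should be a delimiter depends on the log format: splitting on it separates key names such as src from their values, but it also cuts address:port pairs in two.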
zhujiem commented
The re and regex libraries are not stable across versions, so the code may only work with the pinned version: regex==2022.3.2
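One way to check the installed version against that pin before running the parser, as a minimal sketch: the helper name regex_matches_pin is hypothetical and not part of the project, and importlib.metadata requires Python 3.8 or later.

```python
from importlib.metadata import PackageNotFoundError, version

PINNED = "2022.3.2"  # the version named in the comment above


def regex_matches_pin(package="regex", pinned=PINNED):
    """Return True or False if the package is installed, None if it is not."""
    try:
        return version(package) == pinned
    except PackageNotFoundError:
        return None


status = regex_matches_pin()
if status is None:
    print("regex is not installed")
elif not status:
    print("installed regex differs from the pinned version", PINNED)
```

If the versions differ, installing the pinned release with pip install regex==2022.3.2 is the simplest way to reproduce the environment the maintainer describes.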