Slow lemmatization on Windows
QtRoS opened this issue · 3 comments
Running the following code in a Jupyter notebook:
```python
%%time
from pymystem3 import Mystem
m = Mystem()
text = "Красивая мама красиво мыла раму"
for i in range(10):
    lemmas = m.lemmatize(text)
```
The output is `Wall time: 9.27 s`.
Each iteration takes almost 1 second, which is dramatically slower than what I observe on Ubuntu.
My configuration: Windows 7 x64, Intel Core i7-4770 3.6 GHz, 16 GB RAM.
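To tell whether the time goes into a fixed per-call overhead or into the analysis itself, it can help to time individual calls instead of the whole cell. The following is a minimal sketch that uses only the standard library. `mean_call_seconds` is a hypothetical helper, not part of pymystem3. To measure Mystem with it, pass `m.lemmatize` as `fn`.

```python
import time
from typing import Any, Callable


def mean_call_seconds(fn: Callable[[Any], Any], arg: Any, repeats: int = 10) -> float:
    """Return the average wall-clock time of fn(arg) over `repeats` calls."""
    if repeats <= 0:
        raise ValueError("repeats must be positive")
    start = time.perf_counter()
    for _ in range(repeats):
        fn(arg)
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    # str.split is only a stand-in; replace it with m.lemmatize from a real Mystem().
    text = "Красивая мама красиво мыла раму"
    print(f"{mean_call_seconds(str.split, text):.6f} s per call")
```

Suppose the mean stays near 1 s even for a short one-line text like the one above. That would suggest the time is dominated by per-call overhead rather than by the morphological analysis.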
Hello,
This seems to be related to #11. Unfortunately, I don't know how to make it faster on Windows, since I'm not familiar with the platform. Any patches that improve it are welcome.
Well, I found a hacky workaround that speeds up lemmatization of long texts. In mystem.py there is this function:
```python
def analyze(self, text):
    """
    Make morphology analysis for a text.
    :param text: text to analyze
    :type text: str
    :returns: result of morphology analysis.
    :rtype: dict
    """
    result = []
    for line in text.splitlines():
        try:
            result.extend(self._analyze_impl(line))
        except broken_pipe:
            self.close()
            self.start()
            result.extend(self._analyze_impl(line))
    return result
```
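The loop above calls `_analyze_impl` once per line. If each of those calls carries a large fixed cost on Windows (an assumption, not verified here), the total cost grows with the number of lines. The sketch below mimics that loop with a hypothetical `CountingAnalyzer` that only counts calls. It is not pymystem3 code, and its return values do not reflect real Mystem output.

```python
from typing import Dict, List


class CountingAnalyzer:
    """Hypothetical stand-in that mirrors the per-line loop of Mystem.analyze."""

    def __init__(self) -> None:
        self.calls = 0

    def _analyze_impl(self, line: str) -> List[Dict[str, str]]:
        # Each call stands in for one round trip to the mystem binary.
        self.calls += 1
        return [{"text": line}]

    def analyze(self, text: str) -> List[Dict[str, str]]:
        result: List[Dict[str, str]] = []
        for line in text.splitlines():
            result.extend(self._analyze_impl(line))
        return result


if __name__ == "__main__":
    a = CountingAnalyzer()
    a.analyze("line one\nline two\nline three")
    print(a.calls)  # 3
```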
We can replace it with the following version when `_PIPELINE_MODE` is off:
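```python
if not _PIPELINE_MODE:
    def analyze(self, text):
        """
        Make morphology analysis for a text.
        :param text: text to analyze
        :type text: str
        :returns: result of morphology analysis.
        :rtype: list
        """
        result = []
        span = 2000
        lines = text.splitlines()
        # Join up to `span` consecutive lines into one string, so that
        # _analyze_impl is called once per chunk instead of once per line.
        # Line breaks inside a chunk are replaced by spaces.
        lines = [" ".join(lines[i:i+span]) for i in range(0, len(lines), span)]
        for line in lines:
            try:
                result.extend(self._analyze_impl(line))
            except broken_pipe:
                # Restart Mystem and retry this chunk once.
                self.close()
                self.start()
                result.extend(self._analyze_impl(line))
        return result
```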
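The chunking step of this workaround can be pulled out and checked on its own. The following is a minimal sketch. `chunk_lines` is a hypothetical name, and `span=2000` mirrors the value in the patch.

```python
from typing import List


def chunk_lines(text: str, span: int = 2000) -> List[str]:
    """Join every `span` consecutive lines of `text` into one space-separated chunk."""
    if span <= 0:
        raise ValueError("span must be positive")
    lines = text.splitlines()
    return [" ".join(lines[i:i + span]) for i in range(0, len(lines), span)]


if __name__ == "__main__":
    text = "\n".join(f"line {n}" for n in range(5))
    print(chunk_lines(text, span=2))
    # ['line 0 line 1', 'line 2 line 3', 'line 4']
```

With the default span, a 10,000-line text produces 5 chunks, and therefore 5 `_analyze_impl` calls instead of 10,000. The trade-off is that the original line boundaries are no longer visible in the analysis output.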
These changes dramatically improve performance on Windows. Unfortunately, I don't know whether it's safe to send such long "lines" to Mystem, but it works for me.
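If very long inputs turn out to be a problem, one option is to cap each chunk by character count rather than by line count. The following is a hypothetical sketch, `chunk_by_chars`, that greedily packs whole lines into chunks. The 100,000-character default is an arbitrary placeholder, not a known Mystem limit.

```python
from typing import List


def chunk_by_chars(text: str, max_chars: int = 100_000) -> List[str]:
    """Greedily pack whole lines into space-joined chunks of at most max_chars characters.

    A single line longer than max_chars is emitted unchanged as its own chunk.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: List[str] = []
    current: List[str] = []
    size = 0  # length of " ".join(current)
    for line in text.splitlines():
        # +1 accounts for the space separator added by " ".join
        extra = len(line) + (1 if current else 0)
        if current and size + extra > max_chars:
            chunks.append(" ".join(current))
            current, size = [], 0
            extra = len(line)
        current.append(line)
        size += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    print(chunk_by_chars("aaa\nbbb\ncc", max_chars=7))  # ['aaa bbb', 'cc']
```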
Hi, is this library still being maintained? The slowdown on Windows seems like a critical problem to me. Why not apply, for example, @qiray's solution?