Timeout for TokensRegex service
CecileRobin opened this issue · 13 comments
I'm testing the Python code given in the package README under "Annotation Server Usage"; I only changed the annotators back to the default ones on line 9:
with corenlp.CoreNLPClient() as client:
    ann = client.annotate(text)
but I'm experiencing a timeout error when the programme reaches the tokensregex step:
python test.py
[main] INFO CoreNLP - --- StanfordCoreNLPServer#main() called ---
[main] INFO CoreNLP - setting default constituency parser
[main] INFO CoreNLP - warning: cannot find edu/stanford/nlp/models/srparser/englishSR.ser.gz
[main] INFO CoreNLP - using: edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz instead
[main] INFO CoreNLP - to use shift reduce parser download English models jar from:
[main] INFO CoreNLP - http://stanfordnlp.github.io/CoreNLP/download.html
[main] INFO CoreNLP - Threads: 4
[main] INFO CoreNLP - Starting server...
[main] INFO CoreNLP - StanfordCoreNLPServer listening at /0:0:0:0:0:0:0:0:9000
[pool-1-thread-3] INFO CoreNLP - [/127.0.0.1:47736] API call w/annotators tokenize,ssplit,pos,depparse,lemma,ner
Chris wrote a simple sentence that he parsed with Stanford CoreNLP.
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - No tokenizer type provided. Defaulting to PTBTokenizer.
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[pool-1-thread-3] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.8 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model file: edu/stanford/nlp/models/parser/nndep/english_UD.gz ...
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 99996, Elapsed Time: 11.114 (s)
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [12.6 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
[pool-1-thread-3] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.1 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.7 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [0.4 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
Traceback (most recent call last):
  File "test.py", line 32, in <module>
    matches = client.tokensregex(text, pattern)
  File "/home/cecile/workspace/python-stanford-corenlp/corenlp/client.py", line 205, in tokensregex
    matches = self.__regex('/tokensregex', text, pattern, filter)
  File "/home/cecile/workspace/python-stanford-corenlp/corenlp/client.py", line 228, in __regex
    self.ensure_alive()
  File "/home/cecile/workspace/python-stanford-corenlp/corenlp/client.py", line 107, in ensure_alive
    raise PermanentlyFailedException("Timed out waiting for service to come alive.")
corenlp.client.PermanentlyFailedException: Timed out waiting for service to come alive.
I see in client.py that there is a hack for this in the definition of __regex, but it doesn't seem to solve the problem.
def __regex(self, path, text, pattern, filter):
    """Send a regex-related request to the CoreNLP server.

    :param (str | unicode) path: the path for the regex endpoint
    :param text: raw text for the CoreNLPServer to apply the regex
    :param (str | unicode) pattern: regex pattern
    :param (bool) filter: option to filter sentences that contain matches; if false, returns matches
    :return: request result
    """
    self.ensure_alive()

    # HACK: For some stupid reason, CoreNLPServer will timeout if we
    # need to annotate something from scratch. So, we need to call
    # this to ensure that the _regex call doesn't timeout.
    self.annotate(text)
(I also tried removing the self.annotate(text) call, but the error is the same.)
The previous steps look fine nonetheless:
sentence: token {
  word: "Chris"
  pos: "NNP"
  value: "Chris"
  before: ""
  after: " "
  originalText: "Chris"
  ner: "PERSON"
  lemma: "Chris"
  beginChar: 0
  endChar: 5
  tokenBeginIndex: 0
  tokenEndIndex: 1
  hasXmlContext: false
}
token {
  word: "wrote"
  pos: "VBD"
  [...]
Hi @CecileRobin,
I'm quite rusty but I remember encountering the same problem months ago.
Unfortunately, I don't have any ideas for solving this. :(
@CecileRobin the solution should be as simple as increasing the timeout when launching the CoreNLPClient (default timeout is 5000ms):
with corenlp.CoreNLPClient(timeout=10000) as client:
    ann = client.annotate(text)
Does that do the trick?
Hi @arunchaganty,
I had already tried that, but the error is still there (I even tried 50000, with no more success)...
I see. I'll look into this and get back to you as soon as I can.
I found that there is an internal timeout limit of 15 s in the CoreNLPClient.ensure_alive function, so setting RobustService.TIMEOUT to a value larger than 15 has a chance of avoiding the timeout.
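For reference, a minimal sketch of that workaround, assuming RobustService lives in corenlp/client.py next to CoreNLPClient (the traceback above points there):

import corenlp
from corenlp.client import RobustService

# ensure_alive waits up to RobustService.TIMEOUT seconds for the server
# to come up; the default of 15 is what triggers the error above.
RobustService.TIMEOUT = 120

with corenlp.CoreNLPClient(timeout=50000) as client:
    ann = client.annotate("Chris wrote a simple sentence that he parsed with Stanford CoreNLP.")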
It seems that the Java CoreNLP server is launched without any memory configuration:
start_cmd = "java -cp '{corenlp_home}/*' edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port {port} -timeout {timeout}".format(
    corenlp_home=os.getenv("CORENLP_HOME"),
    port=port,
    timeout=timeout)
so I added -mx4g to allocate 4 GB of heap space, like this:
start_cmd = "java -cp '{corenlp_home}/*' -mx4g edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port {port} -timeout {timeout}".format(
    corenlp_home=os.getenv("CORENLP_HOME"),
    port=port,
    timeout=timeout)
I don't know if this helps.
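If you'd rather not edit client.py, another sketch, assuming this client version accepts start_server and endpoint arguments (an assumption about the API), is to launch the server yourself with the larger heap and only connect to it:

import corenlp

# Assumes the server was started separately with the bigger heap, e.g.:
#   java -mx4g -cp "$CORENLP_HOME/*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
# start_server=False and endpoint are assumptions about this client's API.
with corenlp.CoreNLPClient(start_server=False, endpoint="http://localhost:9000") as client:
    ann = client.annotate("Chris wrote a simple sentence that he parsed with Stanford CoreNLP.")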
Allocating 4 GB of memory for the launch of the Java CoreNLP Server did solve the problem!
Thanks!
I am trying to install and use this lib but I get the same error.
Unfortunately, neither the 4g trick nor the timeout solves the issue.
[main] INFO CoreNLP - setting default constituency parser
[main] INFO CoreNLP - warning: cannot find edu/stanford/nlp/models/srparser/englishSR.ser.gz
[main] INFO CoreNLP - using: edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz instead
[main] INFO CoreNLP - to use shift reduce parser download English models jar from:
[main] INFO CoreNLP - http://stanfordnlp.github.io/CoreNLP/download.html
[main] INFO CoreNLP - Threads: 2
[main] INFO CoreNLP - Starting server...
[main] INFO CoreNLP - StanfordCoreNLPServer listening at /0:0:0:0:0:0:0:0:9000
[pool-1-thread-2] INFO CoreNLP - [/127.0.0.1:61586] API call w/annotators tokenize,ssplit
Chris wrote a simple sentence that he parsed with Stanford CoreNLP.
[pool-1-thread-2] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[pool-1-thread-2] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - No tokenizer type provided. Defaulting to PTBTokenizer.
[pool-1-thread-2] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.google.protobuf.UnsafeUtil (file:/Users/quentin/code/stanford-corenlp-full-2018-02-27/protobuf.jar) to field java.nio.Buffer.address
WARNING: Please consider reporting this to the maintainers of com.google.protobuf.UnsafeUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Same here: adding -mx4g doesn't solve the problem.
In -mx4g, the 4g is not RAM; it is the maximum heap size that the JVM can use during execution. You can increase it even to 100g. If the server still shuts down after doing this, then try either of these:
- uninstall the Java JDK and install a new version of the JDK
- check the available hard-disk space

Then increase the timeout to 75000 or more. Use this command to start the server:

java -Xmx10g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 75000 -serverProperties server.properties -preload "tokenize,ssplit,pos,lemma,ner,regexner,tokensregex"

This will take a bit of time to start the server, around 30-45 seconds; also try to reduce the number of annotators and sentences. If you still face errors, keep increasing the timeout and heap size.
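With tokensregex preloaded like this, the call that fails in the traceback above should go through; a sketch, where the pattern is only a hypothetical example of TokensRegex syntax:

import corenlp

text = "Chris wrote a simple sentence that he parsed with Stanford CoreNLP."
# Hypothetical TokensRegex pattern: one or more PERSON tokens followed by "wrote".
pattern = "([ner: PERSON]+) /wrote/"
with corenlp.CoreNLPClient(timeout=75000) as client:
    matches = client.tokensregex(text, pattern)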
I have this issue when trying to annotate quite a large number of sentences in a loop.
I made a workaround with a try/except that sleeps when the exception is raised.
My code is something like:
import time
import corenlp

i = 0
while i < len(sentences):
    try:
        # annotate() here stands for the client.annotate call from above
        sentences[i] = annotate(sentences[i])
        i += 1
    except corenlp.client.PermanentlyFailedException:
        print("Exception at i=%d, sleeping" % i)
        time.sleep(5)
I get a really regular pattern, the exception is raised at:
i=14111, i=28222, i=42333, i=56444, i=70555, ...
Rebooting the system just solves the problem. It's weird.
In general, please:
- post once
- explain what you're doing & how to reproduce the problem
- start a new issue for a new issue
Sure. Just deleted the duplication.