TypeError on coreferences = output['corefs']
diegoje opened this issue · 4 comments
Hey Charles!
I tried this script and I get the following error:
Traceback (most recent call last):
File "/private/tmp/DeepLearning/parse_doc.py", line 175, in <module>
get_sentiment(text, network)
File "/private/tmp/DeepLearning/parse_doc.py", line 154, in get_sentiment
contexts = parse_doc(document)
File "/private/tmp/DeepLearning/parse_doc.py", line 127, in parse_doc
tree = parse_sentence(coreference_resolution(sentence))
File "/private/tmp/DeepLearning/parse_doc.py", line 57, in coreference_resolution
coreferences = output['corefs']
TypeError: string indices must be integers
Process finished with exit code 1
I have pycorenlp==0.3.0 (I don't know if that's the one you used), and I'm running the same StanfordNLP release as you (http://nlp.stanford.edu/software/stanford-corenlp-full-2016-10-31.zip) on Java 9.0.1.
And here's what I get from StanfordNLP:
[pool-1-thread-1] INFO CoreNLP - [/0:0:0:0:0:0:0:1:50375] API call w/annotators tokenize,ssplit,pos,lemma,ner,depparse,mention,coref
Jean is really sad, but Adam is the happiest guy ever
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
[pool-1-thread-1] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [2.0 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.8 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [1.3 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
edu.stanford.nlp.util.ReflectionLoading$ReflectionLoadingException: Error creating edu.stanford.nlp.time.TimeExpressionExtractorImpl
at edu.stanford.nlp.util.ReflectionLoading.loadByReflection(ReflectionLoading.java:40)
at edu.stanford.nlp.time.TimeExpressionExtractorFactory.create(TimeExpressionExtractorFactory.java:57)
at edu.stanford.nlp.time.TimeExpressionExtractorFactory.createExtractor(TimeExpressionExtractorFactory.java:38)
at edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.<init>(NumberSequenceClassifier.java:86)
at edu.stanford.nlp.ie.NERClassifierCombiner.<init>(NERClassifierCombiner.java:136)
at edu.stanford.nlp.pipeline.AnnotatorImplementations.ner(AnnotatorImplementations.java:121)
at edu.stanford.nlp.pipeline.AnnotatorFactories$6.create(AnnotatorFactories.java:273)
at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:152)
...and a bunch more lines
Is it a problem with JollyDayHoliday? What do you think I could change to make it work?
Disclaimer: sorry, but I don't have any experience with StanfordNLP.
Hey Diego! I tested the script and it works fine for me. I added a requirements.txt file; maybe try checking that out!
Thanks for this, Charles.
I did some research and apparently I'm not the only one with this issue.
It looks like it is due to the Java version. I'm using Java 9.0.1 - which one are you using?
UPDATE: issue fixed when using Java 1.8!
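For anyone who can't switch Java versions right away: the TypeError seems to happen because pycorenlp hands back the server's raw error text (a plain string) instead of parsed JSON when CoreNLP fails server-side, so a small guard before indexing gives a clearer error. A sketch (the function name is just illustrative, not part of the script):

```python
def get_corefs(output):
    """Guard against CoreNLP failures: when the server errors out
    (e.g. the SUTime crash above), pycorenlp appears to return the raw
    error message as a str rather than a parsed JSON dict, and indexing
    a str with 'corefs' raises TypeError: string indices must be integers."""
    if isinstance(output, str):
        raise RuntimeError('CoreNLP server returned an error: ' + output)
    return output['corefs']
```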
Hi Charles,
After the above fix, I ran into an IndexError: list index out of range error which drove me crazy. After debugging, changing `predictions[0][i][0]` to `predictions[0][0][i][0]` (adding one more layer of indexing for it to find the sentiment) made it work for me.
Here's what my `get_sentiment()` looks like now:
```python
def get_sentiment(document, network):
    """ Create a dict of every entity with its associated sentiment """
    print('Parsing Document...')
    contexts = parse_doc(document)
    print('Done!')
    entities = init_dict(contexts)
    sentences = [sentence.encode('utf-8') for _, sentence in contexts]
    predictions = network.predict_sentences(sentences)
    for i, c in enumerate(contexts):
        key = c[0]
        # =======
        # Note: had to change from predictions[0][i][0] to predictions[0][0][i][0]
        # to avoid IndexError: list index out of range
        # =======
        if entities[key] is not None:
            entities[key] += (predictions[0][0][i][0] - predictions[0][0][i][1])
            entities[key] /= 2
        else:
            entities[key] = (predictions[0][0][i][0] - predictions[0][0][i][1])
    for e in entities.keys():
        print('Entity: %s -- sentiment: %s' % (e, entities[e]))
```
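For anyone hitting the same IndexError, here's a minimal sketch (toy numpy arrays standing in for the real network output) of the structure the extra `[0]` assumes, based on the printout further down:

```python
import numpy as np

# Assumed shape of predictions, inferred from the printed output:
# a tuple of (a list containing one (N, 2) array of [positive, negative]
# scores, and a list of [sentence, label, confidence] summaries).
predictions = (
    [np.array([[0.94, 0.06],
               [0.12, 0.88]], dtype=np.float32)],
    [['0,Bitcoin is looking very promising .', 'pos', 0.94],
     ['0,John is not going too well', 'neg', 0.88]],
)

i = 0
pos, neg = predictions[0][0][i]      # predictions[0] is a list holding ONE array,
sentiment = float(pos) - float(neg)  # hence the second [0] before indexing row i
```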
And this is what `predictions` prints:
([array([[0.9401707 , 0.05982935],
[0.12330811, 0.8766919 ],
[0.05135493, 0.948645 ],
[0.04372008, 0.95627993],
[0.76382387, 0.23617609],
[0.07036395, 0.92963606],
[0.33202317, 0.66797686],
[0.43130934, 0.56869066],
[0.5940844 , 0.40591562],
[0.02576003, 0.97424 ],
[0.79468673, 0.20531328],
[0.6695906 , 0.33040938],
[0.8919715 , 0.10802842],
[0.21454915, 0.7854509 ],
[0.9765816 , 0.02341843],
[0.85761005, 0.14238995],
[0.97303224, 0.02696779],
[0.7652719 , 0.23472811],
[0.09908623, 0.9009138 ],
[0.71454215, 0.2854578 ],
[0.18902989, 0.81097007],
[0.74710584, 0.2528942 ],
[0.88460344, 0.11539648],
[0.878756 , 0.12124399],
[0.53546566, 0.46453434],
[0.9586443 , 0.04135578],
[0.7557437 , 0.24425633],
[0.92181313, 0.07818691],
[0.96484137, 0.03515861],
[0.5171254 , 0.4828745 ],
[0.5204062 , 0.47959384],
[0.8298136 , 0.17018642],
[0.7914332 , 0.20856674],
[0.6923996 , 0.30760032],
[0.04343254, 0.95656747],
[0.02746156, 0.9725385 ],
[0.85882574, 0.14117423],
[0.92434436, 0.07565565],
[0.14022514, 0.8597748 ],
[0.03307388, 0.9669261 ],
[0.9617045 , 0.03829547],
[0.7466851 , 0.25331494],
[0.70446235, 0.29553762],
[0.92154443, 0.07845555],
[0.8455439 , 0.15445615],
[0.09782176, 0.90217817],
[0.84305984, 0.15694015],
[0.9081956 , 0.09180436],
[0.4904544 , 0.5095456 ],
[0.7562372 , 0.24376276],
[0.74507135, 0.25492868],
[0.32424155, 0.6757585 ],
[0.71232206, 0.28767797],
[0.9237479 , 0.07625207],
[0.70901144, 0.2909886 ],
[0.39673564, 0.60326433],
[0.8038566 , 0.19614334],
[0.05801026, 0.9419897 ],
[0.26367864, 0.73632133],
[0.36928973, 0.63071024],
[0.9059966 , 0.09400345],
[0.88516587, 0.11483412],
[0.05036072, 0.94963926],
[0.85224956, 0.14775042]], dtype=float32)], [['0,Bitcoin is looking very promising .', 'pos', 0.9401707], ['0,John is not going too well', 'neg', 0.8766919], ['0,Litecoin is crashing', 'neg', 0.948645], ['0,Adam is Sad .', 'neg', 0.95627993]])
My results:
text = "Bitcoin is looking very promising. John is not going too well, and Litecoin is crashing. Adam is Sad."
Entity: Bitcoin -- sentiment: 0.88034135
Entity: Litecoin -- sentiment: -0.89729005
Entity: Adam -- sentiment: -0.91255987
Entity: John -- sentiment: -0.75338376
Is this correct, or did I do something wrong?
Thanks!
Hey Diego! Yeah, the output looks weird; this has to do with the `predict_sentences` method in CharLSTM. Basically, the network concatenates a bunch of sentences together so the batch size equals your BATCH_SIZE, then it outputs the sentiment for all of those sentences. I was too lazy to fix this haha.
Also, here's my java version:
$ java -version
openjdk version "1.8.0_161"
OpenJDK Runtime Environment (build 1.8.0_161-b14)
OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)