Error with annotate_ws.py
Opened this issue · 7 comments
Hi!
I was using annotate_ws.py to annotate custom questions, running it on Google Cloud Platform. However, I got this error:
```
$ python3 annotate_ws.py --split past,present
annotating /home/Enzo/sqlova-shallow-layer/past.jsonl
loading tables
100%|██████████| 2716/2716 [00:00<00:00, 17256.43it/s]
loading examples
  0%|          | 0/1690 [00:00<?, ?it/s]
Starting server with command: java -Xmx5G -cp /home/Enzo/sqlova-shallow-layer/stanford-corenlp-4.0.0/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 60000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-6f9bf1976d784f04.props -preload tokenize,ssplit,pos,lemma,ner,depparse
  0%|          | 0/1690 [00:40<?, ?it/s]
Traceback (most recent call last):
  File "annotate_ws.py", line 190, in <module>
    a = annotate_example_ws(d, tables[d['table_id']])
  File "annotate_ws.py", line 107, in annotate_example_ws
    _nlu_ann = annotate(example['question'])
  File "annotate_ws.py", line 24, in annotate
    for s in client.annotate(sentence):
TypeError: 'Document' object is not iterable
```
Could you tell me why this happened? Thank you in advance!
I am facing the same issue. Did you find a solution to this problem?
@Daljeetka Not yet...
When running annotate_ws.py, I got an error: `ModuleNotFoundError: No module named 'stanza.nlp'`. But I have installed stanza. Which other package should I install?
I figured it out: change line 8 to `from stanza.server import CoreNLPClient`. But now I am facing the same `TypeError: 'Document' object is not iterable` too.
Try this:
```python
import stanza

nlp = stanza.Pipeline('en')

def annotate(sentence, lower=True, nlp=nlp):
    """
    Input: question string
    Output: tokenized input question
    {
        'gloss': original tokens,
        'words': list of (lowercased) tokens,
        'after': " " for all tokens except the last two, which get ""
    }
    """
    doc = nlp(sentence)
    words, gloss, after = [], [], []
    for sent in doc.sentences:
        for token in sent.tokens:
            words.append(token.text)
            gloss.append(token.text)
            after.append(" ")
    after[-2:] = ["", ""]
    if lower:
        words = [w.lower() for w in words]
    return {
        'gloss': gloss,
        'words': words,
        'after': after,
    }
```
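If you just want to sanity-check the `gloss`/`words`/`after` bookkeeping without downloading a stanza model, the same logic can be exercised with a plain whitespace split (`annotate_simple` is a hypothetical stand-in using `str.split()` instead of stanza's tokenizer):

```python
def annotate_simple(sentence, lower=True):
    # Hypothetical stand-in: whitespace split instead of stanza's tokenizer.
    tokens = sentence.split()
    gloss = list(tokens)                                    # original surface forms
    words = [t.lower() for t in tokens] if lower else list(tokens)
    after = [" "] * len(tokens)                             # " " after each token...
    after[-2:] = ["", ""]                                   # ...except the last two
    return {'gloss': gloss, 'words': words, 'after': after}

result = annotate_simple("How many games were played ?")
print(result['words'])  # ['how', 'many', 'games', 'were', 'played', '?']
print(result['after'])  # [' ', ' ', ' ', ' ', '', '']
```

This only mirrors the dictionary shape; for real annotation you still need the stanza pipeline above.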
With the latest stanza I had to make the following changes to get it to work (see the lines marked `###`), and I started the CoreNLP server externally (see stanfordnlp/stanza#245 (comment)):
```python
#!/usr/bin/env python3
from argparse import ArgumentDefaultsHelpFormatter, ArgumentParser
import os
import records
import ujson as json
from stanza.server.client import CoreNLPClient  ###
from tqdm import tqdm
import copy
from lib.common import count_lines, detokenize
from lib.query import Query
import stanza.server as corenlp  ###

client = None

# ... inside annotate(sentence, lower=True), unchanged lines omitted:
    global client
    if client is None:
        client = CoreNLPClient(annotators='tokenize,ssplit,pos,lemma,ner,depparse',
                               start_server=corenlp.StartServer.DONT_START)  ###
    words, gloss, after = [], [], []
    objs = client.annotate(sentence)  ###
    for s in objs.sentence:  ###
        for t in s.token:  ###
            words.append(t.word)
            gloss.append(t.originalText)
            after.append(t.after)
```
Yes, the code by @dsivakumar seems to be correct. The return value of `client.annotate(sentence)` is not an actual `Document` object, no matter what the error message says. It's a protobuf message, as explained (sort of) here. These objects' fields are named in the singular (`sentence`, `token`) even though they refer to iterables of multiple sentences and tokens.