naver/sqlova

Error with annotate_ws.py

Opened this issue · 7 comments

bfinj commented

Hi!

I was using annotate_ws.py to annotate custom questions. I ran annotate_ws.py on google cloud platform. However, I got this error:
python3 annotate_ws.py --split past,present
annotating /home/Enzo/sqlova-shallow-layer/past.jsonl
loading tables
100%|██████████| 2716/2716 [00:00<00:00, 17256.43it/s]
loading examples
  0%|          | 0/1690 [00:00<?, ?it/s]
Starting server with command: java -Xmx5G -cp /home/Enzo/sqlova-shallow-layer/stanford-corenlp-4.0.0/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 60000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-6f9bf1976d784f04.props -preload tokenize,ssplit,pos,lemma,ner,depparse
  0%|          | 0/1690 [00:40<?, ?it/s]
Traceback (most recent call last):
  File "annotate_ws.py", line 190, in <module>
    a = annotate_example_ws(d, tables[d['table_id']])
  File "annotate_ws.py", line 107, in annotate_example_ws
    _nlu_ann = annotate(example['question'])
  File "annotate_ws.py", line 24, in annotate
    for s in client.annotate(sentence):
TypeError: 'Document' object is not iterable

Could you tell me why this happened? Thank you in advance!

I am facing the same issue. Did you find a solution to this problem?

bfinj commented

@Daljeetka Not yet...

When running annotate_ws.py, I got an error: ModuleNotFoundError: No module named 'stanza.nlp'. But I have installed stanza. Which other package should I install?

I figured it out. Change line 8 to from stanza.server import CoreNLPClient. But now I am facing the same TypeError: 'Document' object is not iterable too.
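In other words, swap the pre-1.0 stanza import for the current one (assuming line 8 of annotate_ws.py carries the old stanza.nlp import, which the ModuleNotFoundError above suggests):

# Old import (stanza < 1.0), annotate_ws.py line 8:
# from stanza.nlp.corenlp import CoreNLPClient

# New import (stanza >= 1.0):
from stanza.server import CoreNLPClient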

Try this:

import stanza

nlp = stanza.Pipeline('en')

def annotate(sentence, lower=True, nlp=nlp):
    """
    Input: question string
    Output: tokenized question, as a dict:
    {
        'gloss': original token texts,
        'words': tokens (lowercased if lower=True),
        'after': text following each token -- " " for all but the last two tokens, which get ""
    }
    """
    doc = nlp(sentence)

    words, gloss, after = [], [], []
    for sent in doc.sentences:  # renamed from `sentence` to avoid shadowing the argument
        for token in sent.tokens:
            words.append(token.text)
            gloss.append(token.text)
            after.append(" ")
        # stanza's Pipeline does not expose inter-token whitespace, so approximate it:
        # the last two tokens (typically the final word and "?") get nothing after them.
        after[-2:] = ["", ""]
    if lower:
        words = [w.lower() for w in words]
    return {
        'gloss': gloss,
        'words': words,
        'after': after,
    }
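For illustration, a call like the following should then return a dict of this shape (the exact tokenization depends on the stanza model):

print(annotate("What is the format for South Australia?"))
# {'gloss': ['What', 'is', 'the', 'format', 'for', 'South', 'Australia', '?'],
#  'words': ['what', 'is', 'the', 'format', 'for', 'south', 'australia', '?'],
#  'after': [' ', ' ', ' ', ' ', ' ', ' ', '', '']}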

With the latest stanza I had to make the changes below to get it working (see the lines marked with ###), and start the CoreNLP server separately (see stanfordnlp/stanza#245 (comment)):


#!/usr/bin/env python3
from argparse import ArgumentDefaultsHelpFormatter, ArgumentParser
import os
import records
import ujson as json
from stanza.server.client import CoreNLPClient ###
from tqdm import tqdm
import copy
from lib.common import count_lines, detokenize
from lib.query import Query
import stanza.server as corenlp ###

client = None

def annotate(sentence, lower=True):
    global client
    if client is None:
        # connect to an already-running CoreNLP server rather than spawning one
        client = CoreNLPClient(annotators='tokenize,ssplit,pos,lemma,ner,depparse',
            start_server=corenlp.StartServer.DONT_START) ###
    words, gloss, after = [], [], []
    objs = client.annotate(sentence) ###
    for s in objs.sentence: ###
        for t in s.token: ###
            words.append(t.word)
            gloss.append(t.originalText)
            after.append(t.after)
    if lower:
        words = [w.lower() for w in words]
    return {
        'gloss': gloss,
        'words': words,
        'after': after,
    }
Yes, the code by @dsivakumar seems to be correct. Whatever the error message says, the return value of client.annotate(sentence) is not the stanza Document class the old code expects: it is a protobuf message, as explained (sort of) here. Protobuf repeated fields are named in the singular (sentence, token) even though each holds an iterable of multiple sentences or tokens.
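A minimal sketch that makes the distinction concrete (it assumes a CoreNLP server is already running on localhost:9000):

from stanza.server.client import CoreNLPClient
import stanza.server as corenlp

with CoreNLPClient(annotators='tokenize,ssplit',
                   start_server=corenlp.StartServer.DONT_START) as client:
    doc = client.annotate("Which country has the most players?")
    # `doc` is a CoreNLP protobuf Document, so `for s in doc:` raises the
    # TypeError above; iterate its repeated fields instead:
    for s in doc.sentence:   # singular name, but a repeated field
        for t in s.token:
            print(t.word, repr(t.after))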