400 Client Error while running CoreNLPParser

Question

ehsong opened this issue a year ago · 2 comments

I am applying the parser to a pandas DataFrame of about 20K rows in Python 3, using parser = CoreNLPParser('http://localhost:9000'). It works fine for about 2,000 rows, but then I get a 400 Client Error:

 raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: http://localhost:9000/?properties=%7B%22outputFormat%22%3A+%22json%22%2C+%22annotators%22%3A+%22tokenize%2Cssplit%22%7D

Is this happening because there are too many rows to go through? Should I split up the DataFrame? It also happened once after only 12 rows, so I added time.sleep() to the function to slow the process down, but the error keeps happening.

My script looks like this:

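import time

import pandas as pd
from nltk.parse.corenlp import CoreNLPParser
from tqdm import tqdm

def stanford_segmenter(text):
    # Skip empty cells; read_csv yields NaN (a float) for them, so check the type too
    if isinstance(text, str) and len(text) > 0:
        result = list(parser.tokenize(text))
    else:
        result = None
    time.sleep(0.5)
    return result

if __name__ == '__main__':

    # from pycorenlp import StanfordCoreNLP
    parser = CoreNLPParser('http://localhost:9000')

    df = pd.read_csv('my_data.csv')
    tqdm.pandas(desc="my bar!")

    df['tokenized_text'] = df['full_text'].progress_apply(stanford_segmenter)
    df.to_csv('final.csv')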

Answer 1 · 2023-07-17T19:31:14.000Z

You could run the server in a separate process, or start it through the Python interface with be_quiet=False, to see whether it logs anything. Maybe there's a particular input it fails on.


Answer 2 · 2023-07-18T09:32:09.000Z

@AngledLuffa Thank you for the comment. I found that the problem was an exceptionally long text in the data set - once it was removed I didn't get the error any more.
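
For anyone hitting the same error, here is a minimal sketch of one way to locate the offending row instead of removing it by hand. The helper name safe_tokenize and the max_chars default of 100,000 are assumptions made for illustration, not values confirmed in this thread; check the limit your server is actually configured with. The tokenizer is passed in as an argument, so parser.tokenize from the script above can be used unchanged.

```python
def safe_tokenize(text, tokenize, max_chars=100_000):
    """Tokenize one cell; return (tokens, error) instead of raising.

    `tokenize` is any callable that takes a string and yields tokens,
    e.g. parser.tokenize. `max_chars` is an assumed threshold.
    """
    # Empty cells (including NaN from read_csv) are skipped, not sent
    if not isinstance(text, str) or len(text) == 0:
        return None, "empty"
    # Very long inputs are flagged before they reach the server
    if len(text) > max_chars:
        return None, f"too long ({len(text)} chars)"
    try:
        return list(tokenize(text)), None
    except Exception as exc:
        # Broad on purpose for this sketch; with the requests-based client
        # you may prefer to catch requests.exceptions.HTTPError only
        return None, f"{type(exc).__name__}: {exc}"
```

Applied with something like df['full_text'].apply(lambda t: safe_tokenize(t, parser.tokenize)), each row yields a (tokens, error) pair, so the rows with a non-empty error can be inspected afterwards while the rest of the run completes.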