stanfordnlp/CoreNLP

400 Client Error while running CoreNLPParser

ehsong opened this issue · 2 comments

ehsong commented

I am applying the parser to a pandas dataframe of about 20K rows in Python 3, with the parser created as parser = CoreNLPParser('http://localhost:9000'). It works fine for about 2000 rows, but after that I get a 400 Client Error.

 raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: http://localhost:9000/?properties=%7B%22outputFormat%22%3A+%22json%22%2C+%22annotators%22%3A+%22tokenize%2Cssplit%22%7D

Is this happening because there are too many rows to go through? Should I try to split up the dataframe? It also happened once after only 12 rows, so I added time.sleep() to the function to slow the process down, but the error keeps happening.

My script looks like this:

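import time

import pandas as pd
from tqdm import tqdm
# Assumed import: the original post does not show it; this is NLTK's CoreNLP client.
from nltk.parse.corenlp import CoreNLPParser


def stanford_segmenter(text):
    # Tokenize non-empty text through the CoreNLP server; empty text yields None.
    if len(text) > 0:
        result = list(parser.tokenize(text))
    else:
        result = None
    time.sleep(0.5)  # pause between requests
    return result

if __name__ == '__main__':

    # from pycorenlp import StanfordCoreNLP
    parser = CoreNLPParser('http://localhost:9000')

    df = pd.read_csv('my_data.csv')
    tqdm.pandas(desc="my bar!")

    df['tokenized_text'] = df['full_text'].progress_apply(stanford_segmenter)
    df.to_csv('final.csv')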

ehsong commented

@AngledLuffa Thank you for the comment. I found that the problem was an exceptionally long text in the data set - once it was removed I didn't get the error any more.
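
For anyone who hits the same error and would rather not remove rows by hand, here is a minimal sketch of a guard around the tokenizer call. It is only an illustration under stated assumptions, not something from the original script: the helper name tokenize_with_guard, the max_chars threshold of 50,000 characters, and the returned reason strings are all hypothetical, and the threshold would need tuning against whatever limit the server is actually running with. The tokenizer is passed in as a plain callable so the sketch can be tried without a running server.

```python
def tokenize_with_guard(text, tokenize, max_chars=50_000, errors=(Exception,)):
    """Return (tokens, reason). tokens is None whenever the text was skipped.

    text      -- the value from one dataframe cell (may be NaN or empty)
    tokenize  -- any callable that takes a string and yields tokens
    max_chars -- hypothetical length cutoff; tune it for your server
    errors    -- exception types to catch instead of aborting the whole run
    """
    # Missing values (e.g. NaN floats from pandas) and blank strings are skipped.
    if not isinstance(text, str) or not text.strip():
        return None, "empty"
    # Exceptionally long texts are skipped before any request is made.
    if len(text) > max_chars:
        return None, "too_long"
    try:
        return list(tokenize(text)), None
    except errors as exc:
        # Record the failure so the offending row can be inspected later.
        return None, f"error: {exc}"


if __name__ == "__main__":
    # str.split stands in for a real tokenizer so this runs offline.
    samples = ["a short sentence", "", "x" * 60_000]
    for sample in samples:
        tokens, reason = tokenize_with_guard(sample, str.split)
        print(len(sample), tokens, reason)
```

With the setup from this thread, tokenize would be parser.tokenize and errors could be narrowed to (requests.exceptions.HTTPError,), so that one bad row is recorded with a reason instead of stopping the whole progress_apply run. Rows whose reason is "too_long" or starts with "error" can then be looked at separately.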