400 Client Error while running CoreNLPParser
ehsong opened this issue · 2 comments
ehsong commented
I am applying the parser to a pandas dataframe with about 20K rows, using Python 3 and parser = CoreNLPParser('http://localhost:9000'). It works fine for about 2,000 rows, but after that I get the 400 Client Error.
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: http://localhost:9000/?properties=%7B%22outputFormat%22%3A+%22json%22%2C+%22annotators%22%3A+%22tokenize%2Cssplit%22%7D
Is this happening because there are too many rows to go through? Should I try to cut up the dataframe? This also happened once after only 12 rows, so I added a time.sleep() call in the function to slow the process down, but it is happening again...
My script looks like this:
import time

import pandas as pd
from tqdm import tqdm
from nltk.parse.corenlp import CoreNLPParser

def stanford_segmenter(text):
    if len(text) > 0:
        result = list(parser.tokenize(text))
    else:
        result = None
    time.sleep(0.5)  # slow the requests down a little
    return result

if __name__ == '__main__':
    # from pycorenlp import StanfordCoreNLP
    parser = CoreNLPParser('http://localhost:9000')
    df = pd.read_csv('my_data.csv')
    tqdm.pandas(desc="my bar!")
    df['tokenized_text'] = df['full_text'].progress_apply(stanford_segmenter)
    df.to_csv('final.csv')
AngledLuffa commented
You could run the server in a separate process, or run it with the python interface with be_quiet=False to see if it says anything. Maybe there's a particular input it fails on.
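A minimal sketch of the be_quiet suggestion, assuming the stanza package is installed and a local CoreNLP install is pointed to by the CORENLP_HOME environment variable: stanza's CoreNLPClient launches the server itself, and be_quiet=False lets the server's own log output through, so re-sending the failing input should surface the underlying complaint.

    # Sketch only: assumes `pip install stanza` and CORENLP_HOME set to a CoreNLP install.
    from stanza.server import CoreNLPClient

    with CoreNLPClient(annotators=['tokenize', 'ssplit'],
                       be_quiet=False) as client:  # let the server's log print to stdout
        # Re-send the input that triggered the 400 and watch the log for the cause.
        ann = client.annotate("a text that previously failed")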
ehsong commented
@AngledLuffa Thank you for the comment. I found that the problem was an exceptionally long text in the data set; once it was removed, I didn't get the error anymore.
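One plausible explanation: the CoreNLP server caps request size via its -maxCharLength option (100,000 characters by default) and rejects over-long documents, which would be consistent with the 400 seen here. A guard along the following lines, with a hypothetical MAX_CHARS threshold, skips oversized rows and logs any remaining failures instead of aborting the whole run:

    import time
    from requests.exceptions import HTTPError

    MAX_CHARS = 100_000  # hypothetical cutoff matching the server's default -maxCharLength

    def stanford_segmenter(text):
        # Skip empty or oversized inputs up front instead of letting the server reject them.
        if not text or len(text) > MAX_CHARS:
            return None
        try:
            result = list(parser.tokenize(text))  # `parser` as defined in the script above
        except HTTPError as err:
            # Log the failing row so it can be inspected and cleaned.
            print(f"Tokenization failed ({err}) on a text of {len(text)} characters")
            result = None
        time.sleep(0.5)
        return result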