Program error
YaoXinZhi opened this issue · 5 comments
Hello, I encountered the following problem while running `sudo python3 client.py INPUT.TXT OUTPUT.CONLL --name gene_all --assume_sentence_splitted`. My INPUT.TXT has already been split into one sentence per line. I hope you can help me, thank you.
```
(base) xinzhi@xinzhi-QTK5:~/Desktop/PTO/HUNER/huner-master$ sudo python3 client.py INPUT.TXT OUTPUT.CONLL --name gene_all --assume_sentence_splitted
Traceback (most recent call last):
  File "client.py", line 130, in <module>
    tagged_line = tagger.tag(buff, split_sentences=split_sentences, tokenize=not args.assume_tokenized)[0]
  File "client.py", line 96, in tag
    results.append(response.json())
  File "/usr/lib/python3/dist-packages/requests/models.py", line 892, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python3/dist-packages/simplejson/__init__.py", line 518, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```
I noticed that the cause is this line around line 100 of client.py:

`response = requests.post(f"http://localhost:{port}/tag", json=data)`

After successfully processing two batches, it returns `<Response [500]>`. As a result, the subsequent `response.json()` call cannot parse anything, which raises the error above. Is there a way to solve this problem? I look forward to your reply, thank you.
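As a side note, a minimal defensive sketch of that call (the helper name `tag_batch` is hypothetical, not part of client.py): checking the status code before calling `response.json()` turns the empty error body into a readable HTTP error instead of a `JSONDecodeError`.

```python
import requests

def tag_batch(port, data):
    """Send one batch to the tagger and return the parsed JSON.

    Fails loudly on a non-200 response instead of letting
    response.json() raise on an empty body. The endpoint matches
    the call in client.py; the function name is illustrative.
    """
    response = requests.post(f"http://localhost:{port}/tag", json=data)
    # A 500 response usually carries no JSON body, so surface the
    # HTTP status and the server's error text before parsing.
    if response.status_code != 200:
        raise RuntimeError(
            f"Tagger returned HTTP {response.status_code}: {response.text[:200]}"
        )
    return response.json()
```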
I worked around this problem by reducing the batch_size; maybe it is an issue with my network link. However, I noticed that whenever I process more than 500 sentences, the error always occurs, so I split the input into files of 500 lines each. This is very troublesome, though, so perhaps you have an easier way. Looking forward to your help.
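A rough sketch of automating that 500-line workaround (the function name, output file names, and chunk size are assumptions, not part of HUNER):

```python
def iter_chunks(path, chunk_size=500):
    """Yield the input file (one sentence per line) in chunks of
    at most `chunk_size` lines, so each chunk can be written out
    and tagged separately.
    """
    with open(path, encoding="utf-8") as f:
        chunk = []
        for line in f:
            chunk.append(line)
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
        if chunk:
            yield chunk

# Example: write INPUT.TXT out as INPUT.part000.txt, INPUT.part001.txt, ...
for i, chunk in enumerate(iter_chunks("INPUT.TXT")):
    with open(f"INPUT.part{i:03d}.txt", "w", encoding="utf-8") as out:
        out.writelines(chunk)
```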
Hey @YaoXinZhi,
thanks for reporting this issue. Great news that you found a mitigation! The trace given in the original post indicates that the server returned an empty response. In case of a failure, the server outputs a stack trace as well; could you post it here to give further insight into the problem?
In addition, we observed similar issues for files with encoding problems. I just pushed a new HUNER version (3834856). You could also try updating to this version and starting the client with the --fix_encoding option. It would be great to know whether this fixes your problem.
Hey @yetinam,
Oh, thank you for your quick reply and the updated version; this is undoubtedly exciting news. As you said, the server also reported an error, and it happens to be an encoding problem. I have posted it below.
```
172.17.0.1 - - [20/May/2020 15:49:15] "POST /tag HTTP/1.1" 500 -
INFO:werkzeug:172.17.0.1 - - [20/May/2020 15:49:15] "POST /tag HTTP/1.1" 500 -
ERROR:tagger:Exception on /tag [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "tagger.py", line 65, in tag
    tokenized_sentences = [tokenize(s) for s in sentences]
  File "tagger.py", line 41, in tokenize
    return tokenizer.parse(sentence).decode().split()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 96: ordinal not in range(128)
```
In addition, I had also noticed the encoding problem and was troubled by it, so thank you again for updating the version so quickly and solving it. Happy 5.20, this is a romantic day.
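For anyone landing on the same trace: byte 0xe2 is the lead byte of the UTF-8 encodings of "smart" punctuation such as curly quotes and en/em dashes, which an ASCII decode rejects. A minimal standalone illustration in Python 3 (not HUNER code):

```python
text = "don’t"              # contains U+2019 RIGHT SINGLE QUOTATION MARK
raw = text.encode("utf-8")  # b'don\xe2\x80\x99t' -- note the 0xe2 lead byte

try:
    raw.decode("ascii")
except UnicodeDecodeError as e:
    # 'ascii' codec can't decode byte 0xe2 in position 3: ordinal not in range(128)
    print(e)

print(raw.decode("utf-8"))  # decoding as UTF-8 works: don’t
```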
Hey @yetinam,
When I was processing small files, I noticed that the error was not simply triggered after 500 sentences; rather, the encoding of a specific sentence (its punctuation) caused it, and such a sentence just happened to fall around the 500th line. After updating to the code with fix_encoding, the problem was completely solved, and I can now process the complete data with confidence. Thanks again for your help; HUNER is an amazing piece of work.
Best wishes.