gsi-upm/senpy

'ascii' codec can't encode characters

ddwell opened this issue · 3 comments

I accidentally ran into an exception.

It looks like something cannot handle the "…" character (U+2026, horizontal ellipsis).

http://senpy.cluster.gsi.dit.upm.es/api/?algo=vaderSentiment&i=%E2%80%A6%E2%80%A6%E2%80%A6%E2%80%A6%E2%80%A6%E2%80%A6%E2%80%A6&language=es

{
  "@context": "http://senpy.cluster.gsi.dit.upm.es/api/contexts/Error.jsonld", 
  "errors": "", 
  "message": "'ascii' codec can't encode characters in position 0-6: ordinal not in range(128)", 
  "params": {}, 
  "status": 500
}
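For anyone trying to reproduce the message outside senpy, here is a minimal sketch. It assumes the input is simply percent-decoded and then encoded as ASCII somewhere along the way; I have not confirmed where in senpy that happens. `ascii_error` is a hypothetical helper written only for this illustration.

```python
from urllib.parse import unquote

# Query value copied from the URL above: seven percent-encoded ellipses.
raw = "%E2%80%A6" * 7
text = unquote(raw)  # decodes the UTF-8 bytes into seven U+2026 characters


def ascii_error(value):
    """Hypothetical helper: return the UnicodeEncodeError from an ASCII encode, or None."""
    try:
        value.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc
    return None


err = ascii_error(text)
print(len(text), hex(ord(text[0])))
print(err)
```

The error covers positions 0 through 6, which matches the "position 0-6" in the JSON response above.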

Here's a traceback (using the head version of senpy - commit cc29874 "Merge branch '17-...' into 0.8.x")

Getting request:
<Request 'http://localhost:5000/api/?algo=hashTagClassification&i=…………………&estimator=LinearSVC' [GET]>
{'algo': 'hashTagClassification',
 'algorithm': 'hashTagClassification',
 'conversion': 'full',
 'estimator': 'LinearSVC',
 'expanded-jsonld': 0,
 'i': '…………………',
 'informat': 'text',
 'input': '…………………',
 'intype': 'direct',
 'outformat': 'json-ld',
 'plugin_type': 'analysisPlugin',
 'prefix': '',
 'urischeme': 'RFC5147String'}
'LinearSVC'
Traceback (most recent call last):
  File "/home/ianwoo/.local/lib/python3.5/site-packages/gevent/pywsgi.py", line 824, in start_response
    value if not PY3 else value.encode("latin-1")))
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 7-13: ordinal not in range(256)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ianwoo/.local/lib/python3.5/site-packages/gevent/pywsgi.py", line 936, in handle_one_response
    self.run_application()
  File "/home/ianwoo/.local/lib/python3.5/site-packages/gevent/pywsgi.py", line 909, in run_application
    self.result = self.application(self.environ, self.start_response)
  File "/home/ianwoo/anaconda3/lib/python3.5/site-packages/flask/app.py", line 1994, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/ianwoo/anaconda3/lib/python3.5/site-packages/flask/app.py", line 1986, in wsgi_app
    return response(environ, start_response)
  File "/home/ianwoo/anaconda3/lib/python3.5/site-packages/werkzeug/wrappers.py", line 1277, in __call__
    start_response(status, headers)
  File "/home/ianwoo/.local/lib/python3.5/site-packages/gevent/pywsgi.py", line 827, in start_response
    raise UnicodeError("Non-latin1 header", repr(header), repr(value))
UnicodeError: ('Non-latin1 header', "'X-ORIGINAL-PARAMS'", '"{\'i\': \'…………………\', \'estimator\': \'LinearSVC\', \'algo\': \'hashTagClassification\'}"')
Thu Mar 30 09:59:49 2017 {'REMOTE_ADDR': '127.0.0.1', 'REMOTE_PORT': '50746', 'HTTP_HOST': 'localhost:5000', (hidden keys: 23)} failed with UnicodeError

127.0.0.1 - - [2017-03-30 09:59:49] "GET /api/?algo=hashTagClassification&i=%E2%80%A6%E2%80%A6%E2%80%A6%E2%80%A6%E2%80%A6%E2%80%A6%E2%80%A6&estimator=LinearSVC HTTP/1.1" 500 161 0.215803

With debug turned on (and the 'disable pip workaround'), I also get the following just before the first Traceback.

The problem seems to occur while the response is being constructed (after the analysis has finished).

FYI: the HTTP response is "ERROR 500: Internal Server Error" (as reported by wget).
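Going by the traceback, the value that gevent rejects is the `X-ORIGINAL-PARAMS` header, which carries the raw parameters and has to be encodable as latin-1. Below is a rough sketch of the failure and of one possible workaround: serialize the parameters with `json.dumps`, which escapes non-ASCII characters by default. This is only an assumption about how it could be handled, not necessarily what the actual fix does. `header_safe` is a hypothetical name.

```python
import json


def header_safe(params):
    """Hypothetical helper: serialize params to an ASCII-only string."""
    # json.dumps escapes non-ASCII characters as \uXXXX by default
    # (ensure_ascii=True), so the result always fits in latin-1.
    return json.dumps(params, sort_keys=True)


params = {"i": "\u2026" * 7, "algo": "hashTagClassification"}

# What the traceback suggests happened: the raw text ends up in a header value.
try:
    str(params).encode("latin-1")
    failed = False
except UnicodeEncodeError:
    failed = True

safe = header_safe(params)
encoded = safe.encode("latin-1")  # does not raise
print(failed, safe)
```

A client reading the header would then need `json.loads` to get the original text back.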

DEBUG:hashTagClassification:Hashtag SVM Analysing with params {'i': '…………………', 'algo': 'hashTagClassification', 'expanded-jsonld': 0, 'informat': 'text', 'prefix': '', 'input': '…………………', 'algorithm': 'hashTagClassification', 'plugin_type': 'analysisPlugin', 'estimator': 'LinearSVC', 'urischeme': 'RFC5147String', 'intype': 'direct', 'conversion': 'full', 'outformat': 'json-ld'}
{'algo': 'hashTagClassification',
 'algorithm': 'hashTagClassification',
 'conversion': 'full',
 'estimator': 'LinearSVC',
 'expanded-jsonld': 0,
 'i': '…………………',
 'informat': 'text',
 'input': '…………………',
 'intype': 'direct',
 'outformat': 'json-ld',
 'plugin_type': 'analysisPlugin',
 'prefix': '',
 'urischeme': 'RFC5147String'}
'LinearSVC'
DEBUG:senpy.extensions:Returning analysis result: {
    "@context": {
        "@vocab": "http://www.gsi.dit.upm.es/ontologies/senpy#",

...


    },
    "@id": "_:Results_1490864943.9764054",
    "@type": "results",
    "analysis": [
        "hashTagClassification"
    ],
    "entries": [
        {
            "@id": "_:Entry_1490864944.099767",
            "@type": "entry",
            "emotions": [
                {
                    "@id": "Emotions",
                    "@type": "emotionSet",
                    "onyx:hasEmotion": {
                        "@id": "Emotion",
                        "@type": "emotion",
                        "http://www.gsi.dit.upm.es/ontologies/onyx/vocabularies/anew/ns#arousal": 6.859999999999999,
                        "http://www.gsi.dit.upm.es/ontologies/onyx/vocabularies/anew/ns#dominance": 4.94,
                        "http://www.gsi.dit.upm.es/ontologies/onyx/vocabularies/anew/ns#valence": 5.9,
                        "onyx:hasEmotionCategory": [
                            "http://gsi.dit.upm.es/ontologies/wnaffect/ns#negative-fear",
                            "http://gsi.dit.upm.es/ontologies/wnaffect/ns#joy"
                        ]
                    }
                }
            ],
            "entities": [],
            "nif:isString": "\u2026\u2026\u2026\u2026\u2026\u2026\u2026",
            "sentiments": [],
            "suggestions": [],
            "topics": []
        }
    ]
}
Traceback (most recent call last):
  File "/home/ianwoo/.local/lib/python3.5/site-packages/gevent/pywsgi.py", line 824, in start_response
    value if not PY3 else value.encode("latin-1")))
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 7-13: ordinal not in range(256)

During handling of the above exception, another exception occurred:

...

Thanks for reporting!

Fixed in the upcoming 0.8.6: https://lab.cluster.gsi.dit.upm.es/senpy/senpy/issues/21