phovea/phovea_server

Error when loading UTF-8 encoded CSV files


  • Release number or git hash: cb6cba0
  • Environment (local or deployed): local

Steps to reproduce

  1. Create a UTF-8-encoded CSV file in the data directory with the following content:
countrycode	Turmion Kätilöt	Аквариум	Böhse Onkelz	Гражданская Оборона	Ляпис Трубецкой	嵐	Czesław Śpiewa	소녀시대
US	0	0	0	0	0	0	0	0	0
  2. Add this entry to index.json:
[
  {
    "name": "ut8-encoded-header",
    "separator": "\t",
    "value": {
      "range": [
        0,
        1
      ],
      "type": "real"
    },
    "rowtype": "Country",
    "coltype": "Artist",
    "path": "ut8-encoded-header.csv",
    "type": "matrix",
    "size": [
      1,
      9
    ]
  }
]
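The two reproduction steps can be scripted, for example as below. This is a sketch, assuming Python 3; the data directory path is something you pass in (the demo uses a throwaway temp directory), and `write_repro` is a hypothetical helper, not part of phovea_server. The header, row and index entry are copied as given above, and the entry is appended to an existing index.json rather than overwriting it.

```python
import json
import os
import tempfile

# Header and data row copied from step 1 (tab-separated).
HEADER = ["countrycode", "Turmion Kätilöt", "Аквариум", "Böhse Onkelz",
          "Гражданская Оборона", "Ляпис Трубецкой", "嵐", "Czesław Śpiewa", "소녀시대"]
ROW = ["US"] + ["0"] * 9

# index.json entry copied from step 2.
ENTRY = {
    "name": "ut8-encoded-header",
    "separator": "\t",
    "value": {"range": [0, 1], "type": "real"},
    "rowtype": "Country",
    "coltype": "Artist",
    "path": "ut8-encoded-header.csv",
    "type": "matrix",
    "size": [1, 9],
}


def write_repro(data_dir):
    """Hypothetical helper: write the UTF-8 CSV and append ENTRY to index.json."""
    os.makedirs(data_dir, exist_ok=True)
    csv_path = os.path.join(data_dir, ENTRY["path"])
    # Explicit UTF-8 so the file bytes do not depend on the platform locale.
    with open(csv_path, "w", encoding="utf-8", newline="") as f:
        f.write(ENTRY["separator"].join(HEADER) + "\n")
        f.write(ENTRY["separator"].join(ROW) + "\n")

    # Keep any datasets already listed in index.json.
    index_path = os.path.join(data_dir, "index.json")
    entries = []
    if os.path.exists(index_path):
        with open(index_path, encoding="utf-8") as f:
            entries = json.load(f)
    entries.append(ENTRY)
    with open(index_path, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2, ensure_ascii=False)
    return csv_path, index_path


if __name__ == "__main__":
    # Demo run in a throwaway directory; pass the real data directory instead.
    print(write_repro(tempfile.mkdtemp()))
```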

Observed behavior

  • Server errors (relevant messages and tracebacks)
api_1  | 12:46:49 ERROR phovea_server.dataset_api: ValueError: () at {'HTTP_COOKIE': 'remember_token=jovial_bell|1c2c0dcd80d28c892b0d6419b394ee2dcdc26b241ba240d6ac621a012341f5c349401eefca309a1d47664ecdeb07458a06d508e45b1a1158a154ecaf46495960; session=.eJwdjkGLwjAQhf_KMmcPGs2l4C1SIswUpVoyF3FJbToxHixL3Yj_fcPyDu_y-N73hsvt2U8Bqtv1PvULuIweqjd8fUMFTnYvEqtZ4toJbrhFjeowY8dCyq2oOwqq04zKZsxBXDoHbr00Hb5QbG5qu8HaqZJMQiMbEm6HFZv9WAgBxSlSFNj46DKn0gnzsKY8qLLTmO1M5bsxXjCdftFQJLPTlKNu6mNkg0tUrjjZLXwW8DP1z39_uPo0PuDzB8BNSQI.DRLE2g.MwU7rRRdSH9XvcmS6JOFlkgtgdc; _xsrf=2|41039547|a39f1c5a05f30cbd77b3511fff3a96db|1513347785; username-localhost-8888="2|1:0|10:1513347846|23:username-localhost-8888|44:MTU0ODNmYTdhZWUzNDAyNzgyNmFiNmJlZmI3NDE0Yjg=|c6d845830272ebf1c088771b46aed06f36165cd67c62185b4d4e3943c439f343"', 'SERVER_SOFTWARE': 'gevent/1.1 Python/2.7', 'SCRIPT_NAME': '/api/dataset', 'REQUEST_METHOD': 'GET', 'PATH_INFO': '/matrix/tacoServerLastFm2005-01/cols', 'SERVER_PROTOCOL': 'HTTP/1.1', 'QUERY_STRING': '', 'REMOTE_ADDR': '172.21.0.1', 'HTTP_USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.24 Safari/537.36', 'HTTP_CONNECTION': 'close', 'SERVER_NAME': '7c81fbba4c7e', 'REMOTE_PORT': '46462', 'werkzeug.proxy_fix.orig_wsgi_url_scheme': 'http', 'HTTP_IF_NONE_MATCH': '"a9958eb42a92067593d98e4e8c95a25c04ea24d9"', 'wsgi.url_scheme': 'http', 'SERVER_PORT': '80', 'werkzeug.proxy_fix.orig_http_host': 'localhost:8080', 'werkzeug.request': <Request 'http://localhost:8080/api/dataset/matrix/tacoServerLastFm2005-01/cols' [GET]>, 'wsgi.input': <gevent.pywsgi.Input object at 0x7f6cc0513188>, 'werkzeug.proxy_fix.orig_remote_addr': '172.21.0.1', 'HTTP_HOST': 'localhost:8080', 'wsgi.multithread': False, 'HTTP_UPGRADE_INSECURE_REQUESTS': '1', 'HTTP_CACHE_CONTROL': 'max-age=0', 'HTTP_ACCEPT': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8', 'wsgi.version': (1, 0), 'GATEWAY_INTERFACE': 'CGI/1.1', 
'wsgi.run_once': False, 'wsgi.errors': <open file '<stderr>', mode 'w' at 0x7f6cc6e1b1e0>, 'wsgi.multiprocess': False, 'HTTP_ACCEPT_LANGUAGE': 'de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7', 'HTTP_ACCEPT_ENCODING': 'gzip, deflate, br'}
api_1  | 12:46:49 ERROR phovea_server.dataset_api: 'ascii' codec can't decode byte 0xc3 in position 9: ordinal not in range(128)
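The last log line can be reproduced in isolation. Byte 0xc3 is the lead byte of the two-byte UTF-8 sequence C3 A4 for "ä" (U+00E4), and position 9 is exactly where "ä" sits in the first artist name, "Turmion Kätilöt". That suggests, though the issue does not confirm it, that the header cells were decoded with the ASCII codec, which is Python 2's default for implicit byte-to-unicode conversion. The sketch below (Python 3, with a hypothetical `try_decode` helper) shows the same error for ASCII and a clean result for UTF-8:

```python
# UTF-8 bytes of the first artist name in the CSV header.
cell = "Turmion Kätilöt".encode("utf-8")


def try_decode(raw, encoding):
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return str(exc)


print(try_decode(cell, "ascii"))  # same message as in the server log
print(try_decode(cell, "utf-8"))  # Turmion Kätilöt
```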

The browser shows a 500 error (ValueError).

Expected behavior

The CSV loader should read UTF-8-encoded files, including non-ASCII column names in the header, without errors.
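One way to get this behavior is to decode the file as UTF-8 when opening it, before any parsing happens. The sketch below assumes Python 3 and the matrix layout from the reproduction steps (first row: column names; each following row: a row name, then numeric values). `parse_matrix` and `read_matrix` are hypothetical helpers, not the phovea_server API, and this is not necessarily how PR #61 fixed the issue.

```python
import csv
import io


def parse_matrix(stream, separator="\t"):
    """Parse an already-decoded text stream into (col_names, row_names, values)."""
    reader = csv.reader(stream, delimiter=separator)
    header = next(reader)
    col_names = header[1:]  # first header cell names the row-id column
    row_names, values = [], []
    for row in reader:
        if not row:  # skip blank lines
            continue
        row_names.append(row[0])
        values.append([float(v) for v in row[1:]])
    return col_names, row_names, values


def read_matrix(path, separator="\t"):
    """Open the file as UTF-8 text (newline='' as the csv module expects)."""
    with open(path, encoding="utf-8", newline="") as f:
        return parse_matrix(f, separator)


if __name__ == "__main__":
    sample = io.StringIO("countrycode\tTurmion Kätilöt\t嵐\nUS\t0\t1\n")
    print(parse_matrix(sample))
```

Decoding at `open()` means every cell the csv module yields is already a `str`, so no implicit ASCII conversion can happen later. If some files may carry a byte-order mark, `encoding="utf-8-sig"` would strip it. On Python 2.7, which the server ran, the `csv` module works on byte strings, so each cell would instead have to be decoded explicitly after parsing.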

Fixed in PR #61.