taraslayshchuk/es2csv

Encoding issue while writing into csv

abidulrmdn opened this issue · 8 comments

Exporting a field mapped as:


"name": {
    "type": "keyword"
},


Command that I ran:
es2csv -i index -D type -f name --verify-certs -u https://userwithurl -q '*' -o database.csv

And the error shown was:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/bin/es2csv", line 11, in <module>
    load_entry_point('es2csv==5.2.1', 'console_scripts', 'es2csv')()
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/es2csv.py", line 284, in main
    es.write_to_csv()
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/es2csv.py", line 237, in write_to_csv
    line_dict_utf8 = {k: v.encode('utf8') if isinstance(v, unicode) else v for k, v in line_as_dict.items()}
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/es2csv.py", line 237, in <dictcomp>
    line_dict_utf8 = {k: v.encode('utf8') if isinstance(v, unicode) else v for k, v in line_as_dict.items()}
NameError: name 'unicode' is not defined

Python 3.5 is not supported at the moment. Please use Python 2.7.

Why not just use str instead?
@taraslayshchuk
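A minimal sketch of a version-agnostic replacement for the comprehension at es2csv.py line 237, assuming the goal is to keep the Python 2 behaviour unchanged. Swapping `unicode` for `str` alone would not be enough on Python 3: `v.encode('utf8')` produces `bytes`, and Python 3's `csv` module writes those as `b'...'` literals. The names `text_type`, `encode_row` and `naive_str_fix_output` are hypothetical, not part of es2csv.

```python
import csv
import io
import sys

PY3 = sys.version_info[0] >= 3

# Python 3 has no `unicode` builtin; its `str` is already Unicode text.
# The conditional expression only evaluates `unicode` when running on Python 2.
text_type = str if PY3 else unicode  # noqa: F821


def encode_row(line_as_dict):
    """Hypothetical stand-in for the dict comprehension at es2csv.py line 237."""
    if PY3:
        # Python 3's csv module writes str values as-is, so no encoding step is needed.
        return dict(line_as_dict)
    # Python 2's csv module expects byte strings, so encode unicode values to UTF-8.
    return {k: v.encode('utf8') if isinstance(v, text_type) else v
            for k, v in line_as_dict.items()}


def naive_str_fix_output(value):
    """Show what `isinstance(v, str)` plus `.encode('utf8')` would write on Python 3."""
    buf = io.StringIO()
    # The csv writer calls str() on the bytes object, producing a b'...' literal.
    csv.writer(buf).writerow([value.encode('utf8')])
    return buf.getvalue()


print(encode_row({'name': u'caf\u00e9', 'count': 3}))
print(naive_str_fix_output(u'caf\u00e9'))
```

The version check keeps the Python 2 path identical to the existing comprehension, while Python 3 skips encoding entirely instead of writing byte literals into the CSV.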

@taraslayshchuk I'm using es2csv with Python 2.7, yet I'm getting this Unicode-related error:

Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/es2csv", line 11, in <module>
    sys.exit(main())
  File "/home/ubuntu/.local/lib/python2.7/site-packages/es2csv.py", line 284, in main
    es.write_to_csv()
  File "/home/ubuntu/.local/lib/python2.7/site-packages/es2csv.py", line 221, in write_to_csv
    csv_writer.writeheader()
  File "/usr/lib/python2.7/csv.py", line 141, in writeheader
    self.writerow(header)
  File "/usr/lib/python2.7/csv.py", line 152, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 8: ordinal not in range(128)

So I guess using 2.x is not a solution to the problem. I'm not sure which document is problematic, but \xe5 is å.

I solved it by using unicodecsv instead of csv:

diff --git a/es2csv.py b/es2csv.py
index b948843..509e5ea 100755
--- a/es2csv.py
+++ b/es2csv.py
@@ -16,7 +16,7 @@ import sys
 import time
 import argparse
 import json
-import csv
+import unicodecsv as csv
 import elasticsearch
 import progressbar
 from functools import wraps
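For comparison, a minimal sketch of the same kind of export on Python 3, where the standard `csv` module accepts non-ASCII text in both header keys and values, so neither `unicodecsv` nor manual `.encode('utf8')` calls are needed. The helper name `write_rows` and the sample field name are illustrative assumptions; a real export would pass a file opened with `open(path, 'w', encoding='utf-8', newline='')` instead of a `StringIO`.

```python
import csv
import io


def write_rows(out, fieldnames, rows):
    """Write dict rows as CSV to a text stream (hypothetical helper, not es2csv's API)."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    # writeheader() is the step that raised UnicodeEncodeError under Python 2.
    writer.writeheader()
    writer.writerows(rows)


buf = io.StringIO()
# The field name contains 'å' (U+00E5), the character from the Python 2 traceback.
write_rows(buf, ['n\u00e5me'], [{'n\u00e5me': 'v\u00e5rde'}])
print(buf.getvalue())
```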

@abdrmdn The answer to your question is in @katafrakt's comment: it shows in more detail why we need such methods.

Hello @katafrakt, you definitely have a different error. It looks like your document contains Unicode names; to be precise, the non-ASCII character is in a key name, which this tool does not expect.
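A minimal sketch of what handling Unicode key names could involve on Python 2, assuming the export keeps using the standard `csv.DictWriter`: the comprehension at line 237 encodes only the values, but `writeheader()` writes the field names themselves, so they (and the matching row keys) would also need to be UTF-8 byte strings. The helpers `to_utf8` and `encode_fields_and_rows` are hypothetical. The code also runs on Python 3, where it merely shows the bytes produced; on Python 3 this step should be skipped, since its `csv` module expects `str`.

```python
import sys

# `unicode` exists only on Python 2; the conditional avoids evaluating it on Python 3.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


def to_utf8(value):
    """Encode text to UTF-8 bytes; leave other values (numbers, None, bytes) untouched."""
    return value.encode('utf8') if isinstance(value, text_type) else value


def encode_fields_and_rows(fieldnames, rows):
    """Hypothetical helper: encode header names AND row keys/values for Python 2's csv.

    Row keys are encoded the same way as the field names; otherwise DictWriter
    cannot match a non-ASCII byte-string field name to its unicode row key.
    """
    encoded_fields = [to_utf8(name) for name in fieldnames]
    encoded_rows = [{to_utf8(k): to_utf8(v) for k, v in row.items()} for row in rows]
    return encoded_fields, encoded_rows


fields, rows = encode_fields_and_rows([u'n\u00e5me'], [{u'n\u00e5me': u'x'}])
print(fields, rows)
```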

Since this is already resolved, I'll close the issue.

Fixed in 5.5.2.