hivecmd getresult changes delimiters for results >= 20 MB
dfrankow opened this issue · 9 comments
For smaller results, the column delimiter is \t.
For larger results, the column delimiter is \001, likely due to direct download from S3 without post-processing.
This is an awkward result to consume. I looked into patching get_results in commands.py, but _download_to_local is complicated (multiple files, possibly a directory, etc.).
You can specify a delim while calling get_results, and we will replace \001 with that delimiter.
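The substitution described above amounts to a plain character replacement. Below is a minimal sketch of doing the same thing on the consumer side, assuming the raw result has already been downloaded to a text file. `replace_delimiter` and `normalize_file` are hypothetical helpers for illustration, not part of qds-sdk.

```python
import io
from typing import Iterable, Iterator, TextIO

# Hive's default field separator: Ctrl-A, written \001 in octal.
HIVE_FIELD_SEP = "\x01"


def replace_delimiter(lines: Iterable[str], delim: str = "\t") -> Iterator[str]:
    """Yield each line with every \\x01 separator replaced by `delim`.

    Working line by line keeps memory flat even for large (>= 20 MB) results.
    """
    for line in lines:
        yield line.replace(HIVE_FIELD_SEP, delim)


def normalize_file(src: TextIO, dst: TextIO, delim: str = "\t") -> None:
    """Copy a downloaded result from `src` to `dst`, rewriting separators."""
    for line in replace_delimiter(src, delim):
        dst.write(line)


if __name__ == "__main__":
    raw = io.StringIO("a\x01b\x01c\n1\x012\x013\n")
    out = io.StringIO()
    normalize_file(raw, out, delim=",")
    print(out.getvalue(), end="")
```

If the data itself can contain tab characters, replacing \001 with \t makes rows ambiguous; a delimiter that cannot occur in the data is safer.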
Great workaround, thanks. However,
- the switching behavior is unexpected.
- when I use "set hive.cli.print.header=true;" the first line of output is tab-delimited, the rest \001-delimited.
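A result whose header row is tab-delimited while the body rows are \001-delimited can still be parsed by choosing the separator per line. This is a minimal sketch under one stated assumption: any line containing \001 uses it as the separator, and every other line is tab-separated. `split_row` and `parse_result` are hypothetical names.

```python
from typing import List

HIVE_FIELD_SEP = "\x01"


def split_row(line: str) -> List[str]:
    """Split one result line into fields.

    Assumption: a line containing \\x01 is \\x01-separated (large,
    unprocessed results); any other line is tab-separated (small results
    and the header row printed with hive.cli.print.header=true).
    """
    line = line.rstrip("\r\n")
    sep = HIVE_FIELD_SEP if HIVE_FIELD_SEP in line else "\t"
    return line.split(sep)


def parse_result(text: str) -> List[List[str]]:
    """Parse a whole result, tolerating a tab header above \\x01 rows."""
    # Split on "\n" only; str.splitlines() would also break on \x0b, \x0c, etc.
    # Empty lines (e.g. after the final newline) are skipped.
    return [split_row(line) for line in text.split("\n") if line]


if __name__ == "__main__":
    mixed = "id\tname\n1\x01alice\n2\x01bob\n"
    for row in parse_result(mixed):
        print(row)
```

The heuristic breaks down for a single-column result whose values contain tabs, because such a row has no \001 to detect; normalizing everything to one delimiter at download time avoids the guesswork.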
Also, how does one specify this delim on the qds.py command line? There are many layers.
These don't work:
python qds.py hivecmd getresult 16010614 --delim="\t"
python qds.py hivecmd getresult --delim="\t" 16010614
python qds.py hivecmd --delim="\t" getresult 16010614
It looks like specifying a custom delimiter is not supported from the command line. However, we replace \001 with \t by default when getresult is called from the command line: https://github.com/qubole/qds-sdk-py/blob/v1.9.0/bin/qds.py#L109
So you can just call
python qds.py hivecmd getresult 16010614
and you will get tab-separated columns.
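Since the command-line output is tab-separated, it can be consumed with Python's standard csv module. This is a sketch, assuming the fields are not quoted (hence `csv.QUOTE_NONE`); `read_tsv` and `fetch_result` are hypothetical helpers, and `fetch_result` additionally assumes qds.py is in the current directory with credentials already configured for it.

```python
import csv
import io
import subprocess
from typing import List


def read_tsv(text: str) -> List[List[str]]:
    """Parse tab-separated getresult output into rows of strings.

    QUOTE_NONE makes the csv module treat '"' as an ordinary character,
    on the assumption that result fields are not quoted.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def fetch_result(command_id: str) -> List[List[str]]:
    """Run the CLI command from this thread and parse its stdout.

    Assumes qds.py is in the current directory and its credentials
    are already configured.
    """
    proc = subprocess.run(
        ["python", "qds.py", "hivecmd", "getresult", command_id],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return read_tsv(proc.stdout)


if __name__ == "__main__":
    print(read_tsv('id\tname\n1\tsay "hi"\n'))
```

Usage would be `fetch_result("16010614")`; if an old SDK is installed, the rows after the header may still contain \001 characters.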
About your previous comment:
- I agree the switching is unexpected. But you shouldn't notice it from the command line because, as I said above, we automatically replace \001 with \t. And if you call the Python methods directly, you can avoid it by specifying the custom delim as \t.
- This is a known issue. I have rekindled the discussion about giving users an option to have the header \001-delimited.
Thanks for your response.
But .. I'm using the command line, and it comes out with \001-s (after the header).
I can't see how to tell what version of qds.py I have, but it was installed quite recently, the past week or so.
Surprising. I just tried it, and it seems to give tab-separated results. I have sent you a Slack chatroom invitation so we can discuss.
You can find out the qds-sdk version by running pip list.
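The same check can be done from Python. This is a minimal sketch assuming Python 3.8+ (for `importlib.metadata`); `installed_version` is a hypothetical helper. It reports the version visible to the interpreter that runs it, which matters when several Pythons are installed.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "qds-sdk") -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version() or "qds-sdk is not installed")
```

Running this with the same interpreter that runs qds.py (for example `/usr/local/bin/python`) shows which SDK that interpreter actually imports.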
When I try it from the GitHub repo, it works (tab-separated). So this must be a pip install issue. Thanks.
I had an ancient version of qds-sdk (1.0.3b0), reason unknown. Once I installed 1.9.0, this works.
For my memory, I did:
$ sudo /usr/local/bin/python -m pip install --upgrade qds-sdk
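To guard against an outdated SDK like the 1.0.3b0 install above, a script could compare the installed version against 1.9.0, the version the reporter upgraded to (intermediate versions were not tested in this thread). This is a stdlib-only sketch with hypothetical helpers; it compares release numbers only and ignores pre-release tags, so `packaging.version.Version` is the more rigorous choice if that third-party package is available.

```python
import re
from typing import Tuple

# Version confirmed in this thread to give tab-separated output.
MIN_VERSION = "1.9.0"


def release_tuple(version: str) -> Tuple[int, ...]:
    """Extract the numeric release part, e.g. '1.0.3b0' -> (1, 0, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_at_least(version: str, minimum: str = MIN_VERSION) -> bool:
    """Compare release numbers only; pre-release tags such as 'b0' are ignored."""
    a, b = release_tuple(version), release_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.9' compares equal to '1.9.0'.
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    for v in ("1.0.3b0", "1.9.0", "1.10.2"):
        print(v, is_at_least(v))
```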