qubole/qds-sdk-py

hivecmd getresult changes delimiters for results >= 20Mb

dfrankow opened this issue · 9 comments

For smaller results, the column delimiter is \t.
For larger results, the column delimiter is \001, likely due to direct download from S3 without post-processing.

This is an awkward result to consume. I looked into patching get_results in commands.py, but _download_to_local is complicated (multiple files, possibly a directory, ...).

You can specify a delim while calling get_results and we will replace \001 with that delimiter.
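
For anyone post-processing results that have already been downloaded, the same replacement can be sketched with the standard library alone. This is a minimal illustration, not the SDK's implementation: the helper name normalize_delimiter and the sample rows are hypothetical, and it assumes the results are plain text with \001 (byte 0x01) between columns.

```python
import io

# Hive's default field separator, Ctrl-A (written \001 in this thread).
HIVE_DEFAULT_DELIM = "\x01"


def normalize_delimiter(src, dst, delim="\t"):
    """Copy text lines from src to dst, replacing \\001 with delim.

    Hypothetical helper for illustration; lines that contain no \\001
    (for example, rows that are already tab-delimited) pass through unchanged.
    """
    for line in src:
        dst.write(line.replace(HIVE_DEFAULT_DELIM, delim))


# Made-up sample data standing in for a large result fetched from S3.
raw = io.StringIO("1\x01alice\n2\x01bob\n")
out = io.StringIO()
normalize_delimiter(raw, out)
result = out.getvalue()
print(result, end="")
```

Because the function works on any file-like objects, it could be pointed at an open results file and sys.stdout instead of the in-memory buffers used here.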

Great workaround, thanks. However,

  1. the switching behavior is unexpected.
  2. when I use "set hive.cli.print.header=true;" the first line of output is tab-delimited, the rest \001-delimited.

Also, how does one specify this delim on the qds.py command line? There are many layers.

These don't work:

python qds.py hivecmd getresult 16010614 --delim="\t"
python qds.py hivecmd getresult --delim="\t" 16010614
python qds.py hivecmd --delim="\t" getresult 16010614

Looks like specifying a custom delimiter is not supported from the command line. But we default to replacing \001 with \t when getresult is called from the command line: https://github.com/qubole/qds-sdk-py/blob/v1.9.0/bin/qds.py#L109

So, you can just call

python qds.py hivecmd getresult 16010614

and you would get tab separated columns.


About your previous comment,

  1. I agree the switching is unexpected. But you shouldn't notice it if you are using the command line because, as I said above, we automatically replace \001 with \t. And if you call the Python methods directly, you can avoid it by specifying the custom delim as \t.
  2. This is a known issue. I have rekindled the discussion to give an option to users to have the header \001 delimited.

Thanks for your response.

But .. I'm using the command line, and it comes out with \001-s (after the header).

I can't see how to tell what version of qds.py I have, but it was installed quite recently, the past week or so.

Surprising, I just tried it and it seems to give tab separated results. I have sent you a Slack chatroom invitation, so we can discuss.

You can find out the qds-sdk version by running pip list.
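
If it is more convenient to check from Python itself, something like the following should report the same information. It is only a sketch: it assumes Python 3.8 or newer (for importlib.metadata) and that the package is installed under the distribution name qds-sdk; the helper name installed_version is made up for this example.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of dist_name, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version seen by this interpreter, or None if qds-sdk is not installed.
print(installed_version("qds-sdk"))
```

Running this with the same interpreter that runs qds.py matters, since a different Python on the PATH may have a different (possibly much older) copy installed.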

When I try it from the github repo, it works (\t-s). So, this must be a pip install issue. Thanks.

I had an ancient version of qds-sdk (1.0.3b0), reason unknown. Once I installed 1.9.0, this works.

For my memory, I did:

$ sudo /usr/local/bin/python -m pip install --upgrade qds-sdk