DISTINCT statement on multiple columns

Question

DISTINCT statement on multiple columns

Closed this issue 3 years ago · 2 comments

Hi,

Thanks for creating this project! I've been using it for a while and loved it.

However, I noticed a potential issue with SELECT DISTINCT statement with multiple columns:

For example:

In [2]: print(json.dumps(parse("SELECT DISTINCT Country, City FROM Customers;"), indent=4))
Out[2]:
{
    "select": [
        {
            "value": {
                "distinct": "Country"
            }
        },
        {
            "value": "City"
        }
    ],
    "from": "Customers"
}

The SQL means that we want to find all distinct country & city pairs, but from the parsed result it looks like distinct statement is only attached to Country, which IMO is kind of counterintuitive. Currently I have to look into the first value and see if it has distinct as a key.

What do we think about parsing the above query into the following format, so that all columns are under the distinct statement?

{
    "select": [
        {
            "value": {
                "distinct": [
                    {
                        "value": "Country"
                    },
                    {
                        "value": "City"
                    }
                ]
            }
        }
    ],
    "from": "Customers"
}

Thanks!

Answer 1 · 2020-02-14T13:39:45.000Z

Thank you for pointing this out. I will add tests for this.

It is likely that select distinct a, b will result in

{
    "select_distinct":[{"value":"a"}, {"value":"b"}]
}

...just another clause

Answer 2 · 2020-10-17T02:05:41.000Z

fixed here klahnakoski@762e109#diff-61d5a1a018e97cc478eba974e6abfff036bf58e449578b7b680fab7f2e63976dR1072