implydata/plyql

Columns order is wrong on the resultset

ynaguib opened this issue · 7 comments

Hi,
When using distinct, the columns order is wrong:

MySQL [(none)]> select week_date, sum(c1) as rch, count(distinct uuid) as rc from tab where country='CA' group by week_date;
+------------+-------------------+---------+
| week_date  | rc                | rch     |
+------------+-------------------+---------+
| 2016-05-16 | 586419.3622051235 | 1540934 |
| 2016-05-23 |  803478.913143905 | 2174018 |
| 2016-05-30 | 613292.7918142952 | 2048509 |
+------------+-------------------+---------+

MySQL [(none)]> select week_date, count(distinct uuid) as rc, sum(c1) as rch
from tab where country='CA' group by week_date;
+------------+-------------------+---------+
| week_date  | rc                | rch     |
+------------+-------------------+---------+
| 2016-05-16 | 586419.3622051235 | 1540934 |
| 2016-05-23 |  803478.913143905 | 2174018 |
| 2016-05-30 | 613292.7918142952 | 2048509 |
+------------+-------------------+---------+

Haven't tested retrieving the data by column using jdbc, but my guess is that would work.
It's just confusing when running on mysql to see the data in the columns in the wrong order as selected.

You are right, it should maintain order. Internally Plywood does not care about order, it should care more.

Will fix this.

Mind you I got the order of (pure) SELECT to work correctly this is a problem of a SELECT ... GROUP BY query.

I'm planing to use it with the --experimental-mysql-gateway flag. Even experimental, it open many doors for us. As long as we use the columns with their name, it's fine. Gets tricky when getting columns by position.

Thanks btw, great feature.

👍 Please file more issues if they come up. Will get the order fixed soon!

I have worked around this problem by using a nested select. To get your desired order do:

select week_date, rch, rc from(
select week_date, sum(c1) as rch, count(distinct uuid) as rc from tab where country='CA' group by week_date);

I had the same issue when using the time_bucket() function - its result is always the first column in the result independently of where it appears in the select statement.

I have the same problem when using plyql
./bin/plyql -h localhost -q 'select isNew, sum(delta) as d, sum(added) as add from wikiticker group by isNew;'

isNew add d
false 3386898 3108610
true 8387367 8387367

This should be fixed in latest plyql and the one released in Imply 2.0.0