Support Query Chaining

Question

Opened this issue 2 years ago · 1 comment

Using RBQL, I can count the distinct values in a column with either select DISTINCT COUNT a6 or SELECT COUNT(a6), a6 GROUP BY a6, but in neither case can I sort the result by the count column.

The workaround is to run consecutive queries: open a new RBQL query on the result of the first one and run select * order by parseInt(a1).

It would be nice to be able to write multiple RBQL statements separated by a pipe symbol, so that the result of the first statement becomes an in-memory table that the second statement then queries before anything is output, e.g. SELECT COUNT(a6), a6 GROUP BY a6 | select * order by parseInt(a1)
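
To make the request concrete, here is a rough Python sketch of what a pipe-chaining driver could look like. It is not RBQL code: split_chain and run_chain are hypothetical names, and run_query is an assumed callable that takes (query, table) and returns a new table, standing in for whatever RBQL engine is available. The splitter only skips pipes inside quoted strings; it does not handle escaped quotes or operators such as || and bitwise |.

```python
from typing import Callable, List

Table = List[List[str]]


def split_chain(chain: str) -> List[str]:
    """Split a chained query on '|' characters that are outside quotes.

    Simplification: escaped quotes and operators such as '||' are not handled.
    """
    stages: List[str] = []
    current: List[str] = []
    quote = None  # the quote character we are currently inside, if any
    for ch in chain:
        if quote is not None:
            # Inside a string literal: only the matching quote closes it.
            if ch == quote:
                quote = None
            current.append(ch)
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == "|":
            # A pipe outside quotes ends the current stage.
            stages.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    stages.append("".join(current).strip())
    return [stage for stage in stages if stage]


def run_chain(chain: str, table: Table,
              run_query: Callable[[str, Table], Table]) -> Table:
    """Feed the output table of each stage into the next stage.

    run_query is a hypothetical backend hook, not an RBQL API.
    """
    for query in split_chain(chain):
        table = run_query(query, table)
    return table
```

Each intermediate table lives only in memory, which matches the behavior asked for above: nothing is written out until the last stage has run.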

Answer 1 · 2022-04-30T01:52:37.000Z

Something like this is possible with command line RBQL:

rbql --input countries.csv --delim , --with-headers --query 'select a2, count(a2) group by a2' | rbql --delim , --with-headers --query 'select * order by int(a2)'

To support this at the query level, I think it would be better to use SQL nested query syntax, e.g.
SELECT * FROM (SELECT COUNT(a6), a6 GROUP BY a6) ORDER BY a1
The main problem is that this further complicates the parsing algorithm. I will try to experiment with it when I have some free time.
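
As an illustration of the parsing issue, here is a minimal Python sketch (not RBQL's actual parser) that pulls the inner query out of a FROM (...) clause. Because the inner query itself contains parentheses, as in COUNT(a6), a simple search for the first closing parenthesis is not enough and the parentheses have to be balanced. The function name extract_subquery is hypothetical, and the sketch assumes no parentheses appear inside string literals.

```python
import re
from typing import Optional, Tuple


def extract_subquery(query: str) -> Optional[Tuple[str, str]]:
    """Return (inner_query, outer_query), or None if there is no FROM (...).

    Assumption: parentheses never occur inside string literals.
    """
    match = re.search(r"\bFROM\s*\(", query, flags=re.IGNORECASE)
    if match is None:
        return None
    start = match.end()  # index just past the opening parenthesis
    depth = 1
    for pos in range(start, len(query)):
        if query[pos] == "(":
            depth += 1
        elif query[pos] == ")":
            depth -= 1
            if depth == 0:
                inner = query[start:pos].strip()
                # Drop the whole FROM (...) clause; the outer query is then
                # meant to run against the inner query's result table.
                outer = query[:match.start()] + query[pos + 1:]
                return inner, " ".join(outer.split())
    raise ValueError("unbalanced parentheses in subquery")


if __name__ == "__main__":
    nested = "SELECT * FROM (SELECT COUNT(a6), a6 GROUP BY a6) ORDER BY a1"
    print(extract_subquery(nested))
```

The two returned strings could then be executed one after the other, with the inner query's result serving as the input table of the outer query.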