google/lovefield

groupBy in Joins

deepak342 opened this issue · 7 comments

groupBy in joins is failing when I include fields from both the tables.
Syntax error: (525) Invalid projection list or groupBy columns.

If I remove the fields from the second table, the error goes away and the query is executed properly.

I suspect that is your projection list that is causing the problem, meaning the columns you are passing in the select() call. The 525 error is thrown from https://github.com/google/lovefield/blob/master/lib/query/select_builder.js#L152, which indicates that for at least one of your projected columns the following is true

  1. It is not an aggregated column
  2. It does not appear in the groupBy() statement.

A short code snippet would be very helpful, to verify if that is the case, or if this needs further investigation.

This is the query I'm running. The projection list has the groupBy column and I have an aggregate column too.

"limit(5)
-project(COUNT(A.id),A.name,B.id,B.brand,B.sales, groupBy(A.name))
--order_by(A.name ASC, A.id ASC)
---aggregation(COUNT(A.id))
----groupBy(A.name)
-----join(type: inner, impl: index_nested_loop, join_pred(A.id eq B.brand))
------select(value_pred(A.name eq One))
-------table_access_by_row_id(Brand)
--------index_range_scan(Brand.pkBrand, [unbound, 6], natural)
------no_op_step(B)
"

Your query does not look right. Projection list for group by's can only be performed with columns in group by lists. Here's an article explaining common errors with group-bys

http://www.informit.com/articles/article.aspx?p=664143&seqNum=6

schemaBuilder.createTable('Brand')`
.addColumn('id', lf.Type.INTEGER)
.addColumn('name', lf.Type.STRING)
.addColumn('desc', lf.Type.STRING)
.addColumn('image', lf.Type.STRING)
.addNullable(['desc', 'image'])
.addPrimaryKey(['id'], true)
.persistentIndex(true);

schemaBuilder.createTable('BrandPopularity')
.addColumn('id', lf.Type.INTEGER)
.addColumn('branch', lf.Type.INTEGER)
.addColumn('brand', lf.Type.INTEGER)
.addColumn('year', lf.Type.INTEGER)
.addColumn('sales', lf.Type.INTEGER)
.addColumn('rank', lf.Type.INTEGER)
.addNullable(['branch', 'year'])
.addPrimaryKey(['id'], true)
.addForeignKey('FK_BrandPopularity_Branch_1', {local: 'branch', ref: 'Branch.id'})
.addForeignKey('FK_BrandPopularity_Brand_1', {local: 'brand', ref: 'Brand.id'})
.addIndex('FK_indexes', ['brand'])
.persistentIndex(true);

var a = db.getSchema().table('Brand');
var b = db.getSchema().table('BrandPopularity');`

THIS WORKS
_db.select(lf.fn.count(a.id), a.name, a.name).from(a, b).where(b.brand.eq(a.id)).groupBy(a.name).exec();

THIS ONE DOES NOT
_db.select(lf.fn.count(a.id), a.name, a.name, b.id, b.brand, b.sales).from(a, b).where(b.brand.eq(a.id)).groupBy(a.name).exec();

Any idea, what is wrong with my projection list? I have been trying to figure this out for a while but no luck.

The restriction on groupBy() columns and select() columns (referred as projection list below) combinations can be summarized as follows.

Columns in the projection list must be either aggregate columns (AVG, SUM etc), or they must appear in the groupBy.

In your 2nd example b.id,b.brand,b.sales are non-aggregate columns and they don't appear in the groupBy() either, which is why the query is rejected.

@freshp86 Thanks for the clarification.

Is there any workaround for this restriction?

Most of my use cases that use groupBy also use joins and I need them in my projection list.

The workaround is to put all non-aggregate columns that are in the projection list, ALSO in the groupBy list. The result of the query should be the same.