groupBy method should allow more control over method option
adambom opened this issue · 2 comments
I want to be able provide a function for each of the columns I specify. Let me provide an example.
The following code doesn't work because it just does sum on the year column.
ds.where(function (row) {
return row.year === 2010;
}).groupBy('race', ['population', 'year']).columns(['year', 'race', 'population']).toJSON();
I end up with a view that looks like this:
[ { _id: 39155,
race: 'White',
population: 77036790.6869098,
year: 643200 },
{ _id: 39157,
race: 'Black',
population: 11941573.990469573,
year: 643200 },
{ _id: 39159,
race: 'Asian',
population: 5038865.1221144805,
year: 643200 },
{ _id: 39161,
race: 'Hispanic',
population: 13029068.243787775,
year: 643200 },
{ _id: 39163,
race: 'Other',
population: 6686024.6175516695,
year: 643200 } ]
Notice how the value for year is not 2010, which is what I want.
I'd like to propose an update to the method option in groupBy that allows me to pass an object with key being the column I'm aggregating by, and value being a function that determines the aggregation method. If none is specified, you get sum. I could rewrite my example above like so:
ds.where(function (row) {
return row.year === 2010;
}).groupBy('race', ['population', 'year'], {
method: {
year: _.identity
population: _.sum // optional, but showing here to be explicit
}
})
.columns(['year', 'race', 'population']).toJSON();
This would return:
[ { _id: 39155,
race: 'White',
population: 77036790.6869098,
year: 2010 },
{ _id: 39157,
race: 'Black',
population: 11941573.990469573,
year: 2010 },
{ _id: 39159,
race: 'Asian',
population: 5038865.1221144805,
year: 2010 },
{ _id: 39161,
race: 'Hispanic',
population: 13029068.243787775,
year: 2010 },
{ _id: 39163,
race: 'Other',
population: 6686024.6175516695,
year: 2010 } ]
Whatcha think? I'm open to other approaches too. What I'm trying to emulate is this SQL
SELECT year, race, sum(population)
FROM mytable
WHERE year=2010
GROUP BY race;
Could you give me a sample of your data? I'm not really sure why the year
is getting converted to something strange. Is it being added up?
-- Irene
On Thu, Jun 13, 2013 at 4:57 PM, Adam Savitzky notifications@github.comwrote:
I want to be able provide a function for each of the columns I specify.
Let me provide an example.The following code doesn't work because it just does sum on the year
column.ds.where(function (row) {
return row.year === 2010;}).groupBy('race', ['population', 'year']).columns(['year', 'race', 'population']).toJSON();I end up with a view that looks like this:
[ { _id: 39155,
race: 'White',
population: 77036790.6869098,
year: 643200 },
{ _id: 39157,
race: 'Black',
population: 11941573.990469573,
year: 643200 },
{ _id: 39159,
race: 'Asian',
population: 5038865.1221144805,
year: 643200 },
{ _id: 39161,
race: 'Hispanic',
population: 13029068.243787775,
year: 643200 },
{ _id: 39163,
race: 'Other',
population: 6686024.6175516695,
year: 643200 } ]Notice how the value for year is not 2010, which is what I want.
I'd like to propose an update to the method option in groupBy that allows
me to pass an object with key being the column I'm aggregating by, and
value being a function that determines the aggregation method. If none is
specified, you get sum. I could rewrite my example above like so:ds.where(function (row) {
return row.year === 2010;}).groupBy('race', ['population', 'year'], {
method: {
year: _.identity
population: _.sum // optional, but showing here to be explicit
}}).columns(['year', 'race', 'population']).toJSON();This would return:
[ { _id: 39155,
race: 'White',
population: 77036790.6869098,
year: 2010 },
{ _id: 39157,
race: 'Black',
population: 11941573.990469573,
year: 2010 },
{ _id: 39159,
race: 'Asian',
population: 5038865.1221144805,
year: 2010 },
{ _id: 39161,
race: 'Hispanic',
population: 13029068.243787775,
year: 2010 },
{ _id: 39163,
race: 'Other',
population: 6686024.6175516695,
year: 2010 } ]Whatcha think? I'm open to other approaches too. What I'm trying to
emulate is this SQLSELECT year, race, sum(population) FROM mytable WHERE year=2010 GROUP BY race;
—
Reply to this email directly or view it on GitHubhttps://github.com//issues/202
.
Yes, the year is being added up. It would be nice if there was a way to specify aggregators for all the columns, so this didin't happen.
The data looks like this:
year | race | region | population
-------------------------------------
2010 | hispanic | south | 100
2010 | hispanic | east | 100
2010 | hispanic | west | 100
2011 | hispanic | east | 200
2011 | hispanic | south | 200
2011 | hispanic | west | 200
2012 | hispanic | west | 300
2012 | hispanic | east | 300
2012 | hispanic | south | 300
2010 | white | south | 200
2010 | white | east | 200
2010 | white | west | 200
2011 | white | east | 300
2011 | white | south | 300
2011 | white | west | 300
2012 | white | west | 400
2012 | white | east | 400
2012 | white | south | 400
2010 | black | south | 150
2010 | black | east | 150
2010 | black | west | 150
2011 | black | east | 160
2011 | black | south | 160
2011 | black | west | 160
2012 | black | west | 170
2012 | black | east | 170
2012 | black | south | 170
What I'd like to get is data that looks like this:
year | race | sum(population)
-----------------------------
2010 | hispanic | 300
2010 | white | 600
2010 | black | 450