[Ring Dust] Possible logic error in Simple Windowing query
Closed this issue · 3 comments
I'm using the ebook version so page numbers are probably not very useful but this is near the end of the A Simple Window Function section of the Ring Dust chapter
In the following query:
select
distinct(teams.description) as team,
100.0 *
(
(count(1) over (partition by teams.description))::numeric
/ (count(1) over ())::numeric
) as percent_of_mission
from events
inner join teams on teams.id = target_id
order by percent_of_mission desc;
The inner join is joining teams.id
on target_id
which I think results in nonsense?
This tripped me up because I like to type along to try and absorb the material and, using the previous query as a template I wrote the inner join as inner join teams on teams.id = team_id
without thinking too much about it and found my results very different from the book!
The results, incidentally, look a lot more fairly distributed than ~60% of all events being radar ones:
team | percent_of_mission
-------+--------------------------
CIRS | 19.344463659431415965000
UVIS | 14.882097199101385095000
ISS | 14.528146364326927739000
VIMS | 10.304979554894703667000
INMS | 7.672167181161411278000
CDA | 7.216394873369644271000
RPWS | 6.789714415011394308000
CAPS | 5.432094774780598969000
MAG | 5.049052090572624570000
MIMI | 4.512469089909976888000
RSS | 2.044510529633281076000
RADAR | 1.323679149225025455000
MP | 0.885685193864852197000
PROBE | 0.014545924716758521000
Unrelated to this particular issue but I'm having a very hard time understanding exactly what window functions do!
Yikes! Nice catch and thanks!
Window functions let you avoid group bys. They're really hard to grasp until you start replacing a group by and then their utility becomes apparent.
The partition
bits take the place of group by
. In a traditional query, grouping comes at the end and applies to the whole query. With window functions, you get a little "grouping window" to rollup on, so you could "partition by category_id" in one line and then "partition by all" in the next line.
In the query you wrote, that's sort of what I did: rolled up based on team partitions and then rolled up over the whole thing. This gave me a percentage.
Fixed for next rev - thank you!