red-4/curious-moon

[Ring Dust] Possible logic error in Simple Windowing query

Closed this issue · 3 comments

I'm using the ebook version so page numbers are probably not very useful but this is near the end of the A Simple Window Function section of the Ring Dust chapter

In the following query:

select
  distinct(teams.description) as team,
  100.0 *
    (
      (count(1) over (partition by teams.description))::numeric
      / (count(1) over ())::numeric
    ) as percent_of_mission
from events
inner join teams on teams.id = target_id
order by percent_of_mission desc;

The inner join is joining teams.id on target_id which I think results in nonsense?

This tripped me up because I like to type along to try and absorb the material and, using the previous query as a template I wrote the inner join as inner join teams on teams.id = team_id without thinking too much about it and found my results very different from the book!

The results, incidentally, look a lot more fairly distributed than ~60% of all events being radar ones:

 team  |    percent_of_mission
-------+--------------------------
 CIRS  | 19.344463659431415965000
 UVIS  | 14.882097199101385095000
 ISS   | 14.528146364326927739000
 VIMS  | 10.304979554894703667000
 INMS  |  7.672167181161411278000
 CDA   |  7.216394873369644271000
 RPWS  |  6.789714415011394308000
 CAPS  |  5.432094774780598969000
 MAG   |  5.049052090572624570000
 MIMI  |  4.512469089909976888000
 RSS   |  2.044510529633281076000
 RADAR |  1.323679149225025455000
 MP    |  0.885685193864852197000
 PROBE |  0.014545924716758521000

Unrelated to this particular issue but I'm having a very hard time understanding exactly what window functions do!

Yikes! Nice catch and thanks!

Window functions let you avoid group bys. They're really hard to grasp until you start replacing a group by and then their utility becomes apparent.

The partition bits take the place of group by. In a traditional query, grouping comes at the end and applies to the whole query. With window functions, you get a little "grouping window" to rollup on, so you could "partition by category_id" in one line and then "partition by all" in the next line.

In the query you wrote, that's sort of what I did: rolled up based on team partitions and then rolled up over the whole thing. This gave me a percentage.

Fixed for next rev - thank you!