supplied vector `brk` does not reflect the bounds of each bin in the histogram
Course: Exploratory Data Analysis
Lesson: GGPlot2 Extras
Progress: 17%
Issue: Actually, I don't see how the vector `counts` matches the height of each bin in `qplot(price, data = diamonds, binwidth = 18497/30)`.
> qplot(price, data = diamonds, binwidth = 18497/30)
| Your dedication is inspiring!
|======= | 17%
| No more messages in red, but a histogram almost
| identical to the previous one! If you typed
| 18497/30 at the command line you would get the
| result 616.5667. This means that the height of
| each bin tells you how many diamonds have a price
| between x and x+617 where x is the left edge of
| the bin.
...
|======== | 19%
| We've created a vector containing integers that
| are multiples of 617 for you. It's called brk.
| Look at it now.
> brk
[1] 0 617 1234 1851 2468 3085 3702 4319
[9] 4936 5553 6170 6787 7404 8021 8638 9255
[17] 9872 10489 11106 11723 12340 12957 13574 14191
[25] 14808 15425 16042 16659 17276 17893 18510 19127
| You are amazing!
|========= | 20%
| We've also created a vector containing the number
| of diamonds with prices between each pair of
| adjacent entries of brk. For instance, the first
| count is the number of diamonds with prices
| between 0 and $617, and the second is the number
| of diamonds with prices between $617 and $1234.
| Look at the vector named counts now.
> counts
[1] 4611 13255 5230 4262 3362 2567 2831 2841
[9] 2203 1666 1445 1112 987 766 796 655
[17] 606 553 540 427 429 376 348 338
[25] 298 305 269 287 227 251 97
| Your dedication is inspiring!
|========= | 22%
| See how it matches the histogram you just
| plotted? So, qplot really works!
Some conflicting observations:
- `counts[2]` contains the largest value; however, the plot shows the first bin as the tallest.
- The plot shows bins 1 through 6 decreasing in height, with bin 7 taller than the two preceding bins, followed by bin 8, which is shorter than bin 7 but still taller than bin 6. The `counts` vector, even if we ignore the 1st element, behaves differently: `counts[2]` through `counts[6]` decrease (13255 down to 2567), `counts[7]` (2831) rises but exceeds only the value before it, and `counts[8]` (2841) rises again over `counts[7]`.
This vector does not match the histogram.
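One way to check this without reading bar heights off the plot is to ask ggplot2 for the bins it actually computed. The following is only a sketch, assuming ggplot2 is installed and is a version that provides `layer_data()`; the names `p` and `bins` are mine, not from the lesson.

```r
library(ggplot2)

# Build the same histogram as in the lesson, but keep the plot object.
p <- qplot(price, data = diamonds, binwidth = 18497/30)

# layer_data() returns the data computed for one layer of the plot.
# For a histogram layer this includes the bin edges (xmin, xmax)
# and the bar heights (count).
bins <- layer_data(p)[, c("xmin", "xmax", "count")]
head(bins, 8)
```

The `xmin` column can then be compared against `brk`, and the `count` column against `counts`.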
I agree with the statement:
"...the height of each bin tells you how many diamonds have a price between x and x+617 where x is the left edge of the bin."
But the values in `brk` do not reflect this.
Substituting values into that statement, the first bin should cover prices between 326 and 326+617, where 326 is the left edge of the first bin, not between 0 and 617 as `brk` indicates.
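To illustrate why the starting edge matters, here is a small base R sketch. The `prices` vector is made up for the example (it is not the diamonds data), and the names `edges_from_zero` and `edges_from_min` are hypothetical. It bins the same values with the same width but two different left edges.

```r
# Hypothetical toy prices, not taken from the diamonds data.
prices <- c(326, 400, 600, 650, 900, 940, 1000, 1500)
width  <- 617

# Two sets of bin edges: same width, different starting points.
edges_from_zero <- seq(0, by = width, length.out = 4)            # 0, 617, 1234, 1851
edges_from_min  <- seq(min(prices), by = width, length.out = 3)  # 326, 943, 1560

# right = FALSE makes each bin include its left edge: [x, x + width).
counts_from_zero <- table(cut(prices, breaks = edges_from_zero, right = FALSE))
counts_from_min  <- table(cut(prices, breaks = edges_from_min,  right = FALSE))

counts_from_zero  # counts are 3, 4, 1: the second bin is the largest
counts_from_min   # counts are 6, 2: the first bin is the largest
```

With edges starting at 0 the second bin comes out largest, while with edges starting at the minimum value the first bin does, which is the same kind of disagreement described above.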
Applying an offset of +326 to the values of `brk`, we get the following values for `counts2`, which I feel better represent the plot.
> # Count diamonds in each of 30 bins of width 617, with the first bin starting at 326
> counts2 <- numeric(30)
> for (i in seq_len(30)) {
+   lower <- 326 + 617 * (i - 1)
+   counts2[i] <- sum(diamonds$price >= lower & diamonds$price < lower + 617)
+ }
> counts2
[1] 13308 6820 5214 3853 2933 2540 3021 2552 1818 1540 1264
[12] 1085 829 817 711 613 573 559 455 433 418 367
[23] 343 288 287 314 260 269 242 214
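The same counting can also be done without a loop. This is a sketch only: `count_bins` is a hypothetical helper I am introducing here, not something from the lesson or from ggplot2, and it assumes left-closed bins `[origin + width*(k-1), origin + width*k)`.

```r
# Hypothetical helper: count the values of x that fall in each of n_bins
# left-closed bins of the given width, with the first bin starting at origin.
count_bins <- function(x, origin, width, n_bins) {
  # Bin index of each value (1 for the first bin, 2 for the second, ...).
  k <- floor((x - origin) / width) + 1
  # tabulate() counts how often each index 1..n_bins occurs and
  # ignores indices outside that range.
  tabulate(k, nbins = n_bins)
}

# Toy check with made-up values: 326, 400 and 942 fall in the first bin,
# 943 in the second, 1600 in the third.
toy <- count_bins(c(326, 400, 942, 943, 1600), origin = 326, width = 617, n_bins = 3)
toy
```

With the diamonds data loaded, `count_bins(diamonds$price, 326, 617, 30)` should give the same values as `counts2`, since it applies the same bounds.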
Regards,
Steve