co-branded cars mean some duplicates within the cars_data
Hi,
There are several co-branded cars (GMC/Chevy, Ford/Lincoln, GMC/Cadillac) within the current car_data, and they show up as duplicates.
Not sure if you're planning on using this data for the book, but I thought I would point it out. The code snippet below finds examples of it. Caveat: we can't simply use this code to exclude observations, but it at least gets us a list to review:
car_train %>% group_by(mpg, model_year) %>% filter(n()>1) %>% arrange(mpg) %>% View
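To make that review list easier to scan, here is a minimal sketch of the same idea that also counts how many distinct divisions share each (mpg, model_year) pair. The toy_cars data frame and its values are made up for illustration, and the division and carline column names are assumed from the arrange() call further down this thread.

```r
library(dplyr)

# Toy stand-in for car_train; every value here is invented for illustration.
toy_cars <- data.frame(
  division   = c("Chevrolet", "GMC", "Ford", "Lincoln", "Honda"),
  carline    = c("Silverado", "Sierra", "Fusion", "MKZ", "Civic"),
  model_year = c(2015, 2015, 2016, 2016, 2016),
  mpg        = c(21.5, 21.5, 30.1, 30.1, 41.2),
  stringsAsFactors = FALSE
)

# Keep rows that share (mpg, model_year) with at least one other row,
# and record how many distinct divisions appear in each such group.
flagged <- toy_cars %>%
  group_by(mpg, model_year) %>%
  filter(n() > 1) %>%
  mutate(n_divisions = n_distinct(division)) %>%
  ungroup() %>%
  arrange(mpg, division)

print(flagged)
```

Groups with n_divisions greater than 1 are the likely co-branded candidates. A match on mpg and model_year alone can still be a coincidence, so this remains a list to review rather than a list to drop.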
Thanks a bunch for the course and really interesting information you presented!
Tony
Some of these slipped through because I looked for unique combinations of these four variables:
mpg = comb_unadj_fe___conventional_fuel,
mpg_city_un = city_unrd_adj_fe___conventional_fuel,
mpg_hwy_un = hwy_unrd_adj_fe___conventional_fuel,
mpg_comb = comb_unrd_adj_fe___conventional_fuel,
Your flag catches 465 cars, whereas the same group_by using all four catches 321 cars (which is still bad).
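The gap between the two counts comes from how many columns go into the grouping key. A rough sketch of that comparison on made-up data follows. count_flagged() is a hypothetical helper, the toy values are invented, and including model_year alongside the four mpg columns in the wider key is my assumption, not something confirmed above.

```r
library(dplyr)

# Toy data: rows 1-2 match on every column; rows 3-4 match on mpg but
# differ on highway mpg. All values are invented for illustration.
toy <- data.frame(
  model_year  = c(2015, 2015, 2016, 2016, 2016),
  mpg         = c(21.5, 21.5, 30.1, 30.1, 41.2),
  mpg_city_un = c(18.0, 18.0, 26.0, 26.0, 36.0),
  mpg_hwy_un  = c(27.0, 27.0, 37.0, 38.0, 48.0),
  mpg_comb    = c(21.0, 21.0, 30.0, 30.0, 41.0)
)

# Hypothetical helper: number of rows that share their key with another row.
count_flagged <- function(data, keys) {
  data %>%
    group_by(across(all_of(keys))) %>%
    filter(n() > 1) %>%
    ungroup() %>%
    nrow()
}

narrow_keys <- c("mpg", "model_year")
wide_keys   <- c("mpg", "mpg_city_un", "mpg_hwy_un", "mpg_comb", "model_year")

n_narrow <- count_flagged(toy, narrow_keys)  # flags rows 1-4
n_wide   <- count_flagged(toy, wide_keys)    # flags rows 1-2 only

c(narrow = n_narrow, wide = n_wide)
```

A narrower key can only flag the same rows or more, which is consistent with 465 versus 321.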
I guess my inclination is to use your filter, since they are effectively the same. This generates:
> filtered <-
+ car_data %>%
+ group_by(mpg, model_year, cylinders, gears, aspiration) %>%
+ slice(1) %>%
+ arrange(model_year, division, carline)
>
> table(car_data$model_year)
2015 2016 2017 2018
1024  646 1015  609
>
> table(filtered$model_year)
2015 2016 2017 2018
 955  606  949  558
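
For reference, the per-year effect of the filter can be read off the two tables above. This small sketch just redoes that arithmetic; the counts are copied from the output, and the vector names are mine.

```r
# Row counts per model year, copied from the two table() calls above.
before <- c(`2015` = 1024, `2016` = 646, `2017` = 1015, `2018` = 609)
after  <- c(`2015` = 955,  `2016` = 606, `2017` = 949,  `2018` = 558)

# Rows dropped by the filter in each model year.
removed <- before - after
removed        # 69, 40, 66, 51

# Total rows dropped across all four years.
sum(removed)   # 226 (3294 rows before, 3068 after)

# Share of each year's rows that were dropped, in percent.
round(100 * removed / before, 1)
```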