LucyMcGowan/tidycode

Feature request: proportion of an R file classified to different categories?

Thanks for an amazing package. It is proving invaluable for a (very) nascent project in which my colleagues and I are trying to understand how beginning data scientists learn to visualize data.

One question: is there existing functionality (or would it be desirable to add some) for calculating the proportion of an R file's function calls that are classified into each category?

Having written out the example below, I suspect this may be trivial enough for people to do themselves, but I also wonder whether built-in support would still be helpful.

library(tidyverse)
library(tidycode)

d <- read_rfiles(
  tidycode_example("example_plot.R"),
  tidycode_example("example_analysis.R")
)

u <- unnest_calls(d, expr)

p <- u %>%
  dplyr::inner_join(
    get_classifications("crowdsource", include_duplicates = FALSE)
  ) %>%
  dplyr::anti_join(get_stopfuncs()) %>%
  dplyr::select(file, func, classification)
#> Joining, by = "func"
#> Joining, by = "func"

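# Proportion of each file's classified calls that fall into each category
f <- function(d) {
  d %>%
    count(file, classification) %>%  # calls per file and classification
    group_by(file) %>%
    mutate(prop = n / sum(n))        # share of that file's total
}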

f(p)
#> # A tibble: 7 x 4
#> # Groups:   file [2]
#>   file                                           classification     n  prop
#>   <chr>                                          <chr>          <int> <dbl>
#> 1 /Library/Frameworks/R.framework/Versions/3.6/… data cleaning      2 0.286
#> 2 /Library/Frameworks/R.framework/Versions/3.6/… exploratory        1 0.143
#> 3 /Library/Frameworks/R.framework/Versions/3.6/… setup              3 0.429
#> 4 /Library/Frameworks/R.framework/Versions/3.6/… visualization      1 0.143
#> 5 /Library/Frameworks/R.framework/Versions/3.6/… data cleaning      4 0.5  
#> 6 /Library/Frameworks/R.framework/Versions/3.6/… setup              1 0.125
#> 7 /Library/Frameworks/R.framework/Versions/3.6/… visualization      3 0.375

Created on 2019-11-22 by the reprex package (v0.3.0)

This is great! I'm so happy you're using the package. Since dplyr functions do most of the legwork here, my instinct is to demonstrate this workflow in an article rather than wrap it in a single exported function. Would you be interested in writing up an article like that?
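To illustrate how little of the workflow depends on tidycode itself, here is a minimal sketch of the dplyr-only core run on a hand-made tibble. The `calls` data, its file names, and the `props` object are hypothetical stand-ins for the classified calls (`p`) in the reprex above, not output from the package.

```r
library(dplyr)

# Hypothetical stand-in for the classified calls: one row per function call,
# with the file it came from and its classification.
calls <- tibble::tibble(
  file = c("a.R", "a.R", "a.R", "a.R", "b.R", "b.R"),
  func = c("library", "filter", "mutate", "ggplot", "library", "ggplot"),
  classification = c("setup", "data cleaning", "data cleaning",
                     "visualization", "setup", "visualization")
)

# Count calls per file and classification, then divide by each file's total.
props <- calls %>%
  count(file, classification) %>%
  group_by(file) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()  # drop the grouping so later steps are not silently per-file

props
```

Within each file the proportions sum to 1, so comparing files only requires lining up the `prop` column by `classification`.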

Sure, I'd be (more than) happy to! I'll write up a draft this week. I may loop in a collaborator from the project above, if that's alright with you?

I haven't forgotten about this (though I have fallen quite behind)! @acircleda, I'll work on a draft and then ping you for feedback.