🖂 Back of the Envelope 📉

https://acm9q.shinyapps.io/Back_of_the_Envelope/

Can we build a basic point-and-click regression analysis tool with R Shiny, as a replacement for general-purpose statistical analysis tools?

The goal:

Build a general purpose regression tool, incorporating extensions to regression such as heteroskedasticity-robust standard errors, clustered standard errors, multilevel modeling, and logistic regression.

With enough data preparation on your own, this tool should be sufficient for basic regression analyses at a beginner-to-intermediate social-science level. I have no plans to implement latent-variable / structural equation modeling at this time, but path analysis could be in the distant future.
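To make two of the extensions named in the goal concrete, here is a minimal base R sketch. It computes HC1 heteroskedasticity-robust standard errors by hand and fits a logistic regression. The built-in mtcars data and the helper name robust_se_hc1 are stand-ins for illustration, not the app's actual code. The to-do list below refers to {estimatr} for this job; the hand computation is only here so the sketch runs without extra packages.

```r
# Ordinary least squares on a built-in dataset (stand-in for uploaded data).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# HC1 robust standard errors via the sandwich formula:
#   (X'X)^-1  X' diag(e^2) X  (X'X)^-1  *  n / (n - k)
# robust_se_hc1 is a hypothetical helper name.
robust_se_hc1 <- function(model) {
  X <- model.matrix(model)
  e <- residuals(model)
  n <- nrow(X)
  k <- ncol(X)
  bread <- solve(crossprod(X))   # (X'X)^-1
  meat <- crossprod(X * e)       # X' diag(e^2) X (each row of X scaled by its residual)
  vcov_hc1 <- bread %*% meat %*% bread * n / (n - k)
  sqrt(diag(vcov_hc1))
}

# Classical and robust standard errors side by side.
se_table <- cbind(
  classical = coef(summary(fit))[, "Std. Error"],
  robust_hc1 = robust_se_hc1(fit)
)
print(se_table)

# Logistic regression for a binary outcome uses glm() with a binomial family.
logit_fit <- glm(am ~ wt, data = mtcars, family = binomial())
print(coef(logit_fit))
```

Clustered standard errors and multilevel models need more machinery than this and are left out of the sketch.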

To Do Items

1.0 To Do List:

  • Output should include sjPlot's tab_model() output for APA-style regression tables
    • there be dragons with clustered standard errors
  • ggplot2 representations of the model
    • bivariate
    • bivariate residual plot
    • two independent variables (close! close? finished!)
    • added variable plots Take that, math!
    • Plot residuals. (it's just [predicted v actual] + [residual v fitted])
      • when using {estimatr}, can do this:
  • Robust Specification
  • Logistic Regression up and running
  • Multivariate Regression Up and running
  • clusters:
    • fixed effects (just make it add factor() of whatever variable to the data, then update the model to include this)
    • standard errors (this will cause an error in sjPlot; see the author's tweet reply on this topic)
  • margins plots
  • Outlier Analysis:
  • rearrange upload / model page (incorporates text of current model)
    • but can we prettify this?
  • sassy message when they ask for logistic regression residuals.
  • Model Diagnostics
    • lindia is being a jerk. maybe try ggfortify or gglm
      • {ggfortify} can't handle these either.
      • {gglm} only handles classes 'lm' and 'glm' wow even fewer.
    • QQ plot
    • resid v fitted
    • histogram of residuals
  • fix issue with missing points whenever there are residuals? what's that about??
  • overhaul the plot outputs so it's just:
    • original plot (follows programmatically based on variables) with second tab for residuals
      • choose between the "1 IV" and "2 IV" plots with a conditional: if (length(indevars) == 1) {} else if (length(indevars) == 2) {} else NULL
    • marginal effects plot
    • Added Variable Plots. Make sure to deal with issue of missing data with AV plots (and residuals above for that matter)
  • estimatr redo of all models, including a fixed-effects absorption.
    • triple-check that cluster standard errors and robust standard errors are properly specified.
    • nope you idiot. you did & instead of |
    • HUGE DOWNSTREAM EFFECTS ON MODEL DIAGNOSTICS AND AVPLOTS
  • Available Models Matrix THIS IS NOW PRIORITY #1 AS IT WILL TRACK PROGRESS ON EVERYTHING ELSE
  • fix downstream issues from lm_robust()
    • av plots
    • residuals
    • model diagnostics
    • don't forget to adjust geom_smooth(method = "lm_robust") in the function call if input$rbst == TRUE
  • model summary extra tab for results as ANOVA (no package has good output of ANOVA table to HTML for a REASON)
  • MODULAR OVERHAUL (https://rviews.rstudio.com/2021/10/20/a-beginner-s-guide-to-shiny-modules/)
    • This should make every individual tab its own module for simplicity's sake on the main page, which is like 800 lines now jeez.
  • solidify color theme
  • purchase logo design
  • Publish
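Several items above mention points going missing from the residual and added variable plots. One common cause (an assumption here, not a confirmed diagnosis of this app) is that lm() drops rows with missing values, so the residual vector ends up shorter than the data it gets plotted against. A minimal base R sketch, using mtcars with two values blanked out as stand-in data:

```r
# Stand-in data: blank out two predictor values to mimic an upload with gaps.
dat <- mtcars
dat$hp[c(3, 10)] <- NA

# na.omit drops incomplete rows, so residuals() is shorter than the data.
fit_omit <- lm(mpg ~ wt + hp, data = dat, na.action = na.omit)

# na.exclude also drops them for fitting, but residuals() and fitted()
# are padded with NA so they line up with the original rows.
fit_excl <- lm(mpg ~ wt + hp, data = dat, na.action = na.exclude)

print(length(residuals(fit_omit)))  # rows used in the fit
print(length(residuals(fit_excl)))  # rows in the original data

# The padded vectors can be attached straight back onto the data frame,
# which keeps plotting code from misaligning or recycling values.
dat$resid <- residuals(fit_excl)
dat$fitted <- fitted(fit_excl)
```

Whether lm_robust() objects behave the same way would need checking separately.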

Current Problem:

Honestly

The key may be to take all the model output objects that are generated, standardize their output into my own bespoke format, and then render that into everything else... But it sure does sound like a pain.
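A rough sketch of what that standardizing step could look like, in base R. The function name standardize_model and the column names are made up for illustration, and it only handles lm and glm objects; lm_robust and multilevel fits would each need their own branch.

```r
# Hypothetical helper: flatten a fitted model into one common data frame
# so tables, plots, and diagnostics can all read the same shape.
standardize_model <- function(model) {
  coefs <- coef(summary(model))  # 4-column coefficient matrix for lm and glm
  data.frame(
    term = rownames(coefs),
    estimate = coefs[, 1],
    std_error = coefs[, 2],
    statistic = coefs[, 3],      # t value for lm, z value for binomial glm
    p_value = coefs[, 4],
    model_class = class(model)[1],
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

linear_out <- standardize_model(lm(mpg ~ wt + hp, data = mtcars))
logit_out <- standardize_model(glm(am ~ wt, data = mtcars, family = binomial()))

print(linear_out)
print(logit_out)
```

Everything downstream would then depend on these columns instead of on each package's own object layout.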

Status of what doesn't work:

The Plan:

1.1 To Do List

#ui

radioButtons('format', h5('Document format'), c('PDF', 'HTML', 'Word'), inline = TRUE),
downloadButton('downloadReport'),

#server
   output$downloadReport <- downloadHandler(
      # Name the download after the chosen format, e.g. my-report.pdf
      filename = function() {
         paste('my-report', sep = '.', switch(
            input$format, PDF = 'pdf', HTML = 'html', Word = 'docx'
         ))
      },
      content = function(file) {
         src <- normalizePath('report.Rmd')

         # Render in a temp directory in case the app directory is not writable,
         # and restore the working directory when done
         owd <- setwd(tempdir())
         on.exit(setwd(owd))
         file.copy(src, 'report.Rmd', overwrite = TRUE)

         out <- rmarkdown::render('report.Rmd', switch(
            input$format,
            PDF = rmarkdown::pdf_document(),
            HTML = rmarkdown::html_document(),
            Word = rmarkdown::word_document()
         ))
         file.rename(out, file)
      })

Downloadable plots:

Example:

plot_server <- function(id, df, vbl, threshhold = NULL) {
  
  moduleServer(id, function(input, output, session) {
    
    plot <- reactive({viz_monthly(df(), vbl, threshhold)})
    output$plot <- renderPlot({plot()})
    output$dnld <- downloadHandler(
      filename = function() {paste0(vbl, '.png')},
      content = function(file) {ggsave(file, plot())}
    )
    
  })
}
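The server function above needs a matching UI function with the same output IDs. A minimal sketch of that counterpart, assuming the {shiny} package is installed; the name plot_ui and the button label are my own choices, not taken from the example's source.

```r
library(shiny)

# Hypothetical UI half of the module: the IDs "plot" and "dnld" must match
# output$plot and output$dnld in plot_server().
plot_ui <- function(id) {
  ns <- NS(id)  # namespaces the IDs so several copies of the module can coexist
  tagList(
    plotOutput(ns("plot")),
    downloadButton(ns("dnld"), "Download plot")
  )
}

# Example: the UI for one instance of the module, with IDs prefixed by "demo-".
demo_ui <- plot_ui("demo")
```

In an app, plot_ui("demo") would go in the UI and plot_server("demo", ...) in the server, with the same id string for both.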

1.2 To Do List

The Deep Future To Do List

  • Binary Outcome Mixed Effects
  • Multiple Imputation with Chained Equations
  • specify which variable is the ID variable; then allow users to plot Id variables instead of points.

User Feedback:

  • For the Correlation table, you may want to rotate your x-axis labels 45 or 90 degrees. Getting a lot of overlap for files with > 20 factors
  • Is there a way that you can override or modify the error messages? Instead of "contact the app author", maybe provide a URL to a message board or email?
    • MAKE ERRORS GREAT AGAIN
  • my big suggestion is just to clarify and restructure the flow of the user interface. I think my user preference is that I'd want to upload a dataset, look around in it, and then decide on a model
    • so maybe separate the Upload and Model pieces entirely. Then restructure the left-hand nav to be something like Upload > View Data Set > Descriptive Statistics > Correlation Table > Model > Summary > Plots > Diagnostics
    • some suggestions in there to make the nav header more descriptive, and put summary before plots just so we get the immediate output of the model. with this, might even be worthwhile to put the model + summary on the same page actually
  • my last thought would be that it might be cool to allow for dplyr-style filtering of the uploaded dataset - I think it would be relatively straightforward, but also legit if you don't want to include that functionality as it could also be an enormous pain in the ass to try and catch edge-cases
    • (in re dplyr style filtering: editable data tables are now possible, but that's a feature I have planned to work on after I squash all the initial bugs. it's gonna go: squash bugs, added variable plots, finish outliers, HLM, instrumental variables, THEN data processing)
  • ooh, before I forget: I might also have a disclaimer or something re: what you do with the uploaded datasets. could spook some people when you actually have people using it for not just testing purposes
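On the error message feedback: one possible approach is to catch the error where the model is fit and swap in a message the user can act on. A minimal base R sketch; safe_fit and the message wording are hypothetical, and mtcars stands in for an uploaded dataset.

```r
# Hypothetical wrapper: fit the model, but return a readable message
# instead of letting the raw error bubble up to the user.
safe_fit <- function(formula, data) {
  tryCatch(
    list(ok = TRUE, model = lm(formula, data = data), message = NULL),
    error = function(e) {
      list(
        ok = FALSE,
        model = NULL,
        message = paste0(
          "The model could not be fit: ", conditionMessage(e),
          ". Check that every variable in the model exists in the uploaded data."
        )
      )
    }
  )
}

good <- safe_fit(mpg ~ wt, mtcars)
bad <- safe_fit(mpg ~ not_a_column, mtcars)

print(good$ok)
print(bad$message)
```

Inside a Shiny render function, the message could then be shown with validate(need(result$ok, result$message)) so it appears in place of the output rather than the generic error text.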

With a sufficient amount of effort, this could actually be a pretty good tool for multilevel linear modeling, provided I can figure out what exactly the error was with mice's MCMC imputation. Imputation is gonna have to wait for v1.2 at the absolute earliest.

Domain issues

Better names:

  • www.backoftheenvelope.com (currently taken)
  • www.envelope.fyi
  • www.envelo.pe

material:

Reading Material

Watching Material

  • https://rstudio.com/resources/webinars/testing-shiny-applications-with-shinytest-shiny-developers-now-have-tools-for-automated-testing-of-complete-applications/
  • https://rstudio.com/resources/webinars/introducing-shiny-gadgets-interactive-tools/
  • https://rstudio.com/resources/webinars/interactive-graphics-with-shiny/
  • https://rstudio.com/resources/webinars/help-me-help-you-creating-reproducible-examples/
  • https://rstudio.com/resources/webinars/scaling-shiny-apps-with-asynchronous-programming/

it's been done:

but not with user-input data: https://rich.shinyapps.io/regression/ (use this for model work) bruh use this for downloadable reports work