IMMERSE Project: Institute of Mixture Modeling for Equity-Oriented Researchers, Scholars, and Educators
The Institute of Mixture Modeling for Equity-Oriented Researchers, Scholars, and Educators (IMMERSE) is an IES-funded training grant (R305B220021) to support education scholars in integrating mixture modeling into their research.
Please visit our website to learn more and apply for the year-long fellowship.
Follow us on Twitter!
How to reference this walkthrough: This work was supported by the IMMERSE Project (IES R305B220021).
Visit our GitHub account to download the materials needed for this walkthrough.
Example: PISA Student Data
- The first example closely follows the vignette used to demonstrate the tidyLPA package (Rosenberg et al., 2019).
- This model utilizes the PISA data collected in the U.S. in 2015. To learn more about these data, see here.
- To access the 2015 U.S. PISA data and documentation in R, use the following code:

devtools::install_github("jrosen48/pisaUSA15")
library(pisaUSA15)
Latent Profile Models:

- Model 1 (Class-invariant / Diagonal): equal variances, covariances fixed to 0
- Model 2 (Class-varying / Diagonal): free variances, covariances fixed to 0
- Model 3 (Class-invariant / Non-Diagonal): equal variances, equal covariances
- Model 4: free variances, equal covariances
- Model 5: equal variances, free covariances
- Model 6 (Class-varying / Non-Diagonal): free variances, free covariances
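In tidyLPA, these six parameterizations map onto the `variances` and `covariances` arguments of `estimate_profiles()` (models 4 and 5 are available only with the Mplus backend). A sketch of that mapping as a lookup table:

```r
# Mapping of tidyLPA model numbers to argument values
# (models 4 and 5 require package = "MplusAutomation")
model_specs <- data.frame(
  model       = 1:6,
  variances   = c("equal", "varying", "equal", "varying", "equal",   "varying"),
  covariances = c("zero",  "zero",    "equal", "equal",   "varying", "varying")
)
model_specs
```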
library(naniar)
library(tidyverse)
library(haven)
library(glue)
library(MplusAutomation)
library(here)
library(janitor)
library(gt)
library(tidyLPA)
library(pisaUSA15)
library(cowplot)
library(filesstrings)
here::i_am("lpa.Rmd")
pisa <- pisaUSA15[1:500,] %>%
dplyr::select(broad_interest, enjoyment, instrumental_mot, self_efficacy)
ds <- pisa %>%
pivot_longer(broad_interest:self_efficacy, names_to = "variable") %>%
group_by(variable) %>%
summarise(mean = mean(value, na.rm = TRUE),
sd = sd(value, na.rm = TRUE))
ds %>%
gt() %>%
tab_header(title = md("**Descriptive Summary**")) %>%
cols_label(
variable = "Variable",
mean = md("M"),
sd = md("SD")
) %>%
fmt_number(c(2:3),
decimals = 2) %>%
cols_align(
align = "center",
columns = mean
)
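Before enumeration, it is also worth checking indicator-level missingness (the naniar package loaded above offers `miss_var_summary()` for this). A minimal base-R sketch, using a hypothetical toy data frame standing in for the four PISA indicators:

```r
# Hypothetical stand-in for the four PISA indicators, with some NAs
toy <- data.frame(
  broad_interest   = c(3.2, NA,  2.5, 4.0),
  enjoyment        = c(2.8, 3.1, NA,  NA),
  instrumental_mot = c(1.9, 2.2, 2.0, 3.3),
  self_efficacy    = c(2.5, NA,  3.0, 2.7)
)

# Count and proportion of missing values per indicator
n_miss <- colSums(is.na(toy))
p_miss <- colMeans(is.na(toy))
round(p_miss, 2)
```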
Enumerate using `estimate_profiles()`:

- Estimate models with classes $K = 1:4$
- Model has 4 continuous indicators
- Default variance-covariance specification (model 1)
- Change the `variances` and `covariances` arguments to indicate the model you want to specify (see the tidyLPA vignette)
# Run LPA models
pisa %>%
estimate_profiles(1:4,
package = "MplusAutomation",
ANALYSIS = "starts = 100, 20;",
variances = c("equal", "varying", "equal", "varying"),
covariances = c("zero", "zero", "equal", "equal"),
keepfiles = TRUE)
# Move files to folder
files <- list.files(here(), pattern = "^model")
file.move(files, here("tidyLPA"))
APA-formatted model fit table with additional fit indices
Extract data:
output_pisa <- readModels(here("tidyLPA"), quiet = TRUE)
enum_extract <- LatexSummaryTable(
output_pisa,
keepCols = c(
"Title",
"Parameters",
"LL",
"BIC",
"aBIC",
"BLRT_PValue",
"Observations"
)
)
allFit <- enum_extract %>%
mutate(aBIC = -2 * LL + Parameters * log((Observations + 2) / 24)) %>%
mutate(CAIC = -2 * LL + Parameters * (log(Observations) + 1)) %>%
mutate(AWE = -2 * LL + 2 * Parameters * (log(Observations) + 1.5)) %>%
separate(Title, c("Model", "Class"), sep = "with") %>%
mutate(SIC = -.5 * BIC) %>%
drop_na(SIC) %>%
group_by(Model) %>%
mutate(expSIC = exp(SIC - max(SIC))) %>%
mutate(BF = exp(SIC - lead(SIC))) %>%
mutate(cmPk = expSIC / sum(expSIC)) %>%
ungroup() %>%
unite(Title, c("Model", "Class"), sep = "with", remove = TRUE) %>%
dplyr::select(1:5, 8:9, 6, 12, 13) %>%
mutate(Title = str_to_title(Title)) %>%
arrange(Title)
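The `BF` and `cmPk` columns computed above are BIC-based approximations: the Schwarz information criterion is $SIC = -BIC/2$, the Bayes factor compares each $K$-class model with the $K+1$-class model via $BF = \exp(SIC_K - SIC_{K+1})$, and `cmPk` rescales the SIC values into approximate correct-model probabilities. A self-contained sketch with illustrative BIC values (hypothetical numbers, not the PISA results):

```r
# Illustrative BIC values for K = 1:4 classes (hypothetical numbers)
BIC <- c(5250, 5180, 5165, 5172)

SIC  <- -0.5 * BIC                        # Schwarz information criterion
BF   <- exp(SIC - c(SIC[-1], NA))         # BF comparing K vs. K + 1 classes
cmPk <- exp(SIC - max(SIC)) /
        sum(exp(SIC - max(SIC)))          # approximate correct model probability

round(cmPk, 3)
```

With these values the 3-class solution receives nearly all of the approximate correct-model probability, which is how the `cmPk` column in the fit table is read.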
Create table:
allFit %>%
gt() %>%
tab_header(title = md("**Model Fit Summary Table**")) %>%
cols_label(
Title = "Classes",
Parameters = md("Par"),
LL = md("*LL*"),
BLRT_PValue = "BLRT",
BF = md("BF"),
cmPk = md("*cmPk*")
) %>%
tab_footnote(
footnote = md(
"*Note.* Par = Parameters; *LL* = model log likelihood;
BIC = Bayesian information criterion;
aBIC = sample size adjusted BIC; CAIC = consistent Akaike information criterion;
AWE = approximate weight of evidence criterion;
BLRT = bootstrapped likelihood ratio test p-value;
VLMR = Vuong-Lo-Mendell-Rubin adjusted likelihood ratio test p-value;
*cmPk* = approximate correct model probability."
),
locations = cells_title()
) %>%
tab_options(column_labels.font.weight = "bold") %>%
fmt_number(
9,
decimals = 2,
drop_trailing_zeros = TRUE,
suffixing = TRUE
) %>%
fmt_number(c(3:8, 10),
decimals = 0) %>%
sub_missing(1:10,
missing_text = "--") %>%
fmt(
c(8, 10),
fns = function(x)
ifelse(x < 0.001, "<0.001",
scales::number(x, accuracy = 0.01))
) %>%
fmt(
9,
fns = function(x)
ifelse(x > 100, ">100",
scales::number(x, accuracy = .1))
) %>%
tab_row_group(
label = "Model 1",
rows = c(1:4)) %>%
tab_row_group(
label = "Model 2",
rows = c(5:8)) %>%
tab_row_group(
label = "Model 3",
rows = c(9:12)) %>%
tab_row_group(
label = "Model 4",
rows = c(13:16)) %>%
row_group_order(
groups = c("Model 1","Model 2", "Model 3", "Model 4")
)
Plot information criteria
allFit %>%
filter(grepl("Model 1", Title)) %>%
dplyr::select(2:7) %>%
rowid_to_column() %>%
pivot_longer(`BIC`:`AWE`,
names_to = "Index",
values_to = "ic_value") %>%
mutate(Index = factor(Index,
levels = c("AWE", "CAIC", "BIC", "aBIC"))) %>%
ggplot(aes(x = rowid, y = ic_value,
color = Index, shape = Index,
group = Index, lty = Index)) +
geom_point(size = 2.0) + geom_line(size = .8) +
scale_x_continuous(breaks = 1:6) +
scale_colour_grey(end = .5) +
theme_cowplot() +
labs(x = "Number of Classes", y = "Information Criteria Value") +
theme(legend.title = element_blank(),
legend.position = "top")
# MplusAutomation Method using `compareModels`
parallelModels <- readModels(here("tidyLPA"))
compareModels(parallelModels[["model_3_class_2.out"]],
parallelModels[["model_4_class_2.out"]], diffTest = TRUE)
source("plot_lpa_function.txt")
plot_lpa_function(model_name = output_pisa$model_1_class_2.out)
Save figure:
ggsave(here("figures", "C4_LPA_Plot.png"), dpi="retina", height=5, width=7, units="in")
Hallquist, M. N., & Wiley, J. F. (2018). MplusAutomation: An R package for facilitating large-scale latent variable analyses in Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 25(4), 621-638.
Miller, J. D., Hoffer, T., Suchner, R., Brown, K., & Nelson, C. (1992). LSAY codebook. Northern Illinois University.
Muthén, B. O., Muthén, L. K., & Asparouhov, T. (2017). Regression and mediation analysis using Mplus. Los Angeles, CA: Muthén & Muthén.
Muthén, L. K., & Muthén, B. O. (1998-2017). Mplus user's guide (8th ed.). Los Angeles, CA: Muthén & Muthén.
R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/
Rosenberg, J. M., van Lissa, C. J., Beymer, P. N., Anderson, D. J., Schell, M. J., & Schmidt, J. A. (2019). tidyLPA: Easily carry out latent profile analysis (LPA) using open-source or commercial software [R package]. https://data-edu.github.io/tidyLPA/
Wickham, H., et al. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686