stm/prevalence issue
yuanyuan0105 opened this issue · 5 comments
I tried to run the stm() function as shown below, but got this error message:
"Error in stm(documents = out$documents, vocab = out$vocab, K = 0, data = out$meta, : number of observations in content covariate (1) prevalence covariate (20263) and documents (20263) are not all equal."
The code I have is:
stmfit <- stm(documents = out$documents, vocab = out$vocab,
              K = 0, data = out$meta, prevalence = ~ timenum,
              max.em.its = 75, seed = 24601,
              init.type = "Spectral", verbose = FALSE,
              control <- list(tSNE_init.dims = 80))
I did not specify the "content =" argument in my code, since some examples I have seen use only "prevalence".
What causes this error, and how can I solve it?
Many thanks
Hi @santoroma,
I attached the dataset and my code below
https://docs.google.com/spreadsheets/d/1eStIhewnnMxmYG0MEDgYz3euJThRsjPV4YELpldatlk/edit?usp=sharing
library(stm)
processed <- textProcessor(data_english$text, metadata = data_english)
out <- prepDocuments(processed$documents, processed$vocab, processed$meta)
First_STM <- stm(documents = out$documents, vocab = out$vocab,
                 K = 0, data = out$meta, prevalence = ~ s(timenum),
                 init.type = "Spectral", verbose = FALSE,
                 control <- list(tSNE_init.dims = 80))
Thanks very much in advance for your help!
It's very likely that you have missing values in your covariates. STM currently cannot handle missing values: "Note that the model does not permit estimation when there are variables used in the model that have missing values. As such, it can be helpful to subset data to observations that do not have missing values for metadata that will be used in the STM model."
Roberts, M. E., Stewart, B. M. & Tingley, D. (2019). stm: An R Package for Structural Topic Models. Journal of Statistical Software, 91, 1–40. https://doi.org/10.18637/jss.v091.i02
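A minimal sketch of the subsetting step described in the quoted note, using base R only. The data frame `meta` and its values are hypothetical stand-ins for the real metadata; only the column name `timenum` comes from this thread.

```r
# Hypothetical metadata: the second row has a missing prevalence covariate.
meta <- data.frame(
  text    = c("first doc", "second doc", "third doc"),
  timenum = c(1, NA, 3),
  stringsAsFactors = FALSE
)

# Count missing values per column to see which covariates are affected.
na_counts <- colSums(is.na(meta))
print(na_counts)

# Keep only the rows where the covariate used in the model is observed.
keep <- complete.cases(meta[, "timenum", drop = FALSE])
meta_complete <- meta[keep, ]
print(nrow(meta_complete))
```

The subsetting would have to happen before textProcessor() and prepDocuments(), so that documents and metadata stay aligned.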
I am having the same issue. I followed the instructions in #144, but I don't have missing values. What puzzles me is why it throws an error about the content covariate (1), while I do not have a content covariate in my model. The prevalence covariate and document covariate have equal lengths and no missing values.
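One possible explanation for the "content covariate (1)", offered as an assumption and not a confirmed diagnosis: both snippets in this thread end with `control <- list(tSNE_init.dims=80)`. Inside a function call, `<-` is an assignment expression, not a named argument, so R passes its value positionally to the first formal argument that has not been matched yet. If that argument is `content` in stm(), the model receives a one-element list as the content covariate. The sketch below reproduces the mechanism with `toy_fit`, a hypothetical stand-in function, so it runs without the stm package.

```r
# Hypothetical stand-in with an stm()-like argument order (not the real stm()).
toy_fit <- function(documents, prevalence = NULL, content = NULL,
                    control = list()) {
  list(content_length = length(content),
       control_length = length(control))
}

docs <- list("a", "b", "c")

# `control <- list(...)` is passed positionally and lands in `content`.
bad <- toy_fit(documents = docs, prevalence = ~ timenum,
               control <- list(tSNE_init.dims = 80))
print(bad$content_length)  # the list was matched to `content`
print(bad$control_length)  # `control` kept its empty default

# `control = list(...)` is matched by name, as intended.
good <- toy_fit(documents = docs, prevalence = ~ timenum,
                control = list(tSNE_init.dims = 80))
print(good$content_length)
print(good$control_length)
```

If this is what is happening, writing `control = list(tSNE_init.dims = 80)` in the stm() call should leave the content covariate unset.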
I am having the same issue.