bstewart/stm

stm/prevalence issue

yuanyuan0105 opened this issue · 5 comments

I tried to run a stm function as below, but got an error message:
"Error in stm(documents = out$documents, vocab = out$vocab, K = 0, data = out$meta, : number of observations in content covariate (1) prevalence covariate (20263) and documents (20263) are not all equal."

the code I have is like this:

stmfit <- stm(documents = out$documents, vocab = out$vocab,
K = 0 ,data = out$meta, prevalence =~ timenum,
max.em.its = 75,seed=24601,
init.type = "Spectral", verbose = FALSE,
control <- list(tSNE_init.dims=80))

I did not specify "content =" argument in my code as I see some examples only have "prevalence" as well.
So I would like to know what causes this error and how to solve it?

Many thanks

Hi @santoroma,

I attached the dataset and my code below

https://docs.google.com/spreadsheets/d/1eStIhewnnMxmYG0MEDgYz3euJThRsjPV4YELpldatlk/edit?usp=sharing

library(stm)
processed <- textProcessor(data_english$text, metadata = data_english)
out <- prepDocuments(processed$documents, processed$vocab, processed$meta)
First_STM <- stm(documents = out$documents, vocab = out$vocab,
K = 0,data = out$meta, prevalence =~ s(timenum),
init.type = "Spectral", verbose = FALSE,
control <- list(tSNE_init.dims=80))

Thanks much for your help in advance!

It's very likely that you got missings in your covariates. STM currently cannot handle missing values: "6Note that the model does not permit estimation when there are variables used in the model that have missing values. As such, it can be helpful to subset data to observations that do not have missing values for metadata that will be used in the STM model."

Roberts, M. E., Stewart, B. M. & Tingley, D. (2019). stm: An R Package for Structural Topic Models. Journal of Statistical Software, 91, 1–40. https://doi.org/10.18637/jss.v091.i02

JvH13 commented

I am having the same issue. I followed the instructions in #144, but I don't have missing values. What puzzles me is why it throws an error about the content covariate (1), while I do not have a content covariate in my model. The prevalence covariate and document covariate have equal lengths and no missing values.

I am having the same issue. I followed the instructions in #144, but I don't have missing values. What puzzles me is why it throws an error about the content covariate (1), while I do not have a content covariate in my model. The prevalence covariate and document covariate have equal lengths and no missing values.

I am having the same issue.