stm/prevalence issue

Question

yuanyuan0105 opened this issue 2 years ago · 5 comments

I tried to run the stm function shown below, but got this error message:
"Error in stm(documents = out$documents, vocab = out$vocab, K = 0, data = out$meta, : number of observations in content covariate (1) prevalence covariate (20263) and documents (20263) are not all equal."

My code is as follows:

stmfit <- stm(documents = out$documents, vocab = out$vocab,
              K = 0, data = out$meta, prevalence = ~ timenum,
              max.em.its = 75, seed = 24601,
              init.type = "Spectral", verbose = FALSE,
              control <- list(tSNE_init.dims = 80))

I did not specify a "content =" argument in my code, because some of the examples I have seen use only "prevalence" as well.
What causes this error, and how can I fix it?

Many thanks

Answer 1 · 2022-05-19T08:47:47.000Z

Hello yuanyuan0105, can you post a reproducible example of your code and data?

Mario Santoro

Answer 2 · 2022-05-20T02:19:29.000Z

Hi @santoroma,

I have attached the dataset and my code below:

https://docs.google.com/spreadsheets/d/1eStIhewnnMxmYG0MEDgYz3euJThRsjPV4YELpldatlk/edit?usp=sharing

library(stm)
processed <- textProcessor(data_english$text, metadata = data_english)
out <- prepDocuments(processed$documents, processed$vocab, processed$meta)
First_STM <- stm(documents = out$documents, vocab = out$vocab,
                 K = 0, data = out$meta, prevalence = ~ s(timenum),
                 init.type = "Spectral", verbose = FALSE,
                 control <- list(tSNE_init.dims = 80))

Thanks in advance for your help!

Answer 3 · 2022-08-25T13:35:01.000Z

Most likely you have missing values in your covariates. STM currently cannot handle missing values: "Note that the model does not permit estimation when there are variables used in the model that have missing values. As such, it can be helpful to subset data to observations that do not have missing values for metadata that will be used in the STM model."

Roberts, M. E., Stewart, B. M. & Tingley, D. (2019). stm: An R Package for Structural Topic Models. Journal of Statistical Software, 91, 1–40. https://doi.org/10.18637/jss.v091.i02
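A minimal sketch of the subsetting the paper recommends, assuming the covariate column is named `timenum` as in the original question. The toy data frame stands in for `data_english` and is hypothetical. The idea is to drop rows with missing values in any covariate used by the model before calling textProcessor(), so the documents and the metadata stay aligned.

```r
# Hypothetical toy stand-in for data_english (illustration only).
data_english <- data.frame(
  text = c("first document", "second document", "third document"),
  timenum = c(1, NA, 3),
  stringsAsFactors = FALSE
)

# Covariates that will appear in the prevalence (or content) formula.
covariates <- c("timenum")

# Count missing values per covariate to check whether this is the problem.
colSums(is.na(data_english[, covariates, drop = FALSE]))

# Keep only the rows that are complete for those covariates.
keep <- complete.cases(data_english[, covariates, drop = FALSE])
data_complete <- data_english[keep, ]
nrow(data_complete)  # 2

# Then continue as in the original code, e.g.:
# processed <- textProcessor(data_complete$text, metadata = data_complete)
```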

Answer 4 · 2023-06-06T13:30:02.000Z

I am having the same issue. I followed the instructions in #144, but I don't have any missing values. What puzzles me is why the error mentions a content covariate (1) when there is no content covariate in my model. The prevalence covariate and the documents have equal lengths and no missing values.
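A likely explanation, offered as a hedged diagnosis rather than a confirmed answer: in both code snippets posted in this thread, `control <- list(tSNE_init.dims=80)` is an assignment, not a named argument. R evaluates it to the list and matches that unnamed value positionally to the first formal argument not already matched by name. In the argument order documented for `stm()` (documents, vocab, K, prevalence, content, data, ...), that is `content`, and a one-element list gives "content covariate (1)". The sketch reproduces this with a hypothetical stub, `fake_stm()`, that copies only the first formals of `stm()` plus `control`; it is not the package's code.

```r
# Hypothetical stub: mirrors the order of the first formals of stm::stm()
# (documents, vocab, K, prevalence, content, data) plus control.
# It only reports which argument received which value.
fake_stm <- function(documents, vocab, K, prevalence = NULL, content = NULL,
                     data = NULL, control = list()) {
  list(content = content, control = control)
}

docs <- list("doc1", "doc2", "doc3")  # toy stand-in for out$documents
meta <- data.frame(timenum = 1:3)     # toy stand-in for out$meta

# Buggy form: `<-` makes this an unnamed (positional) argument, so it is
# matched to `content`, the first formal not already matched by name.
# As a side effect it also assigns a variable `control` in the caller.
bad <- fake_stm(documents = docs, vocab = "a", K = 0, data = meta,
                prevalence = ~ timenum, control <- list(tSNE_init.dims = 80))
length(bad$content)  # 1
length(bad$control)  # 0, control kept its default list()

# Fixed form: `=` passes the list to the `control` argument by name.
good <- fake_stm(documents = docs, vocab = "a", K = 0, data = meta,
                 prevalence = ~ timenum, control = list(tSNE_init.dims = 80))
is.null(good$content)        # TRUE
good$control$tSNE_init.dims  # 80
```

If this diagnosis is right, the fix in the original calls is to write `control = list(tSNE_init.dims = 80)` instead of `control <- list(tSNE_init.dims = 80)`.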

Answer 5 · 2024-01-31T18:03:11.000Z

I am having the same issue.