bstewart/stm

Models converging after few iterations

juhopaak opened this issue · 0 comments

Hi, and thanks for an excellent package!

I'm running the searchK function on a dataset of around 400k social media messages from various platforms (short tweets as well as longer discussion forum posts), looking for the best model in the range K = 10 to 300. However, when K is close to or above 200, models begin converging after just a couple of iterations, and the results are worse than those of models that run longer. I've tried different random seeds for generating the heldout set, and the seed seems to influence the issue: under some random splits the K = 200 model converges in 3 iterations, whereas under others it takes over 200 iterations.
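
For context, my understanding from the documentation is that EM stops once the relative change in the approximate bound drops below `emtol` (1e-5 by default). Below is a minimal sketch of that rule as I understand it, written in Python purely for illustration. The function names are my own and this is not the package's actual code.

```python
def relative_change(old_bound: float, new_bound: float) -> float:
    """Relative change in the bound between two consecutive EM iterations."""
    return (new_bound - old_bound) / abs(old_bound)


def has_converged(old_bound: float, new_bound: float, tol: float = 1e-5) -> bool:
    """Return True when the relative change falls below the tolerance.

    A decrease in the bound gives a negative change, which is also
    below the tolerance, so this check treats it as convergence too.
    """
    return relative_change(old_bound, new_bound) < tol
```

If this reading is right, a run could stop early either because the bound has plateaued or because it decreased between two iterations, but I may be misunderstanding the stopping rule.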

Would you have any idea what might be causing this, and whether it is expected model behavior? I'm also trying to figure out how to assess the reliability of such results, possibly by doing something like a 10-fold validation with different random seeds.
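
For the repeated-seed check, here is a rough sketch of how I might summarize the runs once I have collected them. It is in Python for illustration only. The tuple layout, the function name, and the `min_iterations` threshold are all my own assumptions rather than anything provided by the package.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_k(runs, min_iterations=10):
    """Summarize repeated runs per number of topics K.

    runs: iterable of (k, seed, iterations, heldout) tuples, recorded by
          hand from repeated searches (hypothetical layout).
    min_iterations: arbitrary threshold; runs that stop in fewer
          iterations are counted as early stops.
    """
    grouped = defaultdict(list)
    for k, _seed, iterations, heldout in runs:
        grouped[k].append((iterations, heldout))

    summary = {}
    for k, values in sorted(grouped.items()):
        heldouts = [h for _, h in values]
        summary[k] = {
            "n_runs": len(values),
            "early_stops": sum(1 for its, _ in values if its < min_iterations),
            "heldout_mean": mean(heldouts),
            # The standard deviation needs at least two runs.
            "heldout_sd": stdev(heldouts) if len(heldouts) > 1 else 0.0,
        }
    return summary
```

The idea would be to compare, for each K, how often runs stop early and how much the held-out likelihood varies across seeds.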

Many thanks for your help!