huggingface/OBELICS

How to use LDA for topic modeling

jrryzh opened this issue · 1 comment

Thanks again for your work!
In the paper, the topic modeling of OBELICS is implemented using LDA. I am wondering which specific LDA model was used, what settings were used to train it, and most importantly, how the topics were derived from the key words and weights (e.g., using LLMs)? Thank you for answering!

We used this implementation: https://mimno.github.io/Mallet/topics
I don't remember the parameters, but they should be the default ones.
Yes, we used ChatGPT to generate the topics from the key words!
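
For illustration only, here is a minimal Python sketch of the key-words-to-topic step. It assumes the key words come from the file that Mallet's `train-topics --output-topic-keys` writes, with one topic per line as tab-separated topic id, alpha value, and space-separated key words. The function names, the sample data, and the prompt wording are hypothetical and are not the ones used for OBELICS; the sketch only builds the prompt text and does not call any LLM.

```python
def parse_topic_keys(text):
    """Parse Mallet topic-keys text into {topic_id: [key words]}.

    Assumed line format: '<topic id>\t<alpha>\t<space-separated key words>'.
    """
    topics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        topic_id, _alpha, words = line.split("\t", 2)
        topics[int(topic_id)] = words.split()
    return topics


def build_prompt(keywords, max_words=10):
    """Build a prompt asking an LLM to name a topic (hypothetical wording)."""
    shown = ", ".join(keywords[:max_words])
    return (
        "The following key words were produced by an LDA topic model, "
        "ordered from most to least relevant: "
        f"{shown}. "
        "Reply with a short name for the topic they describe."
    )


# Made-up sample in the assumed topic-keys format (not real OBELICS topics).
sample = (
    "0\t0.5\tgame team season players league\n"
    "1\t0.5\trecipe sugar butter flour oven\n"
)

topics = parse_topic_keys(sample)
prompts = {tid: build_prompt(words) for tid, words in topics.items()}
print(prompts[1])
```

Each resulting prompt can then be pasted into ChatGPT (or sent to any other LLM) to get a candidate topic name, which is worth checking by hand against the key words.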