vi3k6i5/GuidedLDA

Seed Confidence

browningbarrett opened this issue · 0 comments

Hello, could you explain a bit more about the way the seed_confidence parameter works?

I've been measuring convergence on a large corpus (public company earnings calls) by ranking word likelihoods within each topic and awarding points when a seeded word is more likely under its seeded topic than under the others. As I tested different seed_confidence values, I found that lower values returned better convergence scores, which isn't what I expected.
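For concreteness, a simplified version of this kind of score might look like the sketch below. The names `seed_alignment_score` and `topic_word` are illustrative and not part of GuidedLDA, and the sketch awards one point per seed word whose most likely topic is its seeded topic, which is an assumption rather than the exact ranking scheme described above.

```python
def seed_alignment_score(topic_word, seed_topics):
    """Fraction of seed words whose highest-probability topic is the seeded one.

    topic_word:  list of rows, one per topic, each a list of per-word
                 probabilities (hypothetical layout: topics x vocabulary).
    seed_topics: dict mapping word id -> seeded topic id.
    """
    if not seed_topics:
        return 0.0
    hits = 0
    for word_id, topic_id in seed_topics.items():
        # Probability of this word under every topic.
        column = [row[word_id] for row in topic_word]
        # Index of the topic under which the word is most likely.
        best_topic = max(range(len(column)), key=column.__getitem__)
        hits += (best_topic == topic_id)
    return hits / len(seed_topics)
```

A score of 1.0 would mean every seed word ended up most likely under its seeded topic; lower scores mean some seed words drifted elsewhere.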

Here's where the seed_confidence parameter is implemented:
    if w in seed_topics and random.random() < seed_confidence:
        z_new = seed_topics[w]
    else:
        z_new = i % n_topics

If I understand this correctly, a seed_confidence value of 1 should assign seed words to their seeded topic every time, and a value of 0 would make every seed word randomly assigned. So am I getting better convergence with no seeding? Or am I misunderstanding how the seed_confidence parameter works?
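To check that reading, here is a minimal standalone sketch that wraps the quoted branch in a loop and measures how many seed-word tokens start in their seeded topic. The function names, the `word_ids` list, and the `rng` argument are hypothetical scaffolding; only the `if`/`else` logic comes from the snippet above.

```python
import random

def initial_assignments(word_ids, seed_topics, seed_confidence, n_topics, rng=None):
    """Apply the quoted seeding branch to every token position i."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so runs are repeatable
    assignments = []
    for i, w in enumerate(word_ids):
        if w in seed_topics and rng.random() < seed_confidence:
            z_new = seed_topics[w]   # token goes to its seeded topic
        else:
            z_new = i % n_topics     # fallback: cycle through topics by position
        assignments.append(z_new)
    return assignments

def seeded_fraction(word_ids, seed_topics, assignments):
    """Share of seed-word tokens that were placed in their seeded topic."""
    hits = total = 0
    for w, z in zip(word_ids, assignments):
        if w in seed_topics:
            total += 1
            hits += (z == seed_topics[w])
    return hits / total if total else 0.0

if __name__ == "__main__":
    word_ids = [0, 1, 2, 3] * 250   # toy token stream, not real data
    seeds = {0: 2, 1: 3}            # word id -> seeded topic
    for conf in (0.0, 0.5, 1.0):
        zs = initial_assignments(word_ids, seeds, conf, n_topics=4)
        print(conf, seeded_fraction(word_ids, seeds, zs))
```

Two things stand out from the branch itself: with a value of 0 the fallback is `i % n_topics`, which cycles through topics by token position rather than drawing one at random, and the snippet only covers the initial assignment, so it says nothing on its own about whether later sampling iterations keep seed words in their seeded topics.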