MaartenGr/BERTopic

Can I merge_topics() after reduce_outliers() and update_topics()?

Thank you so much for the great work.

In my case, there are too many outliers. I want to reduce them, but I also need to merge topics.

My question is: can I merge topics after reducing the outliers and updating the topics? The code is below:

new_topics = topic_model.reduce_outliers(docs, topics)
topic_model.update_topics(docs, topics=new_topics)
topic_model.merge_topics(docs, [...])
topic_info = topic_model.get_topic_info()

I see this warning in the official documentation:
"In both cases, it is important to realize that updating the topics this way may lead to errors if topic reduction or topic merging techniques are used afterwards. The reason for this is that when you assign a -1 document to topic 1 and another -1 document to topic 2, it is unclear how you map the -1 documents. Is it matched to topic 1 or 2."
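
To make the quoted warning concrete, here is a small pure-Python sketch of the ambiguity it describes. It does not call BERTopic at all; the two lists and the helper `build_mapping` are hypothetical and exist only for illustration. The point is that once outlier documents are spread over several topics, a simple "old topic -> new topic" table can no longer describe what happened to topic -1.

```python
from collections import defaultdict

# Hypothetical topic assignments for five documents, before and after
# outlier reduction. -1 is the outlier topic.
old_topics = [-1, -1, 1, 2, 2]
new_topics = [1, 2, 1, 2, 2]


def build_mapping(old, new):
    """Collect, for each old topic, the set of new topics its documents moved to."""
    mapping = defaultdict(set)
    for before, after in zip(old, new):
        mapping[before].add(after)
    return dict(mapping)


mapping = build_mapping(old_topics, new_topics)

# Topics with more than one target cannot be described by a one-to-one
# "old topic -> new topic" table.
ambiguous = {topic: targets for topic, targets in mapping.items() if len(targets) > 1}
print(ambiguous)  # {-1: {1, 2}}
```

Topics 1 and 2 each map to themselves, but -1 maps to both, which is the case the documentation warns about.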

It looks like topics should not be merged after reducing outliers and updating topics. But executing the code above does not seem to raise an error.

What should I do to achieve my goal? For example, should I merge the topics first, then reduce the outliers, then update the topics? Is it right to put updating topics at the end?

topic_model.merge_topics(docs, [...])
new_topics = topic_model.reduce_outliers(docs, topics)
topic_model.update_topics(docs, topics=new_topics)

Thank you very much.

Reducing outliers indeed makes mapping topics quite a bit more complex if certain mappings are being created. In practice, this might not always lead to issues, but it is advised to make outlier reduction the last step in the pipeline because of this mapping issue.
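
As a rough sketch of that ordering (not an official recipe), the steps could be wrapped as below. The function name `merge_then_reduce_outliers` and the parameter `topics_to_merge` are made up for illustration; `topic_model` is assumed to be an already fitted BERTopic model. One assumption worth flagging: after merging, the sketch passes `topic_model.topics_` to `reduce_outliers` rather than the pre-merge `topics` list, since merging changes the assignments.

```python
def merge_then_reduce_outliers(topic_model, docs, topics_to_merge):
    """Merge topics first, then reduce outliers as the final step.

    `topic_model` is assumed to be a fitted BERTopic model; `topics_to_merge`
    is whatever you would normally pass to merge_topics, e.g. [1, 2].
    """
    # 1. Merge first, before any outlier documents have been reassigned.
    topic_model.merge_topics(docs, topics_to_merge)

    # 2. Reduce outliers using the assignments as they stand after merging.
    new_topics = topic_model.reduce_outliers(docs, topic_model.topics_)

    # 3. Recompute topic representations for the new assignments.
    topic_model.update_topics(docs, topics=new_topics)

    return topic_model.get_topic_info()
```

Calling `merge_then_reduce_outliers(topic_model, docs, [1, 2])` would then return the topic overview after all three steps.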

Thank you very much. I think putting outlier reduction at the end is a good choice. But I'm also a bit curious: if I reduce the outliers first, then update the topics, and finally merge the topics, and no errors are reported during the whole process, does that mean the operation is fine? Or could something have gone wrong without an error being thrown?
Thanks again for your valuable suggestions!

If no errors are given, then it is generally fine, but I cannot be entirely sure, so you would have to be careful when you continue updating the underlying model.
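
One way to be careful is a quick sanity check after each step. The sketch below is only an illustration, and the helper `topic_counts_match` is hypothetical. It assumes `topic_model.topics_` holds one topic id per document and that `get_topic_info()` returns a table with `Topic` and `Count` columns. A mismatch is a hint to look closer; a match does not prove that every internal mapping is correct.

```python
from collections import Counter


def topic_counts_match(topic_model, docs):
    """Return True if topics_ and get_topic_info() agree on per-topic sizes."""
    assignments = list(topic_model.topics_)

    # Every document should have exactly one topic assignment.
    if len(assignments) != len(docs):
        return False

    # Sizes as reported by the model, ignoring topics with no documents.
    info = topic_model.get_topic_info()
    reported = {
        int(topic): int(count)
        for topic, count in zip(info["Topic"], info["Count"])
        if int(count) > 0
    }

    # Sizes recomputed directly from the per-document assignments.
    actual = dict(Counter(int(topic) for topic in assignments))
    return reported == actual
```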

Thanks, your instructions are very useful.