MaartenGr/BERTopic

Skip topic representation when reducing topics with the nr_topics parameter

Hi,

I noticed that when I reduce the number of topics by setting nr_topics to an integer, BERTopic runs the topic representation step before reducing the topics.

For example, if I have 1,800 topics and reduce (or merge) to 100 topics, it seems like it runs 1,800 representations and then re-runs 100 representations.

This is inefficient, especially when I use the OpenAI API, since it spends API resources on representations I don't need.

Is there any way to avoid the double representation by default? (Of course I can run the representation part after getting a model, but I'm just curious if that feature is implemented)

The way to do this currently is to not pass any representation model during .fit or .fit_transform, and then create the representations afterwards using .update_topics. That way, the representation model only runs once, on the topics that remain after reduction.
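
A minimal sketch of that workflow, in case it helps. The helper name `fit_then_represent` and its arguments are made up for illustration; `representation_model` stands for whatever model you would otherwise have passed to `BERTopic(...)` (for example the OpenAI-based one), and `docs` is your list of documents.

```python
def fit_then_represent(docs, representation_model, nr_topics=100):
    """Fit and reduce topics first, then run the costly representation model once.

    Hypothetical helper, not part of BERTopic itself.
    """
    # Imported here so that defining the helper does not require bertopic.
    from bertopic import BERTopic

    # No representation_model is passed, so fitting and topic reduction
    # only compute the default c-TF-IDF keywords.
    topic_model = BERTopic(nr_topics=nr_topics)
    topics, _ = topic_model.fit_transform(docs)

    # The representation model is applied only now, to the reduced topics.
    topic_model.update_topics(docs, representation_model=representation_model)
    return topic_model, topics
```

The default c-TF-IDF representation is still computed before and after the reduction, but that step is cheap and makes no API calls.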