cylnlp/dialogsum

questions about the topics

Charles-ux-bit opened this issue · 1 comment

I find that the topics listed in Appendix A are quite different from those in the dataset.
Actually, each topic in the dataset seems to be almost unique.
How can I map them into a limited label set? Thanks.

@doggyChe - Appendix A in the DialogSum paper shows the clustering result. What you find in Table 12 is the label of each cluster, listed with its corresponding id. These labels were assigned by humans.

If you want to group topic information in the DialogSum dataset, you can try:

  1. map each topic phrase to an embedding (e.g., using GloVe or a pretrained language model).
  2. use a clustering technique to group those embeddings (e.g., K-means or hierarchical clustering).
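
A minimal sketch of these two steps, under stated assumptions: `toy_embed` is a hypothetical stand-in (normalised bag-of-words counts) for real GloVe or language-model embeddings, the `kmeans` helper is a plain Lloyd's K-means written here only for illustration, and the topic phrases and `k=2` are made up. This is not the procedure used in the paper.

```python
import math
import random
from collections import Counter


def toy_embed(phrase, vocab):
    """Hypothetical placeholder embedding: L2-normalised word counts over vocab.

    In practice, replace this with averaged GloVe vectors or the output of a
    pretrained language model.
    """
    counts = Counter(phrase.lower().split())
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iters=50, seed=0):
    """Plain Lloyd's K-means; returns one cluster id per input vector."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct input positions.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up topic phrases, for illustration only.
topics = [
    "job interview",
    "interview for a job",
    "order food",
    "food order at restaurant",
]
vocab = sorted({word for t in topics for word in t.lower().split()})
vectors = [toy_embed(t, vocab) for t in topics]
labels = kmeans(vectors, k=2)

for topic, label in zip(topics, labels):
    print(label, topic)
```

The cluster ids are arbitrary, so each cluster would still need a human-readable label afterwards, as was done for Table 12. On the full dataset a library implementation of K-means or hierarchical clustering is probably a better choice than this hand-written loop.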

Hope this helps.