improvements / alternatives to clustering

Question

Opened this issue 2 years ago · 1 comments

-- Allow custom-initialized cluster centroids.
-- Allow clustering based on a cosine-similarity threshold, measured either to the cluster centroid or to the closest member of the cluster.
-- Replace the Arora et al. embeddings with Sentence-BERT (SBERT) embeddings.
-- Allow stretching the embedding space along an antonym dimension.
-- Treat all names as stopwords and drop them.
-- Drop patients that contain a verb.
-- Make it an option to cluster on the list of entity phrases rather than the set, so that frequently mentioned phrases count more. That is, pass sample_weight=n_mentions to the k-means .fit() function. Could also weight by the log of n_mentions.
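A rough sketch of three of the ideas above (custom initial centroids, mention-count weighting, and a cosine-similarity threshold to the centroid), assuming scikit-learn's `KMeans` is the clusterer. The function names `fit_weighted_kmeans` and `assign_by_cosine_threshold`, their parameters, and the `-1` "unassigned" label are hypothetical, not part of the existing code.

```python
import numpy as np
from sklearn.cluster import KMeans


def fit_weighted_kmeans(embeddings, n_mentions, init_centroids=None,
                        n_clusters=2, log_weights=False, seed=0):
    """Fit k-means on unique phrases, weighting each by its mention count.

    Hypothetical helper: one row of `embeddings` per unique entity phrase,
    `n_mentions` holds how often each phrase occurs.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    counts = np.asarray(n_mentions, dtype=float)
    # log1p dampens very frequent phrases while keeping weights positive.
    weights = np.log1p(counts) if log_weights else counts

    if init_centroids is not None:
        # User-supplied starting centroids; a single run is enough
        # because the initialization is deterministic.
        init = np.asarray(init_centroids, dtype=float)
        km = KMeans(n_clusters=init.shape[0], init=init, n_init=1,
                    random_state=seed)
    else:
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)

    km.fit(embeddings, sample_weight=weights)
    return km


def assign_by_cosine_threshold(embeddings, centroids, threshold):
    """Assign each phrase to its most similar centroid, or -1 if the
    cosine similarity to that centroid is below `threshold`."""
    emb = np.asarray(embeddings, dtype=float)
    cen = np.asarray(centroids, dtype=float)
    eps = 1e-12  # guards against division by zero for all-zero vectors
    emb = emb / np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), eps)
    cen = cen / np.maximum(np.linalg.norm(cen, axis=1, keepdims=True), eps)
    sims = emb @ cen.T
    labels = sims.argmax(axis=1)
    labels[sims.max(axis=1) < threshold] = -1
    return labels
```

Weighting by `n_mentions` pulls each centroid toward the phrases that occur most often, which should match fitting on the full list of mentions without duplicating rows. The threshold variant here only covers the "distance to centroid" case; thresholding on the closest cluster member would need pairwise similarities within each cluster.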

Answer 1 · 2022-03-24T13:00:39.000Z

Another possible approach, clustering sentence embeddings to identify intents in short text: https://towardsdatascience.com/clustering-sentence-embeddings-to-identify-intents-in-short-text-48d22d3bf02e