Confused by HDBSCAN in the UI
Hi - love the project concept. I wanted to use clustering to help label and understand my OCR data, but I don't get the UI around HDBSCAN. I computed the embedding in a Python notebook, and it says the embedding was registered, but the UI panel for HDBSCAN wants me to input text (by hand?) and doesn't seem to see the embeddings I already created. Am I misunderstanding the goal here?
Thanks!
Lynn
Hey Lynn!
We haven't written the guide for this yet, but here's what you can do:
From the datasets page, make sure you've computed an embedding on the field you want to cluster (to do this, open the schema, click the hamburger icon on the field, and choose "compute embedding").
Then open the schema view, click the hamburger icon, click "compute signal", and choose HDBSCAN with your embedding.
That schedules a task, which takes some time to run. Once it completes, it generates a new column with cluster IDs.
A good way to view the clusters is to click "group by" in the dataset view and choose the cluster ID. Then you can arrow through the clusters one by one.
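If it helps to picture what "group by" on the cluster ID column does, here is a rough standalone sketch in plain Python. It does not use the project's API. The `rows` data and the field names `text` and `cluster_id` are made up for illustration, and treating `-1` as noise follows the usual HDBSCAN convention, which I'm assuming also applies to the generated column.

```python
from collections import defaultdict

# Hypothetical rows as they might look after clustering. The field names
# "text" and "cluster_id" are assumptions, not the project's column names.
rows = [
    {"text": "Invoice no. 1042", "cluster_id": 0},
    {"text": "Invoice no. 1043", "cluster_id": 0},
    {"text": "Dear Sir or Madam", "cluster_id": 1},
    {"text": "To whom it may concern", "cluster_id": 1},
    {"text": "~~#@ smudge", "cluster_id": -1},  # -1: conventional HDBSCAN noise label
]


def group_by_cluster(rows, key="cluster_id"):
    """Collect the text of each row under its cluster ID, sorted by ID."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["text"])
    return dict(sorted(groups.items()))


clusters = group_by_cluster(rows)

# Step through the clusters one at a time, as you would with the arrows in the UI.
for cluster_id, texts in clusters.items():
    label = "noise" if cluster_id == -1 else f"cluster {cluster_id}"
    print(f"{label} ({len(texts)} rows)")
    for text in texts:
        print("   ", text)
```

Reading a handful of rows per cluster this way is usually enough to decide on a label for the whole group.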
Hope this helps!
Super, thanks!