enjalot/latent-scope

1-step Setup option

enjalot opened this issue · 1 comments

There should be an option to set up a scope with one click that automatically takes you through the steps instead of asking you to run each step manually.

This should help people onboard and see their data quickly with sane defaults. Then if you want to improve the results you could go back and make different choices (like a different embedding model or different UMAP parameters).

The defaults should probably be something like:

  1. embed with an OSS model (jina small?)
  2. umap with default parameters (adjust neighbors based on size of data)
  3. cluster with default parameters (also adjusted based on size of data)
  4. default labels, don't auto summarize here (can take very long, best results with proprietary models right now)
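
To make "adjusted based on size of data" concrete, here is a rough sketch of what picking defaults could look like. Everything in it is an assumption for illustration: the `ScopeDefaults` and `default_params` names, the log-based neighbor rule, the 0.1% cluster-size rule, and the clamping ranges are hypothetical starting points, not tuned values and not anything latent scope does today.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopeDefaults:
    n_neighbors: int       # UMAP neighborhood size
    min_dist: float        # UMAP minimum distance
    min_cluster_size: int  # smallest group the clusterer may report


def default_params(n_rows: int) -> ScopeDefaults:
    """Pick starting parameters from the dataset size (hypothetical heuristic)."""
    if n_rows < 2:
        raise ValueError("need at least 2 rows to build a projection")
    # Grow the neighborhood with log10 of the row count, clamped to 5..50,
    # and never ask for more neighbors than there are other points.
    n_neighbors = max(5, min(50, round(5 * math.log10(n_rows))))
    n_neighbors = min(n_neighbors, n_rows - 1)
    # Aim for clusters of roughly 0.1% of the data, clamped to 5..500,
    # and never larger than the dataset itself.
    min_cluster_size = max(5, min(500, n_rows // 1000))
    min_cluster_size = min(min_cluster_size, n_rows)
    return ScopeDefaults(n_neighbors, 0.1, min_cluster_size)
```

The point is only that the 1-step flow can compute these from the row count so the user never has to see them; the actual rules would need testing on real datasets.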

This would hopefully make getting started with latent scope much easier and ease people into exploring the various capabilities.

More UX thoughts:

  • if you already have embeddings imported, default to one of those (#34)
  • embedding model is still surfaced as a dropdown
  • prompt to set OpenAI key (or other providers) if not set
  • don't show umap or cluster options
  • count tokens to give cost estimate for labeling clusters (even if just in tokens)
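
A minimal sketch of the token-count idea, assuming we have the row texts and their cluster assignments. `rough_token_count` is a crude stand-in (about 4 characters per token) and should be swapped for the real tokenizer of whichever model does the labeling; the function names and the per-cluster cap are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable


def rough_token_count(text: str) -> int:
    """Crude stand-in for a real tokenizer: about 4 characters per token."""
    return max(1, len(text) // 4) if text else 0


def estimate_label_tokens(
    texts: Iterable[str],
    cluster_ids: Iterable[int],
    count_tokens: Callable[[str], int] = rough_token_count,
    max_tokens_per_cluster: int = 2000,
) -> Dict[int, int]:
    """Estimate prompt tokens per cluster, capped at max_tokens_per_cluster."""
    totals: Dict[int, int] = defaultdict(int)
    for text, cluster_id in zip(texts, cluster_ids):
        totals[cluster_id] += count_tokens(text)
    # The cap models a labeling prompt that only samples part of a big cluster.
    return {cid: min(n, max_tokens_per_cluster) for cid, n in totals.items()}
```

Summing the values gives a total to show before the user commits to labeling, even if it is only displayed in tokens and not dollars.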

Process thoughts:

  • an orchestrator function (ls-orchestrate?) spawns jobs for each step of the process using subprocess
  • this way job history is still generated
  • and progress can be tracked for each step individually in the web ui
    • have a minimized job status tracker so it's not overwhelming (but allow expanding to see cli output)
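
The orchestrator idea above could be sketched roughly like this. It is a sketch under assumptions, not a design: `run_steps`, `StepResult`, and the `on_line` callback are hypothetical names, and the commands in the demo are placeholders that just print, since the real per-step commands and their arguments are not settled here.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class StepResult:
    name: str
    returncode: int
    output: List[str] = field(default_factory=list)


def run_steps(
    steps: Sequence[Tuple[str, List[str]]],
    on_line: Optional[Callable[[str, str], None]] = None,
) -> List[StepResult]:
    """Run (name, argv) steps in order as subprocesses; stop at the first failure."""
    results: List[StepResult] = []
    for name, argv in steps:
        # Merge stderr into stdout so the tracker sees one stream per step.
        proc = subprocess.Popen(
            argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        )
        result = StepResult(name=name, returncode=-1)
        for line in proc.stdout:
            line = line.rstrip("\n")
            result.output.append(line)
            if on_line is not None:
                on_line(name, line)  # e.g. push progress to the web ui
        result.returncode = proc.wait()
        results.append(result)
        if result.returncode != 0:
            break  # later steps depend on this one, so do not continue
    return results


if __name__ == "__main__":
    # Placeholder commands only; real steps would call the per-step CLI tools.
    demo = [
        ("embed", [sys.executable, "-c", "print('embed placeholder')"]),
        ("umap", [sys.executable, "-c", "print('umap placeholder')"]),
    ]
    for step in run_steps(demo, on_line=lambda name, line: print(f"[{name}] {line}")):
        print(step.name, "exited with", step.returncode)
```

Keeping each step as its own subprocess means each one can still write its own job record, and the per-step output list is what the expandable status tracker would show.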