enjalot/latent-scope

Improved ingest flow

enjalot opened this issue · 1 comment

Right now, when you upload a file to start a new dataset, we just blindly ingest it. There are a number of things we could do better:

  • Check if a dataset with the same name exists
    • if it does, append a numeric suffix like we do for other data files, e.g. {dataset_name}_001
  • Check columns of the dataset to suggest options:
    • choose a text column
    • import embeddings (if every value in a column is an array of the same length)
      • give the option to choose the model the embedding was generated with
      • this could be suggested in the embedding step based on columns identified as arrays of numbers
    • determine categorical and numeric columns
      • this would support filtering
      • and allow coloring by field
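
A rough sketch of the checks listed above, in plain Python over a list of row dicts. This is not the code that shipped: the function names (`unique_dataset_name`, `suggest_columns`), the `max_categories` cutoff, and the rule that a string column with repeated values counts as categorical are all assumptions made for illustration.

```python
from numbers import Number


def unique_dataset_name(name, existing):
    """Return `name`, or `name_001`, `name_002`, ... if it is already taken.

    `existing` is any collection of dataset names already in use (hypothetical input).
    """
    if name not in existing:
        return name
    n = 1
    while f"{name}_{n:03d}" in existing:
        n += 1
    return f"{name}_{n:03d}"


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, Number) and not isinstance(value, bool)


def _is_numeric_array(value):
    return (
        isinstance(value, (list, tuple))
        and len(value) > 0
        and all(_is_number(x) for x in value)
    )


def suggest_columns(rows, max_categories=20):
    """Guess a role for each column: embedding, numeric, categorical, text, or other.

    `max_categories` is an arbitrary cutoff chosen for this sketch.
    """
    suggestions = {}
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        # Ignore missing values when classifying a column.
        values = [row[col] for row in rows if row.get(col) is not None]
        if not values:
            suggestions[col] = "other"
        elif all(_is_numeric_array(v) for v in values):
            # Only treat the column as embeddings if every array has the same length.
            same_length = len({len(v) for v in values}) == 1
            suggestions[col] = "embedding" if same_length else "other"
        elif all(_is_number(v) for v in values):
            suggestions[col] = "numeric"
        elif all(isinstance(v, str) for v in values):
            distinct = len(set(values))
            # Few distinct values that repeat: likely a category, otherwise free text.
            if distinct <= max_categories and distinct < len(values):
                suggestions[col] = "categorical"
            else:
                suggestions[col] = "text"
        else:
            suggestions[col] = "other"
    return suggestions
```

With this sketch, `unique_dataset_name("news", {"news", "news_001"})` gives `"news_002"`, and the suggestions could be used to pre-select the text column, offer embedding import for array columns, and list categorical and numeric columns for filtering and coloring.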

Implemented in 0.1.8