Improved ingest flow

Question

enjalot opened this issue 7 months ago · 1 comment

Right now, when you upload a file to start a new dataset, we just ingest it blindly. There are a number of things we could do better:

Check if a dataset with the same name already exists
- if it does, append a numeric suffix like we do for other data files: {dataset_name}_001
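
A rough sketch of what the suffixing could look like. The function name `unique_dataset_name` and the `existing` collection are made up for illustration; how existing dataset names are actually looked up is an assumption here, not something taken from the codebase.

```python
from typing import Iterable


def unique_dataset_name(name: str, existing: Iterable[str]) -> str:
    """Return `name` if it is unused, otherwise the first free `name_NNN`.

    Hypothetical helper: `existing` stands in for however the app
    lists dataset names already on disk.
    """
    taken = set(existing)
    if name not in taken:
        return name
    n = 1
    # Count up until a zero-padded suffix is free: name_001, name_002, ...
    while f"{name}_{n:03d}" in taken:
        n += 1
    return f"{name}_{n:03d}"
```

For example, calling it with `"iris"` when `"iris"` and `"iris_001"` are already taken would give `"iris_002"`.
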
Check columns of the dataset to suggest options:
- choose a text column
- import embeddings (if every value in the column is an array of the same length)
  - give the option to choose the model the embedding was generated with
  - this could be suggested in the embedding step based on columns identified as arrays of numbers
- determine categorical and numeric columns
  - this would support filtering
  - and allow coloring by field
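
To make the column checks concrete, here is a minimal sketch of one possible heuristic, written in plain Python over a list of row dicts. Everything in it is an assumption for illustration: the names `classify_column` and `suggest_columns`, the labels it returns, and the `max_categories` threshold are all hypothetical and not what the project necessarily does.

```python
from numbers import Number


def is_number(value) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, Number) and not isinstance(value, bool)


def classify_column(values, max_categories=20) -> str:
    """Guess a column's role from its values (hypothetical heuristic)."""
    values = [v for v in values if v is not None]
    if not values:
        return "empty"
    # Candidate embedding: every value is a non-empty array of numbers.
    if all(
        isinstance(v, (list, tuple)) and len(v) > 0 and all(is_number(x) for x in v)
        for v in values
    ):
        lengths = {len(v) for v in values}
        # Only treat it as an embedding if all arrays share one length.
        return "embedding" if len(lengths) == 1 else "other"
    if all(is_number(v) for v in values):
        return "numeric"
    if all(isinstance(v, str) for v in values):
        distinct = len(set(values))
        # Few, repeated string values suggest a category (assumed threshold).
        if distinct <= max_categories and distinct < len(values):
            return "categorical"
        return "text"
    return "other"


def suggest_columns(rows) -> dict:
    """Map each column name to its guessed role, given a list of row dicts."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return {name: classify_column(vals) for name, vals in columns.items()}
```

The result could then drive the suggestions: "text" columns offered as the text column, "embedding" columns offered for import, and "categorical" or "numeric" columns offered for filtering and coloring. A real dataset would likely need a smarter threshold than a fixed `max_categories`.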

Answer 1 · 2024-03-21T19:11:28.000Z

Implemented in 0.1.8