Improved ingest flow
enjalot opened this issue · 1 comment
enjalot commented
Right now, when you upload a file to start a new dataset, we just blindly ingest it. There are a number of things we could do better:
- Check if a dataset with the same name exists
  - if it does, append a number like we do for other data files: {dataset_name}_001
- Check columns of the dataset to suggest options:
  - choose a text column
  - import embeddings (if a column is all arrays of the same length)
    - give the option to choose the model the embedding was generated with
    - this could be suggested in the embedding step based on columns identified as arrays of numbers
  - determine categorical and numeric columns
    - this would support filtering
    - and allow coloring by field
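
A rough sketch of the checks listed above, in plain Python. This is not the code that shipped; the function names (`unique_dataset_name`, `classify_column`, `classify_columns`), the role labels, and the `max_categories` threshold are all hypothetical and only illustrate one way the heuristics might work, assuming the uploaded file has been read into a list of row dicts.

```python
def unique_dataset_name(name, existing):
    """Return `name` if unused, otherwise name_001, name_002, ... (hypothetical helper)."""
    if name not in existing:
        return name
    i = 1
    while f"{name}_{i:03d}" in existing:
        i += 1
    return f"{name}_{i:03d}"


def _is_number(v):
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def classify_column(values, max_categories=20):
    """Guess a role for one column. The labels and threshold are assumptions."""
    values = [v for v in values if v is not None]
    if not values:
        return "empty"
    # Embedding candidate: every value is a non-empty array of numbers,
    # and all arrays have the same length.
    if all(isinstance(v, (list, tuple)) and len(v) > 0
           and all(_is_number(x) for x in v) for v in values):
        same_length = len({len(v) for v in values}) == 1
        return "embedding" if same_length else "other"
    if all(_is_number(v) for v in values):
        return "numeric"
    if all(isinstance(v, str) for v in values):
        distinct = len(set(values))
        # Treat as categorical only if values repeat and there are few of them.
        if distinct <= max_categories and distinct < len(values):
            return "categorical"
        return "text"
    return "other"


def classify_columns(rows):
    """Map each column name (taken from the first row) to a guessed role."""
    if not rows:
        return {}
    return {key: classify_column([row.get(key) for row in rows])
            for key in rows[0]}
```

Columns guessed as "text" could be offered as the text column, "embedding" columns could trigger the import option, and "categorical" or "numeric" columns could feed filtering and color-by-field.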
enjalot commented
Implemented in 0.1.8