defenseunicorns/leapfrogai

chore(api): Don't reindex files for different vector stores, unless params differ


Describe what should be investigated or refactored

Currently, each vector store reindexes every file attached to it, even if the file has already been indexed in one or more other vector stores. This is a very costly and time-consuming process. Update the code so that a file is indexed only once, unless the indexing parameters (chunking, for instance) differ.
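One possible shape for this, as a minimal sketch rather than the existing implementation: key each indexing run on the file plus its chunking parameters, and only chunk and embed when that key has not been seen before. Everything named here (`ChunkParams`, `index_key`, `IndexRegistry`, `embed_fn`, the in-memory dicts) is hypothetical and stands in for whatever the API and database layer actually use.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ChunkParams:
    """Hypothetical indexing parameters; the real set may be larger."""
    chunk_size: int = 1000
    chunk_overlap: int = 100


def index_key(file_id: str, params: ChunkParams) -> str:
    """Stable key for one (file, params) combination."""
    payload = json.dumps({"file_id": file_id, **asdict(params)}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def chunk_text(text: str, params: ChunkParams) -> list[str]:
    """Naive fixed-size chunking with overlap, for illustration only."""
    step = params.chunk_size - params.chunk_overlap
    if step <= 0:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    return [text[i:i + params.chunk_size] for i in range(0, len(text), step)]


class IndexRegistry:
    """In-memory stand-in for the db tables that would hold this state."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn      # assumed: callable(str) -> vector
        self._indexed = {}             # index key -> [(chunk, vector), ...]
        self._store_links = {}         # vector store id -> set of index keys
        self.index_runs = 0            # how many times indexing actually ran

    def attach(self, vector_store_id: str, file_id: str, text: str,
               params: ChunkParams) -> str:
        key = index_key(file_id, params)
        if key not in self._indexed:
            # First time this file is seen with these params: do the work.
            chunks = chunk_text(text, params)
            self._indexed[key] = [(c, self._embed_fn(c)) for c in chunks]
            self.index_runs += 1
        # Otherwise reuse the existing index and just link the store to it.
        self._store_links.setdefault(vector_store_id, set()).add(key)
        return key
```

Attaching the same file to a second vector store with identical parameters then only adds a link; changing `chunk_size` or `chunk_overlap` produces a different key and triggers a fresh indexing run.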

Links to any relevant code

Additional context

Even with this change, there could still be some non-trivial overhead. If possible, we should also eliminate the duplication of vector_content in the db to keep that overhead to a minimum.
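To illustrate the deduplication idea, here is a rough sketch using SQLite: chunk content is stored once per index key, and each vector store references it through a link table instead of holding its own copy. The table and function names (`indexed_chunks`, `vector_store_files`, `store_chunks`, `link_store`, `chunks_for_store`) are assumptions made for this example and do not reflect the current schema.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory db with the two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE indexed_chunks (
            id INTEGER PRIMARY KEY,
            index_key TEXT NOT NULL,
            chunk_no INTEGER NOT NULL,
            content TEXT NOT NULL,
            UNIQUE (index_key, chunk_no)
        );
        CREATE TABLE vector_store_files (
            vector_store_id TEXT NOT NULL,
            index_key TEXT NOT NULL,
            PRIMARY KEY (vector_store_id, index_key)
        );
        """
    )
    return conn


def store_chunks(conn: sqlite3.Connection, index_key: str,
                 chunks: list[str]) -> None:
    """Insert chunks once; repeated calls for the same key are no-ops."""
    conn.executemany(
        "INSERT OR IGNORE INTO indexed_chunks (index_key, chunk_no, content) "
        "VALUES (?, ?, ?)",
        [(index_key, n, c) for n, c in enumerate(chunks)],
    )


def link_store(conn: sqlite3.Connection, vector_store_id: str,
               index_key: str) -> None:
    """Point a vector store at an existing index instead of copying rows."""
    conn.execute(
        "INSERT OR IGNORE INTO vector_store_files (vector_store_id, index_key) "
        "VALUES (?, ?)",
        (vector_store_id, index_key),
    )


def chunks_for_store(conn: sqlite3.Connection,
                     vector_store_id: str) -> list[str]:
    """Resolve a store's chunks through the link table."""
    rows = conn.execute(
        "SELECT c.content FROM vector_store_files AS f "
        "JOIN indexed_chunks AS c ON c.index_key = f.index_key "
        "WHERE f.vector_store_id = ? "
        "ORDER BY c.index_key, c.chunk_no",
        (vector_store_id,),
    ).fetchall()
    return [r[0] for r in rows]
```

With this layout, linking a second vector store to an already indexed file adds one row to the link table and no new content rows.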