chore(api): Don't reindex files for different vector stores, unless params differ
CollectiveUnicorn commented
Describe what should be investigated or refactored
Currently, each vector store reindexes every file attached to it, even if that file has already been indexed in one or more other vector stores. This is a very costly and time-consuming process. Update the code so that a file is indexed only once, unless the indexing parameters (chunking, for instance) differ.
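One possible shape for this, as a rough sketch only: key each indexing run on the file ID plus the indexing parameters, and skip the work when that key has been seen before. Everything named here (`ChunkParams`, `index_key`, `IndexRegistry`, `index_fn`, and the default chunk values) is hypothetical and not taken from the existing `leapfrogai_api` code; a real version would persist the registry in the database rather than in memory.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class ChunkParams:
    """Hypothetical indexing parameters; the real ones may differ."""
    chunk_size: int = 4096
    chunk_overlap: int = 400


def index_key(file_id: str, params: ChunkParams) -> str:
    """Derive a stable key from the file ID and the indexing parameters."""
    payload = json.dumps(
        {
            "file_id": file_id,
            "chunk_size": params.chunk_size,
            "chunk_overlap": params.chunk_overlap,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class IndexRegistry:
    """In-memory stand-in for a table that records what was already indexed."""
    indexed: dict = field(default_factory=dict)
    index_calls: int = 0

    def get_or_index(
        self,
        file_id: str,
        params: ChunkParams,
        index_fn: Callable[[str, ChunkParams], list],
    ) -> list:
        key = index_key(file_id, params)
        if key not in self.indexed:
            # Only pay the indexing cost the first time this
            # (file, params) combination is requested.
            self.indexed[key] = index_fn(file_id, params)
            self.index_calls += 1
        return self.indexed[key]
```

With this approach, a second vector store that attaches the same file with the same parameters reuses the stored result, while a store with different chunking gets a different key and triggers a fresh indexing run.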
Links to any relevant code
- https://github.com/defenseunicorns/leapfrogai/blob/main/src/leapfrogai_api/routers/openai/vector_stores.py
- https://github.com/defenseunicorns/leapfrogai/blob/main/src/leapfrogai_api/backend/rag/index.py
Additional context
Even with this change, there could still be some non-trivial overhead. If possible, we should also eliminate the duplication of vector_content in the db to keep that overhead to a minimum.
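A minimal sketch of how the stored chunks might be shared instead of duplicated, assuming a link table between vector stores and indexed files. SQLite is used here only so the example runs standalone; apart from `vector_content`, the table and column names (`indexed_file`, `vector_store_file`, `index_key`, `chunk_no`) and the `attach` helper are hypothetical and do not reflect the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One row per (file, indexing params) combination that has been indexed.
    CREATE TABLE indexed_file (
        index_key TEXT PRIMARY KEY
    );
    -- Chunks are stored once per index_key, not once per vector store.
    CREATE TABLE vector_content (
        index_key TEXT NOT NULL REFERENCES indexed_file(index_key),
        chunk_no  INTEGER NOT NULL,
        content   TEXT NOT NULL,
        PRIMARY KEY (index_key, chunk_no)
    );
    -- Vector stores only hold references to already indexed files.
    CREATE TABLE vector_store_file (
        vector_store_id TEXT NOT NULL,
        index_key       TEXT NOT NULL REFERENCES indexed_file(index_key),
        PRIMARY KEY (vector_store_id, index_key)
    );
    """
)


def attach(conn: sqlite3.Connection, vector_store_id: str, index_key: str, chunks: list) -> None:
    """Link a vector store to an indexed file, writing chunks only if they are new."""
    exists = conn.execute(
        "SELECT 1 FROM indexed_file WHERE index_key = ?", (index_key,)
    ).fetchone()
    if exists is None:
        conn.execute("INSERT INTO indexed_file (index_key) VALUES (?)", (index_key,))
        conn.executemany(
            "INSERT INTO vector_content (index_key, chunk_no, content) VALUES (?, ?, ?)",
            [(index_key, i, chunk) for i, chunk in enumerate(chunks)],
        )
    conn.execute(
        "INSERT OR IGNORE INTO vector_store_file (vector_store_id, index_key) VALUES (?, ?)",
        (vector_store_id, index_key),
    )
    conn.commit()
```

Attaching the same indexed file to two vector stores then adds two rows to the link table but leaves a single copy of the chunks in `vector_content`.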