Add an example application showing how to properly deal with stale documents in the vector database
eolivelli opened this issue · 1 comments
eolivelli commented
All the example applications that we currently have don't show how to deal with these two common issues:
Shorter pages
When you re-index a website, the new version of a page may be shorter and therefore produce fewer chunks.
You can overwrite the chunks with lower ids, but the old chunks with higher ids remain in the database.
We need to show how to remove these stale chunks.
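To make the problem concrete, here is a minimal sketch of the cleanup step. It assumes each chunk is keyed by the page URL plus a chunk index, and it uses a plain dict as a stand-in for the vector database; `Store` and `reindex_page` are hypothetical names for illustration, not part of any existing API.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for a vector database:
# (page url, chunk index) -> chunk text.
Store = Dict[Tuple[str, int], str]


def reindex_page(store: Store, url: str, chunks: List[str]) -> int:
    """Write the new chunks for `url`, then delete leftover chunks with higher ids.

    Returns the number of stale chunks removed.
    """
    # Overwrite (or create) chunks 0..n-1 with the new content.
    for index, text in enumerate(chunks):
        store[(url, index)] = text

    # Any chunk of the same page with index >= n belongs to the older, longer version.
    stale = [key for key in store if key[0] == url and key[1] >= len(chunks)]
    for key in stale:
        del store[key]
    return len(stale)


store: Store = {}
reindex_page(store, "https://example.com/a", ["one", "two", "three"])
# The page got shorter: only two chunks remain, so chunk 2 must go.
removed = reindex_page(store, "https://example.com/a", ["uno", "dos"])
```

In a real database the second step would likely be a single "delete where url = ? and chunk index >= ?" query rather than a scan.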
Pages that disappeared
This is trickier. When you know that you are re-indexing the whole corpus of documents (for instance a whole website), you should drop the documents that are no longer available. Otherwise you risk serving outdated documents or duplicate content (for example, when a page has been renamed).
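One possible approach, sketched here under the assumption that a full crawl yields the complete set of live URLs: after the crawl, delete every stored chunk whose page was not seen. The dict-based `Store` and the `drop_missing_pages` helper are hypothetical and only illustrate the idea.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical in-memory stand-in for a vector database:
# (page url, chunk index) -> chunk text.
Store = Dict[Tuple[str, int], str]


def drop_missing_pages(store: Store, crawled_urls: Iterable[str]) -> Set[str]:
    """Delete all chunks of pages that were not seen in the latest full crawl.

    Returns the set of URLs that were removed.
    """
    seen = set(crawled_urls)
    # Pages present in the store but absent from the crawl have disappeared
    # (or were renamed and now live under a different URL).
    gone = {url for (url, _) in store if url not in seen}
    for key in [k for k in store if k[0] in gone]:
        del store[key]
    return gone


pages: Store = {
    ("https://example.com/a", 0): "still here",
    ("https://example.com/old-name", 0): "renamed page, first chunk",
    ("https://example.com/old-name", 1): "renamed page, second chunk",
}
dropped = drop_missing_pages(pages, ["https://example.com/a", "https://example.com/new-name"])
```

This is only safe when the crawl really covers the whole corpus; after a partial or failed crawl, the same call would delete pages that still exist.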
eolivelli commented
The first part has been delivered in the 0.3.0 release