Question: how do we respond to changes in the vector database that the frontend is unaware of?

Question

Closed this issue 6 months ago · 7 comments

If a user manually deletes an embedding or the entire database (e.g. by removing the local Docker volume), does the manuscript list stored in the frontend get updated?

Answer 1 · 2024-02-26T13:50:39.000Z

Yes. Every time the RAG page is opened or the 'reconnect' button is clicked, the document list is updated.

Answer 2 · 2024-02-26T14:38:46.000Z

Thanks! Does that mean that

if I delete a document ("offline") that was added through the web page, it will disappear from the web page? And
if I add a document ("offline"), the web page will include that additional document?

I can't imagine the second option works, because the offline edit would have to know the connection args from the browser, or do I misunderstand the process?

In other words, is it possible to maintain a large document database (for an entire group or project, for instance), and then automatically use it just by connecting to the correct IP from the RAG settings?

Answer 3 · 2024-03-05T15:22:07.000Z

@slobentanzer I don't quite get your point. The current RAG process for adding a document is:
1). The user uploads a document.
2). biochatter embeds the document and saves it to the vector database.
3). biochatter returns the doc id to the frontend.
4). The frontend adds the doc id to its doc workspace and updates the document list from that workspace.

Removing a document:
1). The frontend sends a delete request with the doc id to biochatter.
2). biochatter removes the document from the vector database.
3). biochatter returns a status code to the frontend.
4). The frontend removes the doc id from its doc workspace and updates its document list from that workspace.
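
The two workflows above can be sketched roughly as follows. This is a minimal in-memory illustration, not BioChatter's actual code: `VectorStore` stands in for biochatter plus the vector database, `Frontend` for the frontend's doc workspace, and every class, method, and status value here is a hypothetical name chosen for the example.

```python
import uuid


class VectorStore:
    """Hypothetical stand-in for biochatter plus its vector database."""

    def __init__(self):
        self.docs = {}  # doc id -> document text (embeddings omitted)

    def add(self, text):
        # Adding, steps 2-3: store the document, then hand back its id.
        doc_id = uuid.uuid4().hex
        self.docs[doc_id] = text
        return doc_id

    def remove(self, doc_id):
        # Removing, steps 2-3: delete the document, then report a status.
        return 200 if self.docs.pop(doc_id, None) is not None else 404


class Frontend:
    """Hypothetical stand-in for the frontend's doc workspace."""

    def __init__(self, store):
        self.store = store
        self.workspace = []  # doc ids this frontend knows about

    def upload(self, text):
        doc_id = self.store.add(text)
        self.workspace.append(doc_id)  # step 4: track the returned id
        return doc_id

    def delete(self, doc_id):
        status = self.store.remove(doc_id)
        if status == 200 and doc_id in self.workspace:
            self.workspace.remove(doc_id)  # step 4: drop the id on success
        return status


store = VectorStore()
frontend = Frontend(store)
doc_id = frontend.upload("manuscript text")
status = frontend.delete(doc_id)
```

In this sketch the workspace changes only in response to requests made through the frontend, so anything done to the database by other means stays invisible until a separate refresh compares the two.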

If this doesn't answer your question or you have further concerns, please let me know.

Answer 4 · 2024-03-05T15:39:16.000Z

@fengsh27 what you describe is fully frontend-based, right? I am talking about a database that is maintained via other means (maybe Milvus CLI, Python, etc). Since the user has the ability to connect another database in the settings by entering its IP address, that is possible, right?

My question now is: how does BioChatter Next handle this case? Will it see the documents that already exist in a Milvus DB that is connected to the frontend by entering an IP?

This is for use cases where a researcher or group would like to maintain a consistent library of embedded documents for a specific purpose, which can remain active for as long as necessary without depending on re-embedding the documents. Does this only work via the Next frontend, or can the Milvus DB be created and maintained by other means?

Answer 5 · 2024-03-05T16:27:28.000Z

I get your point now. My original thought was that, for our vector database ("local"), users can only view the documents uploaded from the frontend. In contrast, for a user's own database (connected by IP), users would be able to view all documents in the database. Going further, we could provide an option for users to view all documents within their own database.
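
A rough sketch of that distinction, assuming the frontend can obtain the list of doc ids currently in the connected database. The function name `visible_documents` and the `own_database` flag are hypothetical and only illustrate the idea; they are not taken from BioChatter.

```python
def visible_documents(db_ids, workspace_ids, own_database):
    """Return the doc ids the frontend would list, in database order."""
    if own_database:
        # User-supplied database (connected by IP): show everything in it.
        return list(db_ids)
    # Shared "local" database: show only ids this frontend uploaded
    # that still exist in the database.
    uploaded = set(workspace_ids)
    return [doc_id for doc_id in db_ids if doc_id in uploaded]


# "a" was uploaded via the frontend and later deleted offline;
# "c" was added offline and never passed through the frontend.
db_ids = ["b", "c"]
workspace_ids = ["a", "b"]

local_view = visible_documents(db_ids, workspace_ids, own_database=False)
own_view = visible_documents(db_ids, workspace_ids, own_database=True)
```

Under these assumptions a document deleted offline drops out of both views, while a document added offline appears only when the database is treated as the user's own.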

What's your idea?

Answer 6 · 2024-03-05T16:29:21.000Z

"My original thought was, for our vector database ("local"), users can only view the documents uploaded from frontend."

Yes, and that is still a good starting point for showcasing the use case. My point was just to start thinking about the other use cases. :)

Answer 7 · 2024-03-05T16:30:42.000Z

This issue was just for me to better understand how you have designed the current version, to be able to find the best way forward with these other use cases. We can close the issue and open a new one once we decide if and how to tackle that.