Add New VectorDB Supabase (pgvector) Support for RAG in Vocode Open Source
arpagon opened this issue · 3 comments
Issue: Add New VectorDB Supabase (pgvector) Support for RAG in Vocode Open Source
Description
This issue proposes the integration of Supabase (pgvector) as a new Vector Database (VectorDB) option for the Retrieval-Augmented Generation (RAG) feature in the Vocode Open Source project. The addition of Supabase (pgvector) aims to enhance the capabilities of RAG by leveraging its efficient vector search functionality, especially beneficial for handling large datasets and complex queries in conversational AI applications.
Objectives
- Integration of Supabase (pgvector) with Vocode RAG: Establish a seamless connection between Vocode's RAG feature and Supabase (pgvector) to enable efficient vector storage and retrieval.
- Optimization for Conversational AI: Ensure that the integration is optimized for conversational AI use cases, focusing on query performance, scalability, and accuracy.
- Documentation and Examples: Provide comprehensive documentation and practical examples to guide users in utilizing Supabase (pgvector) with Vocode RAG.
Motivation
https://supabase.com/blog/pgvector-vs-pinecone
- Enhanced Performance and Scalability: Supabase (pgvector) offers advanced vector search capabilities, which can significantly improve the performance and scalability of RAG in Vocode.
- Broader Database Compatibility: Adding Supabase (pgvector) support will cater to a wider user base that prefers or requires this specific database technology in their projects.
- Innovation in Conversational AI: By integrating with Supabase (pgvector), Vocode can push the boundaries of conversational AI, enabling more complex and nuanced dialogues in AI applications.
Implementation Considerations
- Compatibility: Ensure that the integration is compatible with the existing architecture of Vocode RAG.
- Performance Metrics: Establish benchmarks to evaluate the performance impact of using Supabase (pgvector) in various scenarios.
Call for Contributions
We encourage contributions from the community to help with the implementation, testing, and documentation of this feature. Whether you're an expert in databases, conversational AI, or a keen open-source contributor, your input is highly valued.
Conclusion
The addition of Supabase (pgvector) as a new VectorDB option in Vocode Open Source is expected to significantly enhance the RAG feature, providing users with more flexibility and performance benefits. We look forward to collaborating with the community on this exciting development.
I am working on this issue. Based on my research, there are two ways to build this:
- Use the https://github.com/supabase/vecs client to make the Supabase connection.
- Or use SQLAlchemy, the way langchain has done in langchain-ai/langchain/libs/community/langchain_community/vectorstores/pgvector.py
I am planning to go ahead with approach number 1 as it will be easier and faster to implement.
What do you think?
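To make approach 1 concrete, here is a rough sketch of what a vecs-backed retriever might look like. It is not Vocode's actual interface: `SupabaseVecsRetriever`, `connect_collection`, the `embed` callable, and the `"text"` metadata key are all hypothetical names chosen for illustration. The vecs calls (`create_client`, `get_or_create_collection`, `query`) follow my reading of the vecs README and should be checked against the installed version.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple


def connect_collection(db_url: str, name: str, dimension: int) -> Any:
    """Open (or create) a vecs collection. Hypothetical helper.

    vecs is imported lazily so the rest of this sketch works without it.
    """
    import vecs  # assumed installed via `pip install vecs`

    client = vecs.create_client(db_url)
    return client.get_or_create_collection(name=name, dimension=dimension)


class SupabaseVecsRetriever:
    """Retrieval-only wrapper around a vecs-style collection (hypothetical)."""

    def __init__(
        self,
        collection: Any,
        embed: Callable[[str], Sequence[float]],
        top_k: int = 3,
    ) -> None:
        self.collection = collection
        self.embed = embed  # any function mapping text to an embedding vector
        self.top_k = top_k

    def similarity_search_with_score(
        self, query: str, filters: Optional[Dict[str, Any]] = None
    ) -> List[Tuple[str, float]]:
        # Embed the query, then ask the collection for its nearest neighbours.
        rows = self.collection.query(
            data=list(self.embed(query)),
            limit=self.top_k,
            filters=filters,
            include_value=True,     # also return the distance
            include_metadata=True,  # also return the stored metadata
        )
        # Assumed row shape: (id, distance, metadata). The value is a
        # distance, so lower means more similar.
        return [
            (metadata.get("text", ""), distance)
            for _id, distance, metadata in rows
        ]
```

Passing the collection in through the constructor keeps the class testable with a stub, and leaves the connection details (pooling, credentials) outside the retriever.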
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hi @arpagon, I have some questions about the vector DB implementation that may be relevant here. Note that these questions are not about PGVector specifically, but about Vocode's vector DB implementation as a whole.
- What is the purpose of including an `add_text` method? My understanding is that the vector DB should already have been built by the time it is connected to Vocode. That is, Vocode's responsibility is simply the retrieval, so should adding text be part of the interface at all? https://github.com/vocodedev/vocode-python/blob/cfd2eb44308cfe28136b409e22706bd5465b6c46/vocode/streaming/vector_db/pinecone.py#L27
- Moreover, Vocode's current implementation of `similarity_search_with_score` is essentially what langchain already has implemented. So my question is: should Vocode just use langchain directly rather than making its own implementation? https://github.com/vocodedev/vocode-python/blob/cfd2eb44308cfe28136b409e22706bd5465b6c46/vocode/streaming/vector_db/pinecone.py#L71