Add Semantic Search Feature
Opened this issue · 4 comments
What would you like to be added:
I propose adding a Semantic Search feature that retrieves documents by meaning rather than by exact keyword match. This would help users get more relevant search results than traditional keyword matching provides. The conceptual architecture and workflow are illustrated in the images included.
Key Decisions Needed:
- When to save/update documents in the Vector Store?
  - Options:
    - Every time a document is updated
    - Periodically through a Cron Job
    - After a set duration without updates (e.g., 10 minutes)
    - Initially embed large documents, then embed smaller updates, with periodic consolidation.
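To make the "after a set duration without updates" option concrete, here is a minimal sketch of a debounce tracker. It is only an illustration, not a proposal for the actual implementation: `EmbedDebouncer`, `record_edit`, `pop_due`, and the document keys are hypothetical names, and timestamps are passed in explicitly so the logic stays independent of any scheduler.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EmbedDebouncer:
    """Hypothetical helper: reports documents that have had no edits for a quiet period."""

    quiet_seconds: float = 600.0  # the "10 minutes" example from the option list
    _last_edit: Dict[str, float] = field(default_factory=dict)

    def record_edit(self, doc_key: str, now: float) -> None:
        # Every edit restarts the quiet period for that document.
        self._last_edit[doc_key] = now

    def pop_due(self, now: float) -> List[str]:
        # Collect documents idle for at least quiet_seconds, then stop tracking them
        # so they are not re-embedded until they are edited again.
        due = sorted(
            key for key, edited_at in self._last_edit.items()
            if now - edited_at >= self.quiet_seconds
        )
        for key in due:
            del self._last_edit[key]
        return due
```

A periodic job (the Cron option) could call `pop_due` and embed only the returned documents, which combines the second and third options instead of embedding on every update.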
- How to store existing data in the Vector Store during feature deployment?
- Chunking Strategy:
  - Different chunking methods have advantages and disadvantages, including:
    - Parent-Child Chunking
    - Fixed Chunking
    - Other strategies
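For comparison, the two named strategies can be sketched in a few lines. This is a simplified, character-based illustration under my own assumptions (real chunkers usually split on tokens or sentences); `fixed_chunks` and `parent_child_chunks` are hypothetical names, not functions from any library.

```python
from typing import Dict, List


def fixed_chunks(text: str, size: int, overlap: int = 0) -> List[str]:
    """Fixed chunking: windows of `size` characters, each sharing `overlap` characters with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def parent_child_chunks(text: str, parent_size: int, child_size: int) -> List[Dict[str, object]]:
    """Parent-child chunking: small child chunks are matched, the larger parent is returned as context."""
    records: List[Dict[str, object]] = []
    for parent_id, parent in enumerate(fixed_chunks(text, parent_size)):
        for child in fixed_chunks(parent, child_size):
            records.append({"parent_id": parent_id, "parent": parent, "child": child})
    return records
```

The trade-off this shows: fixed chunking is simple but can cut context at arbitrary points, while parent-child chunking embeds more (smaller) pieces and stores the parent text alongside each one.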
- Embedding Model:
  - What model should we use for embedding?
  - Relying on commercial models like OpenAI may be costly because documents need to be embedded frequently.
  - Options like Ollama or smaller models could be sufficient.
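Since the model choice is still open, one way to keep it swappable is to hide it behind a small interface. The sketch below is an assumption about how that could look, not an existing API: `Embedder`, `HashingEmbedder`, and `index_document` are hypothetical names, and the hashing embedder is a toy stand-in with no semantic meaning, used only so the example runs without a model.

```python
import hashlib
from typing import List, Protocol


class Embedder(Protocol):
    """Anything that turns text into a vector (e.g. a wrapper around OpenAI or Ollama)."""

    def embed(self, text: str) -> List[float]: ...


class HashingEmbedder:
    """Toy stand-in: hashes tokens into a fixed-size bag-of-words vector. Not semantic."""

    def __init__(self, dim: int = 8) -> None:
        self.dim = dim

    def embed(self, text: str) -> List[float]:
        vec = [0.0] * self.dim
        for token in text.lower().split():
            # Use the first digest byte to pick a bucket for this token.
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            vec[digest[0] % self.dim] += 1.0
        return vec


def index_document(embedder: Embedder, text: str) -> List[float]:
    # Callers depend only on the Embedder interface, so the backend can change later.
    return embedder.embed(text)
```

With this shape, comparing a commercial model against a local one would only require a second class with the same `embed` method.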
- Vector Store Considerations:
  - Recommendations for potential Vector Stores:
    - Milvus (29k)
    - Weaviate (10k)
    - Chroma (14k)
    - Faiss (30k)
    - Need for features like Namespace to support separation by Workspace for better data management.
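To illustrate what the Namespace requirement means in practice, here is a minimal in-memory sketch where every query is scoped to one Workspace. It does not use any of the stores listed above; `NamespacedStore`, `upsert`, `search`, and the workspace and document keys are hypothetical names, and the brute-force cosine search is only for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class NamespacedStore:
    """Hypothetical store: one isolated vector collection per workspace."""

    def __init__(self) -> None:
        self._spaces: Dict[str, Dict[str, List[float]]] = defaultdict(dict)

    def upsert(self, workspace: str, doc_key: str, vector: List[float]) -> None:
        # Writing the same doc_key again replaces the earlier vector.
        self._spaces[workspace][doc_key] = vector

    def search(self, workspace: str, query: List[float], k: int = 3) -> List[Tuple[str, float]]:
        # Only vectors in the given workspace are candidates, so results never leak across workspaces.
        candidates = self._spaces.get(workspace, {})
        scored = [(key, cosine(query, vec)) for key, vec in candidates.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

Whichever Vector Store is chosen would need an equivalent of the `workspace` argument here, whether it is called a namespace, tenant, collection, or partition.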
Why is this needed:
Integrating a Semantic Search feature will significantly enhance user experience by providing more relevant and efficient search capabilities.
Additional Information:
- Relevant references must be gathered for informed decision-making.
Here is a list of resources that are useful to read when adding this feature.
- How to implement Weaviate RAG applications with Local LLMs and Embedding models
- Ollama Embedding models
- Building a Local RAG System for Privacy Preservation with Ollama and Weaviate
- Weaviate Store - langchain.js
- Weaviate - langchain.js
- Faiss + langchain.js
- Milvus + langchain.js
- Weaviate + langchain.js
- Ollama + langchain.js + ChromaDB Demo
- Weaviate vs Milvus
- Milvus vs ChromaDB
This feature can be useful for resolving this issue: yorkie-team/yorkie#1002
@sihyeong671 Could you check this comment (yorkie-team/yorkie#1002 (comment))? The Doc event webhook would be useful for implementing semantic search.