A minimal server that generates a ranked list of targets for a query, based on the query's k-nearest semantic neighbors. Written in Go.
KNN-router can be used within a larger system to route natural language queries to the right system, with minimal latency. Given a user query, KNN-router:
- Looks up the most semantically similar neighbors from a curated corpus of example utterances
- Computes a weighted average score for each target associated with the top-K example utterances, weighting each utterance by its distance to the query. Targets can be information retrieval systems, agents, LoRA adapters, small or large language models, or others. The sky is the limit!
- Returns a ranked list of targets that are most suitable for satisfying the query
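The three steps above can be sketched in Go as follows. This is only an illustration, not KNN-router's actual implementation: the `Neighbor`, `TargetScore`, and `RankTargets` names are hypothetical, and the weighting scheme (a similarity-weighted mean, with higher similarity meaning closer) is an assumption, since the exact formula is not spelled out here.

```go
package main

import (
	"fmt"
	"sort"
)

// Neighbor is one of the top-K example utterances returned by the vector search.
// Assumption: Similarity is higher-is-closer (for example, cosine similarity).
type Neighbor struct {
	PointUID   string
	Similarity float64
}

// TargetScore is one entry of the ranked result.
type TargetScore struct {
	Target string
	Score  float64
}

// RankTargets computes, for each target, the similarity-weighted average of its
// per-point scores over the given neighbors, then sorts targets best-first.
// scores maps point_uid -> target -> score.
func RankTargets(neighbors []Neighbor, scores map[string]map[string]float64) []TargetScore {
	weighted := map[string]float64{} // sum of similarity * score, per target
	weights := map[string]float64{}  // sum of similarity, per target
	for _, n := range neighbors {
		for target, s := range scores[n.PointUID] {
			weighted[target] += n.Similarity * s
			weights[target] += n.Similarity
		}
	}

	ranked := make([]TargetScore, 0, len(weighted))
	for target, sum := range weighted {
		if weights[target] == 0 {
			continue // avoid dividing by zero when all weights vanish
		}
		ranked = append(ranked, TargetScore{Target: target, Score: sum / weights[target]})
	}

	// Highest score first; break ties by name so the order is deterministic.
	sort.Slice(ranked, func(i, j int) bool {
		if ranked[i].Score != ranked[j].Score {
			return ranked[i].Score > ranked[j].Score
		}
		return ranked[i].Target < ranked[j].Target
	})
	return ranked
}

func main() {
	// Made-up neighbors and scores, for illustration only.
	neighbors := []Neighbor{{"p1", 0.9}, {"p2", 0.6}}
	scores := map[string]map[string]float64{
		"p1": {"model-a": 1.0, "model-b": 0.5},
		"p2": {"model-a": 0.0, "model-b": 1.0},
	}
	for _, ts := range RankTargets(neighbors, scores) {
		fmt.Printf("%s %.2f\n", ts.Target, ts.Score)
	}
}
```

With these made-up inputs, model-b ranks ahead of model-a: the closer neighbor favors model-a, but the second neighbor gives model-a a score of zero, which pulls its weighted average down.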
At Pulze.ai, we use KNN-router to select the best LLM for user requests. Try it out locally here.
Works with:
- Embeddings: HuggingFace Text Embeddings Inference
- Vector Store: Qdrant
- Database: Bolt
See this example for getting started locally.
Dependencies:
- points.jsonl: JSONL-formatted file containing points and their respective categories and embeddings. Each line should contain the following fields: point_uid, category, and embedding.
- targets.jsonl: JSONL-formatted file containing the targets and their respective scores for each point. Each line should contain the following fields: point_uid, target, and score.
The following artifacts are required for deployment:
- embeddings.snapshot: Snapshot of the Qdrant collection containing the point embeddings
- scores.db: Bolt DB containing the targets and their respective scores for each point
Use this script to generate these artifacts:
scripts/gen-artifacts.sh --points-data-path points.jsonl --scores-data-path targets.jsonl --output-dir ./dist
- Helm chart
- gRPC endpoint