Dripfarm/SVDB

Guidance on limiting returned entries?

RosTeHeA opened this issue · 2 comments

First off: thank you for this!

How I'm using it: I'm indexing notes from my note app to make it easier to find answers.

Question: Right now, a search returns each matching note in full. With long notes, that can be a lot of text, and if I send it to OpenAI, it hits the token limit. Is there a way to split a note into chunks and return only the relevant paragraph or two?

Here is my current code:

func searchSVDBDocuments(queryEmbedding: [Double], maxResults: Int = 5) async -> [(String, Double)] {
    // Bail out early if the collection has not been set up.
    guard let collection = collection else {
        print("SVDB collection is nil, cannot perform search")
        return []
    }
    print("Performing search in SVDB collection")
    let results = collection.search(query: queryEmbedding)
    print("Search results obtained: \(results)")

    // Limit the number of results to maxResults
    let limitedResults = results.prefix(maxResults)
    return limitedResults.map { ($0.text, $0.score) }
}
yych42 commented

SVDB is a very lightweight implementation, so you will have to do the chunking yourself before feeding anything into it. One way is to treat each note as a collection of chunks, add every chunk to SVDB instead of the whole note, and then, once SVDB returns its results, look up the note each matching chunk belongs to.
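
For illustration, here is a minimal sketch of that approach, not anything SVDB provides. `NoteChunk`, `chunkNote`, the blank-line paragraph splitting, and the 1000-character default are all assumptions made for the example. Embedding each chunk and adding it to the collection are left out, since they depend on your setup.

```swift
import Foundation

// Hypothetical type: one chunk of a note, remembering which note it came from.
struct NoteChunk {
    let noteID: String
    let index: Int
    let text: String
}

// Split a note into chunks of whole paragraphs (separated by blank lines),
// packing paragraphs together until `maxCharacters` would be exceeded.
// A single paragraph longer than the limit is kept whole as its own chunk.
func chunkNote(id: String, body: String, maxCharacters: Int = 1000) -> [NoteChunk] {
    let paragraphs = body
        .components(separatedBy: "\n\n")
        .map { $0.trimmingCharacters(in: .whitespacesAndNewlines) }
        .filter { !$0.isEmpty }

    var chunks: [NoteChunk] = []
    var current = ""
    for paragraph in paragraphs {
        // Close the current chunk if this paragraph (plus separator) would not fit.
        if !current.isEmpty && current.count + paragraph.count + 2 > maxCharacters {
            chunks.append(NoteChunk(noteID: id, index: chunks.count, text: current))
            current = ""
        }
        current += current.isEmpty ? paragraph : "\n\n" + paragraph
    }
    if !current.isEmpty {
        chunks.append(NoteChunk(noteID: id, index: chunks.count, text: current))
    }
    return chunks
}

// Usage: chunk a note, then remember which note each chunk text came from.
// Each chunk.text would be embedded and added to SVDB in place of the full note.
let exampleChunks = chunkNote(id: "note-1", body: "Intro paragraph.\n\nDetails paragraph.")
var noteIDByChunkText: [String: String] = [:]
for chunk in exampleChunks {
    noteIDByChunkText[chunk.text] = chunk.noteID
}
```

Because each search result in the code above exposes `text`, a lookup such as `noteIDByChunkText[result.text]` would recover the parent note while only the matching chunk is sent to OpenAI. Keying on the chunk text is just the simplest option for a sketch; identical chunks in different notes would collide, so a real app may want a sturdier key.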

Noted. Thank you!