@langchain/community module "chromadb" throws if filter for search is not defined
Closed this issue · 2 comments
Checked other resources
- I added a very descriptive title to this issue.
- I searched the LangChain.js documentation with the integrated search.
- I used the GitHub search to find a similar question and didn't find it.
- I am sure that this is a bug in LangChain.js rather than my code.
- The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
Example Code
import { Chroma } from '@langchain/community/vectorstores/chroma'
import { OllamaEmbeddings } from '@langchain/ollama'

const documents = [
  {
    id: '1',
    pageContent: 'The powerhouse of the cell is the mitochondria',
    metadata: {}
  }
]

const embeddings = new OllamaEmbeddings({
  model: 'nomic-embed-text:latest'
})

const vectorStore = new Chroma(embeddings, {
  url: 'http://localhost:8000',
  collectionName: 'issue',
  collectionMetadata: {
    'hnsw:space': 'cosine'
  }
})

await vectorStore.addDocuments(documents, { ids: ['1'] })

const results = await vectorStore.similaritySearch('biology', 1)
Error Message and Stack Trace (if applicable)
ChromaClientError: Bad request to http://localhost:8000/api/v1/collections/e20e78e4-e278-4e14-aafc-04736871e8b8/query with status: Bad Request
at chromaFetch (file://issue/node_modules/.pnpm/chromadb@1.9.2_openai@4.71.1_zod@3.23.8_/node_modules/chromadb/dist/chromadb.mjs:2604:17)
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
at async Collection.query (file://issue/node_modules/.pnpm/chromadb@1.9.2_openai@4.71.1_zod@3.23.8_/node_modules/chromadb/dist/chromadb.mjs:2295:12)
at async Chroma.similaritySearchVectorWithScore (file://issue/node_modules/.pnpm/@langchain+community@0.3.12_@ibm-cloud+watsonx-ai@1.1.2_@langchain+core@0.3.17_openai@4.71.1__57zubc44l3hvik5mkoauqn25qi/node_modules/@langchain/community/dist/vectorstores/chroma.js:321:24)
at async Chroma.similaritySearch (file://issue/node_modules/.pnpm/@langchain+core@0.3.17_openai@4.71.1_zod@3.23.8_/node_modules/@langchain/core/dist/vectorstores.js:104:25)
at async file://issue/issue.js:24:17 {
cause: undefined
}
Description
- I am trying to use Chroma through LangChain to search vector embeddings.
- If the documents do not contain any common metadata to filter on, a 400 Bad Request is returned.
- Only if all documents contain e.g.
  { ..., metadata: { source: '1' } }
  can results be found with
  await vectorStore.similaritySearch('biology', 1, { source: '1' })
- IMO querying should work without any filter being set.
- I'd propose the following code change:
https://github.com/langchain-ai/langchainjs/blob/main/libs/langchain-community/src/vectorstores/chroma.ts
// L348 ff
async similaritySearchVectorWithScore(
  query: number[],
  k: number,
  filter?: this["FilterType"]
) {
  if (filter && this.filter) {
    throw new Error("cannot provide both `filter` and `this.filter`");
  }
  const _filter = filter ?? this.filter;
  // +
  const where = _filter === undefined ? undefined : { ..._filter }
  const collection = await this.ensureCollection();
  // similaritySearchVectorWithScore supports one query vector at a time
  // chroma supports multiple query vectors at a time
  const result = await collection.query({
    queryEmbeddings: query,
    nResults: k,
    // -where: { ..._filter },
    where
  });
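The filter handling in the change proposed above can also be sketched as a standalone helper, independent of the Chroma class. The names `resolveWhere` and `Filter` are hypothetical and not part of @langchain/community. Treating an empty object like a missing filter is an extra assumption that goes one step beyond the proposal.

```typescript
// Hypothetical filter shape; the real FilterType in chroma.ts may differ.
type Filter = Record<string, unknown>;

// Returns the `where` clause to pass to the query, or undefined when
// no filter applies. Hypothetical helper, not library code.
function resolveWhere(
  callFilter?: Filter,
  storeFilter?: Filter
): Filter | undefined {
  // Same guard as in the snippet from the issue.
  if (callFilter && storeFilter) {
    throw new Error("cannot provide both `filter` and `this.filter`");
  }
  const chosen = callFilter ?? storeFilter;
  // Assumption: an empty object is handled like "no filter", so an
  // empty `where` is never sent.
  if (chosen === undefined || Object.keys(chosen).length === 0) {
    return undefined;
  }
  // Copy so that later mutation by the caller does not leak in.
  return { ...chosen };
}

console.log(resolveWhere());                // no filter at all
console.log(resolveWhere({ source: "1" })); // filter passed per call
```

With such a helper, the query call would only need `where: resolveWhere(filter, this.filter)`; whether the maintainers prefer this over the inline ternary is an open design choice.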
System Info
platform: mac
npm version
{
npm: '10.9.0',
node: '22.11.0',
package.json
"dependencies": {
"@langchain/community": "^0.3.12",
"@langchain/core": "^0.3.17",
"@langchain/ollama": "^0.1.2",
"chromadb": "^1.9.2",
"langchain": "^0.3.5"
}
Hey @commenthol! I'm here to help you with any bugs, questions, or contributions. Let's dig into this together!
The issue you're encountering seems to be related to how the similaritySearchVectorWithScore method in the Chroma module handles the filter parameter. By default, if no filter is provided, the method should perform a similarity search without any filtering criteria, considering all available vectors in the collection [1].
The proposed code change you mentioned aims to address this by setting the where clause to undefined if no filter is provided, which aligns with the expected behavior. This change should prevent the 400 Bad Request error when no filter is specified, allowing the search to proceed without filtering.
If you implement this change, it should resolve the issue by ensuring that the where clause is only applied when a filter is explicitly defined. This approach maintains the flexibility of querying without mandatory filters, which seems to be the intended functionality.
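For context on why the unpatched code sends a where clause at all: spreading `undefined` into an object literal yields an empty object, not `undefined`. The sketch below shows this in isolation. `whereBefore` and `whereAfter` are hypothetical names that mirror the two variants from the issue, and the idea that the server rejects the empty object is an inference from the 400 response reported above, not something verified here.

```typescript
// Hypothetical filter shape used only for this illustration.
type Filter = Record<string, unknown>;

// Mirrors the current code path: `where: { ..._filter }`.
function whereBefore(filter?: Filter): Filter {
  return { ...filter };
}

// Mirrors the proposed code path.
function whereAfter(filter?: Filter): Filter | undefined {
  return filter === undefined ? undefined : { ...filter };
}

// Without a filter, the current variant still produces an object
// with zero keys, while the proposed variant produces undefined.
console.log(Object.keys(whereBefore(undefined)).length);
console.log(whereAfter(undefined));
```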
Oh dear, thank you!