langchain-ai/langchainjs

@langchain/community module "chromadb" throws if filter for search is not defined

Closed this issue · 2 comments

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain.js documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain.js rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

import { Chroma } from '@langchain/community/vectorstores/chroma'
import { OllamaEmbeddings } from '@langchain/ollama'

const documents = [
  {
    id: '1',
    pageContent: 'The powerhouse of the cell is the mitochondria',
    metadata: {}
  }
]

const embeddings = new OllamaEmbeddings({
  model: 'nomic-embed-text:latest'
})
const vectorStore = new Chroma(embeddings, {
  url: 'http://localhost:8000',
  collectionName: 'issue',
  collectionMetadata: {
    'hnsw:space': 'cosine'
  }
})
await vectorStore.addDocuments(documents, { ids: ['1'] })

const results = await vectorStore.similaritySearch('biology', 1)

Error Message and Stack Trace (if applicable)

ChromaClientError: Bad request to http://localhost:8000/api/v1/collections/e20e78e4-e278-4e14-aafc-04736871e8b8/query with status: Bad Request
    at chromaFetch (file://issue/node_modules/.pnpm/chromadb@1.9.2_openai@4.71.1_zod@3.23.8_/node_modules/chromadb/dist/chromadb.mjs:2604:17)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async Collection.query (file://issue/node_modules/.pnpm/chromadb@1.9.2_openai@4.71.1_zod@3.23.8_/node_modules/chromadb/dist/chromadb.mjs:2295:12)
    at async Chroma.similaritySearchVectorWithScore (file://issue/node_modules/.pnpm/@langchain+community@0.3.12_@ibm-cloud+watsonx-ai@1.1.2_@langchain+core@0.3.17_openai@4.71.1__57zubc44l3hvik5mkoauqn25qi/node_modules/@langchain/community/dist/vectorstores/chroma.js:321:24)
    at async Chroma.similaritySearch (file://issue/node_modules/.pnpm/@langchain+core@0.3.17_openai@4.71.1_zod@3.23.8_/node_modules/@langchain/core/dist/vectorstores.js:104:25)
    at async file://issue/issue.js:24:17 {
  cause: undefined
}

Description

  • I am trying to use Chroma through LangChain to search for vector embeddings.

  • If the documents do not contain any metadata to filter on, a 400 Bad Request is returned.

  • Results are found only if all documents contain metadata, e.g. { ..., metadata: { source: '1' } }, and the search passes a matching filter:
    await vectorStore.similaritySearch('biology', 1, { source: '1' } )

  • IMO querying should work without any filter being set.

  • I'd propose the following code change:
    https://github.com/langchain-ai/langchainjs/blob/main/libs/langchain-community/src/vectorstores/chroma.ts

// L348 ff
async similaritySearchVectorWithScore(
    query: number[],
    k: number,
    filter?: this["FilterType"]
  ) {
    if (filter && this.filter) {
      throw new Error("cannot provide both `filter` and `this.filter`");
    }
    const _filter = filter ?? this.filter;
    // + added: leave `where` undefined when no filter is set
    const where = _filter === undefined ? undefined : { ..._filter };

    const collection = await this.ensureCollection();

    // similaritySearchVectorWithScore supports one query vector at a time
    // chroma supports multiple query vectors at a time
    const result = await collection.query({
      queryEmbeddings: query,
      nResults: k,
      // - removed: where: { ..._filter },
      where,
    });
   

System Info

platform: mac

npm version
{
  npm: '10.9.0',
  node: '22.11.0',

package.json

  "dependencies": {
    "@langchain/community": "^0.3.12",
    "@langchain/core": "^0.3.17",
    "@langchain/ollama": "^0.1.2",
    "chromadb": "^1.9.2",
    "langchain": "^0.3.5"
}

Hey @commenthol! I'm here to help you with any bugs, questions, or contributions. Let's dig into this together!

The issue you're encountering seems to be related to how the similaritySearchVectorWithScore method in the Chroma module handles the filter parameter. By default, if no filter is provided, the method should perform a similarity search without any filtering criteria, considering all available vectors in the collection [1].

The proposed code change you mentioned aims to address this by setting the where clause to undefined if no filter is provided, which aligns with the expected behavior. This change should prevent the 400 Bad Request error when no filter is specified, allowing the search to proceed without filtering.

If you implement this change, it should resolve the issue by ensuring that the where clause is only applied when a filter is explicitly defined. This approach maintains the flexibility of querying without mandatory filters, which seems to be the intended functionality.
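
To make the difference concrete, here is a minimal sketch of how the two variants build the `where` value. It assumes only plain TypeScript semantics: `Where`, `whereBefore`, and `whereAfter` are hypothetical names used for illustration and are not part of LangChain.js or chromadb. Spreading `undefined` produces an empty object, so the current code always sends `where: {}`, which is presumably what the server rejects with the 400.

```typescript
// Hypothetical stand-in for the store's filter type (a Chroma `where` clause).
type Where = Record<string, unknown>;

// Current behaviour: spreading `undefined` yields `{}`, not `undefined`,
// so the request always carries a `where` key.
function whereBefore(filter?: Where): Where {
  return { ...filter };
}

// Proposed behaviour: leave `where` undefined when no filter is set.
function whereAfter(filter?: Where): Where | undefined {
  return filter === undefined ? undefined : { ...filter };
}

// JSON.stringify drops keys whose value is `undefined`,
// so the serialized request body has no `where` key at all.
console.log(JSON.stringify({ nResults: 1, where: whereBefore() }));
// prints {"nResults":1,"where":{}}
console.log(JSON.stringify({ nResults: 1, where: whereAfter() }));
// prints {"nResults":1}
console.log(JSON.stringify({ nResults: 1, where: whereAfter({ source: "1" }) }));
// prints {"nResults":1,"where":{"source":"1"}}
```

An explicitly passed filter is still copied and forwarded unchanged, so searches that already use a filter should behave as before.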

Oh dear, thank you!