ChromaDB is an open-source vector database that allows you to store and query vector embeddings. This package provides a PHP client for the ChromaDB API.
You can install the package via composer:
composer require helgesverre/chromadb
You can publish the config file with:
php artisan vendor:publish --tag="chromadb-config"
These are the contents of the published config/chromadb.php file:
return [
'token' => env('CHROMADB_TOKEN'),
'host' => env('CHROMADB_HOST', 'localhost'),
'port' => env('CHROMADB_PORT', '19530'),
];
$chromadb = new \HelgeSverre\Chromadb\Chromadb(
token: 'test-token-chroma-local-dev',
host: 'http://localhost',
port: '8000'
);
// Create a new collection with optional metadata
$chromadb->collections()->create(
name: 'my_collection',
);
// Count the number of collections
$chromadb->collections()->count();
// Retrieve a specific collection by name
$chromadb->collections()->get(
collectionName: 'my_collection'
);
// Delete a collection by name
$chromadb->collections()->delete(
collectionName: 'my_collection'
);
// Update a collection's name and/or metadata
$chromadb->collections()->update(
collectionId: '3ea5a914-e2ab-47cb-b285-8e585c9af4f3',
newName: 'new_collection_name',
);
// Add items to a collection with optional embeddings, metadata, and documents
// (the embedding strings below are placeholders; real embeddings are arrays of floats)
$chromadb->items()->add(
collectionId: '3ea5a914-e2ab-47cb-b285-8e585c9af4f3',
ids: ['item1', 'item2'],
embeddings: ['embedding1', 'embedding2'],
documents: ['doc1', 'doc2']
);
// Update items in a collection with new embeddings, metadata, and documents
$chromadb->items()->update(
collectionId: '3ea5a914-e2ab-47cb-b285-8e585c9af4f3',
ids: ['item1', 'item2'],
embeddings: ['new_embedding1', 'new_embedding2'],
documents: ['new_doc1', 'new_doc2']
);
// Upsert items in a collection (insert if not exist, update if exist)
$chromadb->items()->upsert(
collectionId: '3ea5a914-e2ab-47cb-b285-8e585c9af4f3',
ids: ['item'],
metadatas: [['title' => 'metadata']],
documents: ['document']
);
// Retrieve specific items from a collection by their IDs
$chromadb->items()->get(
collectionId: '3ea5a914-e2ab-47cb-b285-8e585c9af4f3',
ids: ['item1', 'item2']
);
// Delete specific items from a collection by their IDs
$chromadb->items()->delete(
collectionId: '3ea5a914-e2ab-47cb-b285-8e585c9af4f3',
ids: ['item1', 'item2']
);
// Count the number of items in a collection
$chromadb->items()->count(
collectionId: '3ea5a914-e2ab-47cb-b285-8e585c9af4f3'
);
// Query items in a collection based on embeddings, texts, and other filters
$chromadb->items()->query(
collectionId: '3ea5a914-e2ab-47cb-b285-8e585c9af4f3',
queryEmbeddings: [createTestVector(0.8)],
include: ['documents', 'metadatas', 'distances'],
nResults: 5
);
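The query example above calls createTestVector(), which is not defined in this README. A minimal sketch of what such a helper might look like, assuming it simply fills a vector with one repeated value (the signature and the default of 128 dimensions are assumptions, not taken from the package):

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical test helper: builds a vector in which every component
 * has the same value. Not part of the package's public API.
 *
 * @return list<float>
 */
function createTestVector(float $value, int $dimensions = 128): array
{
    if ($dimensions < 1) {
        throw new InvalidArgumentException('A vector needs at least one dimension.');
    }

    // array_fill() returns a zero-indexed list with $dimensions copies of $value.
    return array_fill(0, $dimensions, $value);
}

// Build a small 4-dimensional vector and print it as JSON.
$vector = createTestVector(0.8, 4);
echo json_encode($vector), PHP_EOL;
```

The number of dimensions should match the embeddings already stored in the collection being queried, since a collection is expected to hold vectors of a single length.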
This example demonstrates how to perform a semantic search in ChromaDB using embeddings generated from OpenAI.
Full code available in SemanticSearchTest.php.
First, create an array of data you wish to index. In this example, we'll use blog posts with titles, summaries, and tags.
$blogPosts = [
[
'title' => 'Exploring Laravel',
'summary' => 'A deep dive into Laravel frameworks...',
'tags' => ['PHP', 'Laravel', 'Web Development']
],
[
'title' => 'Introduction to React',
'summary' => 'Understanding the basics of React and how it revolutionizes frontend development.',
'tags' => ['JavaScript', 'React', 'Frontend']
],
];
Use OpenAI's embeddings API to convert the summaries of your blog posts into vector embeddings.
$summaries = array_column($blogPosts, 'summary');
$embeddingsResponse = OpenAI::client('sk-your-openai-api-key')
->embeddings()
->create([
'model' => 'text-embedding-ada-002',
'input' => $summaries,
]);
foreach ($embeddingsResponse->embeddings as $embedding) {
$blogPosts[$embedding->index]['vector'] = $embedding->embedding;
}
Create a collection in ChromaDB to store your blog post embeddings.
$createCollectionResponse = $chromadb->collections()->create(
name: 'blog_posts',
);
$collectionId = $createCollectionResponse->json('id');
Insert these embeddings, along with other blog post data, into your ChromaDB collection.
foreach ($blogPosts as $post) {
$chromadb->items()->add(
collectionId: $collectionId,
ids: [$post['title']],
embeddings: [$post['vector']],
metadatas: [$post]
);
}
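The loop above passes the whole post as metadata, including the tags array and the stored vector. To my understanding, Chroma expects metadata values to be scalars (strings, numbers, booleans), so it may be necessary to flatten the post first. A minimal sketch, assuming a hypothetical helper named toScalarMetadata() that is not part of the package:

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: reduces a post array to flat, scalar-only metadata.
 * Assumes array values are flat lists of strings (such as tags).
 */
function toScalarMetadata(array $post, array $excludeKeys = ['vector']): array
{
    $metadata = [];

    foreach ($post as $key => $value) {
        if (in_array($key, $excludeKeys, true)) {
            continue; // the vector is already sent separately as the embedding
        }

        if (is_array($value)) {
            // Join list values such as tags into a single string.
            $value = implode(', ', $value);
        }

        if (is_scalar($value)) {
            $metadata[$key] = $value;
        }
    }

    return $metadata;
}

$post = [
    'title' => 'Exploring Laravel',
    'tags' => ['PHP', 'Laravel'],
    'vector' => [0.1, 0.2],
];

print_r(toScalarMetadata($post));
```

If tags are stored joined like this, they come back from a query as a string, so they can be echoed directly or split again with explode(', ', ...).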
Generate a search vector for your query the same way you embedded the blog posts. Here, getOpenAIEmbedding() stands for a helper that wraps the embeddings call shown earlier.
$searchEmbedding = getOpenAIEmbedding('laravel framework');
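The getOpenAIEmbedding() function is not defined in this README. A minimal sketch of one possible implementation, assuming the openai-php/client package used in the indexing step and an API key in the OPENAI_API_KEY environment variable (both the optional second parameter and the environment variable name are assumptions made for illustration):

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: returns the embedding vector for one piece of text.
 *
 * $embed is an optional callable (string $text): array that replaces the
 * real API call, which keeps the function usable in tests without network
 * access. When omitted, the OpenAI client is called as in the indexing step.
 *
 * @return list<float>
 */
function getOpenAIEmbedding(string $text, ?callable $embed = null): array
{
    if ($embed === null) {
        $embed = static function (string $input): array {
            // Assumption: the API key is read from OPENAI_API_KEY.
            $response = OpenAI::client((string) getenv('OPENAI_API_KEY'))
                ->embeddings()
                ->create([
                    'model' => 'text-embedding-ada-002',
                    'input' => $input,
                ]);

            // A single input yields a single embedding at index 0.
            return $response->embeddings[0]->embedding;
        };
    }

    $vector = $embed($text);

    if ($vector === []) {
        throw new RuntimeException('The embedding call returned an empty vector.');
    }

    return $vector;
}

// Usage with a stub so the example runs without calling the API.
$fake = static fn (string $text): array => [0.1, 0.2, 0.3];
$searchEmbedding = getOpenAIEmbedding('laravel framework', $fake);
```

Using the same model for the query and for the indexed summaries matters, because vectors from different models are not comparable.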
Use the ChromaDB client to perform a search with the generated embedding.
$searchResponse = $chromadb->items()->query(
collectionId: $collectionId,
queryEmbeddings: [$searchEmbedding],
nResults: 3,
include: ['metadatas']
);
// Output the search results
foreach ($searchResponse->json('results') as $result) {
echo "Title: " . $result['metadatas']['title'] . "\n";
echo "Summary: " . $result['metadatas']['summary'] . "\n";
echo "Tags: " . implode(', ', $result['metadatas']['tags']) . "\n\n";
}
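Depending on the server version, the query payload may be column-oriented (parallel lists under ids, metadatas, and distances, each nested once per query embedding) rather than a list under a results key. A minimal sketch of turning that layout into rows, assuming that shape; queryResultRows() is a hypothetical helper, not part of the package:

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: converts a column-oriented query payload into one
 * row per match for a single query embedding.
 *
 * @param array $payload    decoded JSON body, e.g. from $searchResponse->json()
 * @param int   $queryIndex which query embedding to read (0 for the first)
 */
function queryResultRows(array $payload, int $queryIndex = 0): array
{
    // Fields left out of "include" are treated as empty lists.
    $ids = $payload['ids'][$queryIndex] ?? [];
    $metadatas = $payload['metadatas'][$queryIndex] ?? [];
    $distances = $payload['distances'][$queryIndex] ?? [];

    $rows = [];
    foreach ($ids as $position => $id) {
        $rows[] = [
            'id' => $id,
            'metadata' => $metadatas[$position] ?? null,
            'distance' => $distances[$position] ?? null,
        ];
    }

    return $rows;
}

// Hand-written payload that mirrors the assumed shape.
$payload = [
    'ids' => [['Exploring Laravel', 'Introduction to React']],
    'metadatas' => [[['title' => 'Exploring Laravel'], ['title' => 'Introduction to React']]],
    'distances' => [[0.12, 0.47]],
];

foreach (queryResultRows($payload) as $row) {
    echo $row['id'], ' (distance: ', $row['distance'], ')', PHP_EOL;
}
```

Inspecting the raw response body once is a quick way to confirm which layout the server in use actually returns.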
To get started quickly with ChromaDB, you can run it in Docker:
# Download the docker-compose.yml file (use the raw URL; the /blob/ page returns HTML)
wget https://raw.githubusercontent.com/HelgeSverre/chromadb/main/docker-compose.yml
# Start ChromaDB
docker compose up -d
The auth token is set to test-token-chroma-local-dev by default.
You can change this in the docker-compose.yml file by changing the CHROMA_SERVER_AUTH_CREDENTIALS environment variable.
To stop ChromaDB, run docker compose down. To wipe all the data, run docker compose down -v.
NOTE
The docker-compose.yml file in this repo is provided only as an example and should not be used in production. Go to the ChromaDB deployment documentation for more information on deploying Chroma in production.
cp .env.example .env
docker compose up -d
composer test
composer analyse src
The MIT License (MIT). Please see License File for more information.