embeddings4j

Embeddings4j is an in-memory vector database optimized for storing and efficiently searching embeddings.

Use Cases (Why would I need that?)

You want to prototype quickly and do not want to depend on remote or commercial vector DBs.
You want to store your data in separate persistent storage and use an in-memory DB as an index for fast lookup.
You want to use it for integration testing.

Current State:

Implemented using HNSW.
Uses cosine distance for similarity search.
Supports float and double types for vectors.
Supports storing ID and contents along with the vector, so you can:
- Store vectors together with their contents in memory (if you want the fastest search and your data fits into memory).
- Store vectors by ID only (if you want to keep your data in another location and use the in-memory database as a fast index).
- Both (mix as you wish).
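To make "cosine distance" concrete, here is a minimal sketch in plain Java. It is not taken from the library's source; the class and method names (`CosineDistance`, `similarity`, `distance`) are hypothetical, and it uses the common definition of cosine distance as 1 minus cosine similarity, which may differ in detail from what embeddings4j computes internally.

```java
// Hypothetical helper for illustration; not part of embeddings4j.
final class CosineDistance {

    private CosineDistance() {
    }

    /** Cosine similarity: dot(a, b) / (|a| * |b|). */
    static double similarity(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same number of dimensions");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Cosine distance, assumed here to be 1 - similarity. */
    static double distance(float[] a, float[] b) {
        return 1.0 - similarity(a, b);
    }
}
```

Under this definition, a distance of 0 means the vectors point in the same direction, 1 means they are orthogonal, and 2 means they point in opposite directions. Vector length does not affect the result, only direction.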

Planned Improvements / Features:

More data types for vectors.
Updates/deletes of existing embeddings/vectors.
Dynamic DB size (currently, you have to define max size during DB initialization).
Serialization of the DB to disk or other persistent storage, and deserialization back into memory.
Consider using more efficient/compact data types instead of floats/doubles (to save memory and speed up storage/search).
Consider using quantization (to save memory and speed up storage/search).
Consider reducing dimensionality of vectors (to save memory and speed up storage/search).
Please let us know what you need.

Requirements

Java 8 or later.

Start Using

Maven:

<dependency>
  <groupId>dev.ai4j</groupId>
  <artifactId>embeddings4j</artifactId>
  <version>0.2.0</version>
</dependency>

Gradle:

implementation 'dev.ai4j:embeddings4j:0.2.0'

How to Use

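import java.util.List;
import static java.util.Arrays.asList;
import static org.assertj.core.api.Assertions.assertThat; // assuming AssertJ is used for the assertions below

// Init DB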
int dimensions = 2; // low number for easy understanding
int maxSize = 10000;
InMemoryVectorDatabase db = new InMemoryVectorDatabase(dimensions, maxSize);

// Create embeddings
DefaultEmbedding embedding1 = DefaultEmbedding.of("1", "text 1", asList(3.0f, 5.0f));
DefaultEmbedding embedding2 = DefaultEmbedding.of("2", "text 2", asList(2.0f, 2.0f));
DefaultEmbedding embedding3 = DefaultEmbedding.of("3", "text 3", asList(4.0f, 2.0f));
DefaultEmbedding embedding4 = DefaultEmbedding.of("4", "text 4", asList(3.0f, 1.0f));

// Insert embeddings into DB
db.insert(embedding1, embedding2, embedding3, embedding4);

// Create search query
DefaultEmbedding referenceEmbedding = DefaultEmbedding.of("5", "text 5", asList(1.0f, 3.0f));
int maxResults = 2;
SearchNearestQuery<String, String, Float> query = new SearchNearestQuery<>(referenceEmbedding, maxResults);

// Execute query
List<SearchNearestResult<String, String, Float>> results = db.execute(query);

// Observe results
assertThat(results).hasSize(maxResults);
assertThat(results.get(0).embedding()).isEqualTo(embedding1);
assertThat(results.get(1).embedding()).isEqualTo(embedding2);
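
To see why embedding1 and embedding2 are the expected results, the ranking can be reproduced with an exhaustive (brute-force) scan over the same four vectors. This sketch does not use embeddings4j at all: `BruteForceNearest` and its methods are hypothetical names, and cosine distance is assumed to be 1 minus cosine similarity. HNSW is an approximate index, so on larger data sets its results may differ from an exact scan like this one.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical brute-force check for illustration; not part of embeddings4j.
final class BruteForceNearest {

    private BruteForceNearest() {
    }

    /** Cosine distance, assumed here to be 1 - cosine similarity. */
    static double cosineDistance(float[] a, float[] b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns up to maxResults IDs, ordered from nearest to farthest. */
    static List<String> nearestIds(Map<String, float[]> vectors, float[] query, int maxResults) {
        List<String> ids = new ArrayList<>(vectors.keySet());
        ids.sort(Comparator.comparingDouble((String id) -> cosineDistance(vectors.get(id), query)));
        return new ArrayList<>(ids.subList(0, Math.min(maxResults, ids.size())));
    }

    public static void main(String[] args) {
        // Same IDs and vectors as in the example above
        Map<String, float[]> vectors = new LinkedHashMap<>();
        vectors.put("1", new float[]{3.0f, 5.0f});
        vectors.put("2", new float[]{2.0f, 2.0f});
        vectors.put("3", new float[]{4.0f, 2.0f});
        vectors.put("4", new float[]{3.0f, 1.0f});

        float[] query = {1.0f, 3.0f};
        System.out.println(nearestIds(vectors, query, 2)); // prints [1, 2]
    }
}
```

The cosine similarities to the query (1, 3) are roughly 0.98, 0.89, 0.71 and 0.60 for IDs 1 to 4, so IDs 1 and 2 are the two nearest neighbours.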
