Vectorizer is a simple service that creates embeddings from your text data, stores vectors in memory along with their text associations and performs similarity search.
Text data sentences.txt
:
The cat sat on the mat.
A feline is resting on the rug.
Kittens love to play with yarn.
Dogs often chase their tails.
I enjoy reading books by the beach.
Reading near the ocean is my favorite activity.
She sold seashells by the seashore.
He bought some shells from the beach vendor.
Science is the poetry of reality.
Physics explains the fundamental laws of the universe.
She baked a chocolate cake for her birthday.
I prefer chocolate cookies over vanilla.
The football game ended with a surprising twist.
Soccer matches can be very unpredictable.
The mountain peak was covered with snow.
Hikers often face challenges in snowy terrains.
He loves listening to classical music.
Jazz and blues are her favorite genres.
Learning languages opens up new worlds.
She speaks Spanish, French, and Italian fluently.
Code example:
var textParser = new FileTextParser("sentences.txt");
var text = textParser.parse();
var embeddingsProvider = new OpenAIEmbeddingsProvider(System.getenv("OPEN_API_KEY"), "text-embedding-ada-002");
var vectorizer = Vectorizer.from(text, embeddingsProvider);
var input = "I love playing guitar";
var similarText = vectorizer.similaritySearch(input);
similarText.forEach(System.out::println);
Program output:
He loves listening to classical music.
Jazz and blues are her favorite genres.
I enjoy reading books by the beach.
TextParser
for parsing text from various sources
EmbeddingsProvider
to introduce a new model that generates embeddings from textual content
SimilarityCalculator
for implement a new similarity calculation algorithm
VectorStorage
to provide your own storage mechanism for vectors