bed

Text embeddings via OpenAI - assorted Python examples

These are some examples of computing and using text embeddings with OpenAI models.

The examples are here because I find each of them to be of some interest, but this is not intended as a tutorial on how to use embeddings.

For instructive examples, see the official OpenAI repository openai-cookbook.

This repository, bed, is like bedj, but it is written in Python (with Jupyter notebooks) and is more extensive.

License

0BSD. See LICENSE.

Contents

Summary forthcoming. For now, look at the descriptions at the top of each notebook.

Notes

The examples assume your API key is in a file called .api_key. Do not commit that file to Git! The .gitignore file excludes it, to help avoid that.
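For example, a notebook might load the key roughly like this (a minimal sketch, not necessarily how the notebooks here do it; it assumes the 1.x interface of the openai package):

```python
from pathlib import Path

from openai import OpenAI

# Read the key from the untracked .api_key file, dropping surrounding whitespace.
api_key = Path(".api_key").read_text(encoding="utf-8").strip()

# Authenticate explicitly rather than relying on the OPENAI_API_KEY environment variable.
client = OpenAI(api_key=api_key)
```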

Matrix multiplication

One interesting technique shown here is storing the embeddings as rows of a matrix, then finding similarities with matrix multiplication.

The second operand can be a single embedding, in which case we are multiplying a matrix by a column vector. That is the same as taking the dot product of each of the matrix's rows with the vector; each dot product becomes one coordinate of the resulting vector.
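Here is a small NumPy sketch of that matrix-by-vector case (illustrative only, with random unit vectors standing in for real embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)

# Each row of `embeddings` stands in for one text's embedding vector.
embeddings = rng.normal(size=(5, 1536))
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

# A single query embedding, also normalized to length 1.
query = rng.normal(size=1536)
query /= np.linalg.norm(query)

# Matrix times column vector: coordinate i of the result is the dot product
# of row i with the query, i.e., the similarity of text i to the query.
similarities = embeddings @ query
print(similarities.shape)            # (5,)
print(int(np.argmax(similarities)))  # index of the most similar text
```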

If the second operand is a matrix whose columns are embeddings, then each (i, j) entry of the resulting matrix is the dot product of the ith row of the first matrix with the jth column of the second matrix, i.e., the similarity of those two embeddings.
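Continuing the same illustrative setup, all pairwise similarities between two collections come from one matrix product, with the second row matrix transposed so that its columns are embeddings:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two collections, each storing one embedding per row.
a = rng.normal(size=(3, 1536))
a /= np.linalg.norm(a, axis=1, keepdims=True)

b = rng.normal(size=(4, 1536))
b /= np.linalg.norm(b, axis=1, keepdims=True)

# Entry (i, j) is the dot product of row i of `a` with column j of `b.T`,
# i.e., the similarity of embedding i from `a` and embedding j from `b`.
pairwise = a @ b.T
print(pairwise.shape)  # (3, 4)
```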

With OpenAI embeddings, these dot products are the cosine similarities, because all OpenAI embedding models produce vectors that are already normalized to length 1. The same holds for embeddings from some, but not all, other models.
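If a model's embeddings are not already unit length, they can be normalized first, after which dot products equal cosine similarities (a generic sketch, not code from this repository):

```python
import numpy as np

def normalize_rows(matrix):
    """Scale each row to length 1 so dot products become cosine similarities."""
    return matrix / np.linalg.norm(matrix, axis=1, keepdims=True)

# Two rows that are not unit length.
m = np.array([[3.0, 4.0], [0.0, 2.0]])
unit = normalize_rows(m)
print(np.linalg.norm(unit, axis=1))  # [1. 1.]
print(unit @ unit.T)                 # pairwise cosine similarities
```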