The 8Bit Auto Embeddings repository contains the backend and machine learning embeddings components used in the 8Bit Auto web application. It uses FastAPI for creating a RESTful API and ChromaDB for managing and querying embedded data.
- FastAPI for efficient and easy-to-document API routes.
- ChromaDB integration for persistent storage of embedding vectors.
- Sentence-Transformer models for generating embeddings from text data.
These instructions will guide you through setting up the project locally for development and testing.
- Python 3.8+
- FastAPI
- Uvicorn (ASGI server)
- ChromaDB
- Sentence Transformers
- Clone the repository:
git clone https://github.com/sb2bg/8bit-auto-embeddings.git
- Navigate to the project directory:
cd 8bit-auto-embeddings
- Install the required Python packages:
pip install -r requirements.txt
- Start the Uvicorn server:
uvicorn main:app --reload
The server will run on http://127.0.0.1:8000 and is accessible via a browser or API testing tools such as Postman.
To generate embeddings using chroma_embedder.py, follow these steps:
- Ensure data.csv is in the repository root. It may contain any columns, as long as it includes excerpt, which holds the text data to be embedded.
- Run chroma_embedder.py:
python chroma_embedder.py
This script uses the SentenceTransformer model to convert the excerpt text from data.csv into embeddings and stores them in a ChromaDB collection named cars, which is persisted in the chroma.db folder.
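The steps above can be sketched roughly as follows. This is not the repository's chroma_embedder.py; it is a minimal sketch assuming the chromadb and sentence-transformers packages, a guessed model name (all-MiniLM-L6-v2), row-index IDs, and that every CSV column (for example a title) is kept as metadata. The helper names parse_rows and embed_to_chroma are hypothetical.

```python
import csv
from typing import Dict, Iterable, List


def parse_rows(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse CSV lines into dicts, keeping only rows with a non-empty 'excerpt'."""
    reader = csv.DictReader(lines)
    if not reader.fieldnames or "excerpt" not in reader.fieldnames:
        raise ValueError("data.csv must include an 'excerpt' column")
    rows = []
    for row in reader:
        # Drop malformed cells (extra or missing fields) so every value is a string,
        # which keeps each row valid as ChromaDB metadata.
        clean = {k: v for k, v in row.items() if isinstance(k, str) and isinstance(v, str)}
        if clean.get("excerpt", "").strip():
            rows.append(clean)
    return rows


def embed_to_chroma(csv_path: str = "data.csv", db_path: str = "chroma.db",
                    model_name: str = "all-MiniLM-L6-v2", batch_size: int = 1000) -> int:
    """Embed every excerpt and upsert it into the 'cars' collection; returns the row count."""
    # Third-party imports are deferred so parse_rows() works without them installed.
    import chromadb
    from sentence_transformers import SentenceTransformer

    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = parse_rows(f)
    excerpts = [row["excerpt"] for row in rows]

    # encode() returns a NumPy array; ChromaDB expects plain lists of floats.
    model = SentenceTransformer(model_name)  # model name is an assumption
    embeddings = model.encode(excerpts).tolist()

    # PersistentClient stores the collection on disk under db_path.
    client = chromadb.PersistentClient(path=db_path)
    collection = client.get_or_create_collection(name="cars")
    # Upsert in batches: re-running overwrites existing IDs instead of skipping them,
    # and large CSVs stay under ChromaDB's per-call batch limit.
    for start in range(0, len(rows), batch_size):
        end = start + batch_size
        collection.upsert(
            ids=[str(i) for i in range(start, min(end, len(rows)))],  # row-index IDs (assumed)
            embeddings=embeddings[start:end],
            documents=excerpts[start:end],
            metadatas=rows[start:end],  # all CSV columns, e.g. title (assumed)
        )
    return len(rows)
```

Calling embed_to_chroma() from the repository root would read data.csv and write to chroma.db. Whichever model is used here must also be used at query time, because vectors produced by different models are not comparable.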
Once the server is running, you can access the API documentation automatically generated by FastAPI at http://127.0.0.1:8000/docs. This page provides interactive endpoints where you can test the API directly.
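FastAPI builds the /docs page from an OpenAPI schema that it also serves as JSON, by default at /openapi.json. As a small illustration, the sketch below fetches that schema from a locally running server and lists each route. It uses only the Python standard library; the helper names fetch_schema and list_routes are hypothetical.

```python
import json
import urllib.request
from typing import Dict, List, Tuple

# Keys of an OpenAPI path item that denote HTTP operations (others, like
# "parameters" or "summary", are metadata and are skipped).
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_routes(schema: Dict) -> List[Tuple[str, str]]:
    """Return (METHOD, path) pairs from an OpenAPI schema dict, sorted by path."""
    routes = []
    for path, item in schema.get("paths", {}).items():
        for key in item:
            if key in HTTP_METHODS:
                routes.append((key.upper(), path))
    return sorted(routes, key=lambda r: (r[1], r[0]))


def fetch_schema(base_url: str = "http://127.0.0.1:8000") -> Dict:
    """Download the OpenAPI schema that backs FastAPI's /docs page."""
    with urllib.request.urlopen(f"{base_url}/openapi.json") as resp:
        return json.load(resp)
```

With the server running, `for method, path in list_routes(fetch_schema()): print(method, path)` prints one line per route.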
The following endpoints are available:
- GET /chat/{car_str} or /chat/{car_str}?n_results={n_results}
  - Description: Retrieve the n_results (default 5) nearest embedding vectors for a given car string.
  - Path parameter:
    - car_str (str): The car description for which to retrieve the embeddings.
  - Query parameter:
    - n_results (int): The number of nearest embeddings to return (default: 5).
  - Returns:
    - [{current_bid_formatted: string, excerpt: string, thumbnail_url: string, title: string}]: A list of the nearest car vector matches for the given car string.
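A minimal Python client for this endpoint might look like the following. It is a sketch using only the standard library and assumes the server is running at http://127.0.0.1:8000; the helper names chat_url and nearest_cars are hypothetical. The car string is percent-encoded because it is sent as a path segment and may contain spaces.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Optional

BASE_URL = "http://127.0.0.1:8000"


def chat_url(car_str: str, n_results: Optional[int] = None, base_url: str = BASE_URL) -> str:
    """Build the /chat/{car_str} URL, percent-encoding the car string as one path segment."""
    url = f"{base_url}/chat/{urllib.parse.quote(car_str, safe='')}"
    if n_results is not None:
        # Omitting n_results lets the server fall back to its default of 5.
        url += "?" + urllib.parse.urlencode({"n_results": n_results})
    return url


def nearest_cars(car_str: str, n_results: int = 5, base_url: str = BASE_URL) -> List[Dict[str, str]]:
    """Call the running server and return the decoded list of matches."""
    with urllib.request.urlopen(chat_url(car_str, n_results, base_url)) as resp:
        return json.load(resp)
```

For example, nearest_cars("1967 Ford Mustang", n_results=3) would request GET /chat/1967%20Ford%20Mustang?n_results=3 and return a list of dicts with the four fields listed above.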
- Use the API to retrieve and update embedding records.
- Interact with ChromaDB to query embedding vectors based on text input.
- Generate embeddings from text data using the SentenceTransformer model.
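To query the embeddings in ChromaDB directly, without going through the API, something like the sketch below could work. It assumes the cars collection in chroma.db was built with the same SentenceTransformer model (all-MiniLM-L6-v2 here is an assumption) and that title, current_bid_formatted, and thumbnail_url were stored as metadata; the API's actual implementation may differ. The function names query_cars and to_matches are hypothetical. ChromaDB's query() returns one inner list per query text, which to_matches flattens into the same shape as the /chat response.

```python
from typing import Any, Dict, List

# Fields returned by GET /chat/{car_str}.
RESPONSE_KEYS = ("current_bid_formatted", "excerpt", "thumbnail_url", "title")


def to_matches(result: Dict[str, Any]) -> List[Dict[str, str]]:
    """Flatten a single-text ChromaDB query result into the /chat response shape."""
    # Only one query text is sent, so take the first inner list of each field.
    documents = result["documents"][0]
    metadatas = result["metadatas"][0]
    matches = []
    for doc, meta in zip(documents, metadatas):
        meta = meta or {}
        match = {key: str(meta.get(key, "")) for key in RESPONSE_KEYS}
        if doc is not None:
            match["excerpt"] = doc  # the embedded text itself
        matches.append(match)
    return matches


def query_cars(car_str: str, n_results: int = 5, db_path: str = "chroma.db",
               model_name: str = "all-MiniLM-L6-v2") -> List[Dict[str, str]]:
    """Embed car_str and return its n_results nearest neighbours from the 'cars' collection."""
    # Third-party imports are deferred so to_matches() works without them installed.
    import chromadb
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)  # must match the model used for indexing
    collection = chromadb.PersistentClient(path=db_path).get_collection(name="cars")
    result = collection.query(
        query_embeddings=model.encode([car_str]).tolist(),
        n_results=n_results,
        include=["documents", "metadatas"],
    )
    return to_matches(result)
```

Keeping the reshaping in a separate pure function makes it easy to test without a database or model download.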
This project is licensed under the MIT License - see the LICENSE file for details.