The aim of this project is to develop an API to create a vector database using ChromaDB with FastAPI.
For testing, I use one of my papers, which can be found at /files.
There are two functions in functions.py that are responsible for creating and deleting the database.
The steps to create the vector database are:
- Load the documents that I want to persist in the vector database
- Create chunks using a text splitter from LangChain
- Load an embedding function to create embeddings from the chunks. In this project, I use an open-source model, "all-MiniLM-L6-v2", which is available at sbert.net.
- Persist documents in ChromaDB
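The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in functions.py. The function names, the `chroma_db` folder, the `papers` collection name and the chunk sizes are all assumptions. The splitter is a plain-Python stand-in for the LangChain text splitter, the sketch calls the `chromadb` client directly, and it assumes the document has already been loaded as plain text.

```python
def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Plain-Python stand-in for a LangChain text splitter (fixed windows with overlap)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining text is already covered by the previous chunk.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


def create_database(text: str, persist_dir: str = "chroma_db",
                    collection_name: str = "papers") -> int:
    """Embed the chunks with all-MiniLM-L6-v2 and persist them in ChromaDB.

    persist_dir and collection_name are hypothetical defaults.
    """
    # Imported here so the chunking helper can be used without these packages.
    import chromadb
    from chromadb.utils import embedding_functions

    embed = embedding_functions.SentenceTransformerEmbeddingFunction(
        model_name="all-MiniLM-L6-v2"
    )
    client = chromadb.PersistentClient(path=persist_dir)
    collection = client.get_or_create_collection(
        name=collection_name, embedding_function=embed
    )
    chunks = split_into_chunks(text)
    # Each chunk needs a unique id; a running index is enough for one document.
    collection.add(documents=chunks, ids=[f"chunk-{i}" for i in range(len(chunks))])
    return len(chunks)
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some text twice.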
To delete the vector database, delete the folder where it is persisted.
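Deleting the folder might look like the sketch below. The function name and the default `chroma_db` path are assumptions, not necessarily what functions.py uses.

```python
import shutil
from pathlib import Path


def delete_database(persist_dir: str = "chroma_db") -> bool:
    """Remove the persisted database folder; return True if a folder was deleted."""
    path = Path(persist_dir)
    if not path.is_dir():
        # Nothing to delete, so report that instead of raising.
        return False
    shutil.rmtree(path)
    return True
```

Returning a boolean lets the API tell the caller whether a database actually existed.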
Three methods are defined:
- Create database, using GET.
- Delete database, using DELETE.
- Query database, using POST.
For the POST method, simple data validation is defined with pydantic in models.py. In this file, a class is created to enforce data types: str for the query and int for the number of neighbours.
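A validation class of this kind might look like the following sketch. The class name `QueryRequest`, the field names and the `is_valid` helper are assumptions for illustration and may differ from models.py.

```python
from pydantic import BaseModel, ValidationError


class QueryRequest(BaseModel):
    query: str        # text to search for
    neighbours: int   # number of nearest chunks to return


def is_valid(payload: dict) -> bool:
    """Return True if the payload satisfies the QueryRequest schema."""
    try:
        QueryRequest(**payload)
    except ValidationError:
        return False
    return True
```

When such a class is used as the type of a POST handler's parameter, FastAPI runs this validation automatically and answers with a 422 error if the body does not match.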