qax is a small proof-of-concept implementation of retrieval-based question answering (Q&A) that stores embeddings for code repositories or any folder full of text files. The code is a simple adaptation of the langchain examples, packaged in Docker containers.
With qax, you can create embeddings for folder structures containing text files, such as code repositories, using OpenAI's text-embedding-ada-002 model, and store them in a pgVector container.
After the index has been built, the app uses langchain's RetrievalQA chain with the gpt-3.5-turbo model (or gpt-4, if available) to perform Q&A on the indexed data (see the langchain QA docs).
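The retrieval step described above can be illustrated with a minimal sketch. This is not the langchain or pgvector implementation: it ranks a few toy 3-dimensional vectors by cosine similarity in plain Python and pastes the best matches into a prompt. The names cosine_similarity, top_k, build_prompt, the INDEX data, and the prompt wording are all hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=2):
    """Return the k (source, text) pairs most similar to query_vec.

    `indexed` is a list of (source, text, vector) tuples.
    """
    ranked = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vec, item[2]),
        reverse=True,
    )
    return [(source, text) for source, text, _ in ranked[:k]]


def build_prompt(question, chunks):
    """Concatenate the retrieved chunks into one prompt (hypothetical wording)."""
    context = "\n\n".join(text for _, text in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Toy data: made-up sources and 3-dimensional vectors standing in for
# real embeddings, which have far more dimensions.
INDEX = [
    ("deps.txt", "toy text about dependencies", [0.9, 0.1, 0.0]),
    ("notes.md", "toy text about something else", [0.1, 0.9, 0.0]),
    ("build.txt", "toy text about building", [0.7, 0.2, 0.1]),
]

hits = top_k([1.0, 0.0, 0.0], INDEX, k=2)
prompt = build_prompt("Which libraries are needed?", hits)
```

In the real app, the vector store performs the similarity search and the chat model receives the assembled prompt; the sketch only shows the shape of the data flow.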
Build a local Docker image to run the app:
docker build -t qax .
To get started, clone any repository and navigate to its root directory:
git clone https://github.com/your_username/example.git && cd example
Create two env files, .env and db.env, in the root directory of the repository with the following contents:
.env:
OPENAI_API_KEY=your_openai_api_key_here
# Database Connection
PGVECTOR_HOST=IP_OF_HOST
PGVECTOR_PORT=5432
PGVECTOR_COLLECTION=qax
db.env:
POSTGRES_USER=victor
POSTGRES_PASSWORD=vector
POSTGRES_DB=vectordb
PGDATA=/.vectordb
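To show how the settings from the two files could fit together, here is a small sketch that parses KEY=VALUE lines and assembles a database URL. How qax actually combines these values is not shown in this README, so the helpers parse_env and connection_url are hypothetical, and the SQLAlchemy-style postgresql+psycopg2 URL scheme is an assumption.

```python
from urllib.parse import quote_plus


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def connection_url(cfg):
    """Build a SQLAlchemy-style URL from the merged settings (assumed format)."""
    return "postgresql+psycopg2://{user}:{password}@{host}:{port}/{db}".format(
        user=quote_plus(cfg["POSTGRES_USER"]),
        password=quote_plus(cfg["POSTGRES_PASSWORD"]),
        host=cfg["PGVECTOR_HOST"],
        port=cfg.get("PGVECTOR_PORT", "5432"),
        db=cfg["POSTGRES_DB"],
    )


# The placeholder values from the two files above.
APP_ENV = """
# Database Connection
PGVECTOR_HOST=IP_OF_HOST
PGVECTOR_PORT=5432
PGVECTOR_COLLECTION=qax
"""
DB_ENV = """
POSTGRES_USER=victor
POSTGRES_PASSWORD=vector
POSTGRES_DB=vectordb
"""

config = {**parse_env(APP_ENV), **parse_env(DB_ENV)}
url = connection_url(config)
```

Credentials are passed through quote_plus so that a password containing characters such as @ or / does not break the URL.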
Start the pgvector database container (bash):
docker run -d \
--name pgvector \
--env-file=db.env \
-p 5432:5432 \
-v $PWD/.vectordb:/var/lib/postgresql \
ankane/pgvector
The same command for PowerShell:
docker run -d `
--name pgvector `
--env-file=db.env `
-p 5432:5432 `
-v ${PWD}/.vectordb:/var/lib/postgresql `
ankane/pgvector
Then copy load-ext.sh into the running container and execute it:
docker cp ./load-ext.sh pgvector:/load-ext.sh && \
docker exec pgvector /load-ext.sh
Run the following command to create embeddings for the files in the repository:
docker run --rm -it \
--env-file=.env \
--env-file=db.env \
-v ${PWD}:/repository \
qax --index
By default, this creates a .vectordb folder in your repository to store the index.
To change the name, adjust the PGDATA variable in db.env.
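As a rough picture of what an indexing pass over a folder involves, the sketch below walks a directory, reads the files that decode as UTF-8, and splits their text into overlapping chunks that could then be embedded. It is not qax's actual code: the helpers iter_text_files and split_text, the skipped directory names, and the chunk sizes are assumptions made for illustration.

```python
import os


def iter_text_files(root, skip_dirs=(".git", ".vectordb")):
    """Yield (path, text) for every UTF-8 readable file below root."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as handle:
                    text = handle.read()
            except (UnicodeDecodeError, OSError):
                continue  # binary or unreadable file: ignore it
            yield path, text


def split_text(text, chunk_size=1000, overlap=100):
    """Split text into chunks of chunk_size characters that overlap by `overlap`."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


def collect_chunks(root):
    """Return (path, chunk) pairs ready to be embedded."""
    return [
        (path, chunk)
        for path, text in iter_text_files(root)
        for chunk in split_text(text)
    ]
```

Calling collect_chunks("/repository") would return one entry per chunk, each tagged with its source path, which is the information needed to list sources next to an answer.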
To ask questions about the indexed data, use the command below:
docker run --rm -it \
--env-file=.env \
--env-file=db.env \
-v ${PWD}:/repository \
qax [QUERY]
Example 1:
docker run --rm -it \
--env-file=.env \
--env-file=db.env \
-v ${PWD}:/repository \
qax "Which libraries are needed to build the app?"
> Entering new RetrievalQA chain...
> Finished chain.
The libraries needed to build the app are langchain, openai, tiktoken, pathspec, python-dotenv, psycopg2-binary, and pgvector.
--------------------------------------------------------------------------------
Sources:
- /repository/requirements.txt
- /repository/Dockerfile
- /repository/load-ext.sh
- /repository/README.md
Example 2:
docker run --rm -it \
--env-file=.env \
--env-file=db.env \
-v ${PWD}:/repository \
qax "Create inline documentation for the main function"
> Entering new RetrievalQA chain...
> Finished chain.
"""The main function of the document indexing and similarity search program.
Args:
index (bool): Whether to perform document indexing.
"""
embeddings = OpenAIEmbeddings()
--------------------------------------------------------------------------------
Sources:
- /repository/app.py
- /repository/requirements.txt
- /repository/README.md
- /repository/app.py
This project is licensed under the MIT License.
We welcome contributions from the community! Please see our contribution guidelines for details.
For any inquiries or support, feel free to join our community on Discord.