RAGxplorer 🦙🦺

RAGxplorer is an interactive Streamlit tool that supports building Retrieval Augmented Generation (RAG) applications by visualizing document chunks and queries in the embedding space.

Note

I will be refactoring the code substantially so that it becomes a standalone package instead of living inside a Streamlit application. Until then, I appreciate your patience. Further suggestions will be most appreciated here.

Demo 🔎

⚠️ Due to infra limitations, this freely hosted demo may occasionally go down. For the best experience, clone this repo and run it locally.

Features ✨

Document Upload: Users can upload PDF documents.
Chunk Configuration: Options to configure the chunk size and overlap.
Choice of embedding model: all-MiniLM-L6-v2 or text-embedding-ada-002
Vector Database Creation: Builds a vector database using Chroma
Query Expansion: Generates sub-questions and hypothetical answers to enhance the retrieval process.
Interactive Visualization: Utilizes Plotly to visualise the chunks.
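
To make the chunk size and overlap options concrete, here is a minimal sketch of fixed-size chunking with overlap. It is an illustration only and not the splitter RAGxplorer itself uses; the function name `chunk_text` and the choice to count characters rather than tokens are assumptions.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into character windows that share `chunk_overlap` characters.

    Hypothetical helper for illustration; a real splitter may count tokens
    and respect sentence or paragraph boundaries.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2):
        print(chunk)
```

A larger overlap repeats more context across neighbouring chunks, at the cost of producing more chunks to embed and plot.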

Local Installation ⚙️

To run RAGxplorer, ensure you have Python installed, and then install the necessary dependencies:

pip install -r requirements-local-deployment.txt

Tip

⚠️ Do not use requirements.txt for a local install. That file exists so the free Streamlit deployment can run, and it includes an additional pysqlite3-binary dependency.

⚠️ If it helps with troubleshooting, this application was built using Python 3.11

Usage 🏎️

Set up OPENAI_API_KEY (required) and ANYSCALE_API_KEY (only if you need Anyscale). Copy the .streamlit/secrets.example.toml file to .streamlit/secrets.toml and fill in the values.
To start the application, run:
```
streamlit run app.py
```
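
Before starting the app, it can help to confirm that the keys described above are actually filled in. The sketch below is a hypothetical helper, not part of RAGxplorer: the name `missing_secrets` is made up, and it assumes the keys sit at the top level of the secrets mapping, which this README does not confirm.

```python
from typing import Mapping


def missing_secrets(secrets: Mapping[str, object], use_anyscale: bool = False) -> list[str]:
    """Return the names of required keys that are absent or blank.

    Hypothetical helper: assumes OPENAI_API_KEY and ANYSCALE_API_KEY are
    top-level keys in the mapping that is passed in.
    """
    required = ["OPENAI_API_KEY"]
    if use_anyscale:
        required.append("ANYSCALE_API_KEY")
    # Treat a missing key and an empty or whitespace-only value the same way.
    return [key for key in required if not str(secrets.get(key, "")).strip()]


if __name__ == "__main__":
    example = {"OPENAI_API_KEY": "sk-placeholder", "ANYSCALE_API_KEY": ""}
    print(missing_secrets(example, use_anyscale=True))
```

On Python 3.11, the standard-library `tomllib` module can load .streamlit/secrets.toml into a dict to pass to this function.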

When running locally, you may need to comment out or remove lines 5-7 in app.py:

__import__('pysqlite3')
import sys
sys.modules['sqlite3'] = sys.modules.pop('pysqlite3')

Note

This repo is currently linked to the Streamlit demo, and these lines were added because of the runtime in the free Streamlit deployment environment. See here.
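
As an alternative to commenting those lines out by hand, the swap can be made conditional so the same file runs both locally and in the hosted environment. This is a minimal sketch under that assumption and not what app.py currently does; the helper name `use_pysqlite3_if_available` is hypothetical.

```python
import sys


def use_pysqlite3_if_available() -> bool:
    """Replace the stdlib sqlite3 module with pysqlite3 when it is installed.

    Returns True if the swap happened, False if pysqlite3 is not installed
    (the usual case after installing requirements-local-deployment.txt).
    """
    try:
        __import__("pysqlite3")
    except ImportError:
        return False
    # Same swap as the three lines above: later `import sqlite3`
    # statements will resolve to pysqlite3.
    sys.modules["sqlite3"] = sys.modules.pop("pysqlite3")
    return True


if __name__ == "__main__":
    swapped = use_pysqlite3_if_available()
    print("using pysqlite3" if swapped else "using the standard sqlite3 module")
```

The call has to run before anything imports sqlite3 (including Chroma), which is why the original lines sit near the top of app.py.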

Contributing 👋

Contributions to RAGxplorer are welcome. Please read our contributing guidelines (WIP) for details.

License 👀

This project is licensed under the MIT license - see the LICENSE file for details.

Acknowledgments 💙

DeepLearning.AI and Chroma for the inspiration and code labs in their Advanced Retrieval course.
The Streamlit community for the support and resources.
