
Visualise and explore your RAG documents


RAGxplorer 🦙🦺

RAGxplorer is an interactive Streamlit tool that supports building Retrieval-Augmented Generation (RAG) applications by visualizing document chunks and queries in the embedding space.
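Placing queries and chunks in the same embedding space matters because a retriever returns the chunks whose vectors lie closest to the query's vector, and those are the points the visualization lets you inspect. The sketch below illustrates this under stated assumptions: it uses cosine similarity as the closeness measure and hypothetical toy 3-dimensional vectors. It is not RAGxplorer's code, and the vector store may use a different distance metric.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (assumed closeness measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def nearest_chunks(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return the indices of the k chunks most similar to the query, best first."""
    scores = [cosine_similarity(query_vec, v) for v in chunk_vecs]
    return sorted(range(len(chunk_vecs)), key=lambda i: scores[i], reverse=True)[:k]


# Hypothetical toy embeddings; all-MiniLM-L6-v2 actually produces
# 384-dimensional vectors and text-embedding-ada-002 1536-dimensional ones.
chunks = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
query = [1.0, 0.0, 0.0]
print(nearest_chunks(query, chunks))  # prints [0, 1]
```

In the plot, chunks 0 and 1 would appear near the query point while chunk 2 sits far away, which is the kind of pattern RAGxplorer is meant to make visible.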

Note

I will be refactoring the code substantially so that RAGxplorer becomes a standalone package rather than living inside a Streamlit application. Until then, thank you for your patience. Suggestions are most appreciated here.

Demo 🔎

Streamlit App

⚠️ Due to infrastructure limitations, this freely hosted demo may occasionally go down. For the best experience, clone this repo and run it locally.

Features ✨

  • Document Upload: Users can upload PDF documents.
  • Chunk Configuration: Options to configure the chunk size and overlap.
  • Choice of Embedding Model: all-MiniLM-L6-v2 or text-embedding-ada-002.
  • Vector Database Creation: Builds a vector database using Chroma.
  • Query Expansion: Generates sub-questions and hypothetical answers to enhance the retrieval process.
  • Interactive Visualization: Uses Plotly to visualize the chunks.
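Chunk size and overlap interact as follows: each chunk starts chunk_size - chunk_overlap characters after the previous one, so adjacent chunks share chunk_overlap characters of text. The hypothetical chunk_text helper below is a minimal character-based sketch of that idea; RAGxplorer's own splitter may differ (for instance, it may measure chunks in tokens rather than characters).

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Hypothetical helper: split text into fixed-size character windows
    where each window repeats the last `chunk_overlap` characters of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new chunk advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# prints ['abcd', 'defg', 'ghij']
```

A larger overlap reduces the chance that a sentence relevant to a query is split across two chunks, at the cost of storing and embedding more redundant text.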

Local Installation ⚙️

To run RAGxplorer, ensure you have Python installed, and then install the necessary dependencies:

pip install -r requirements-local-deployment.txt

Tip

⚠️ Do not use requirements.txt. That file exists so the free Streamlit deployment can run, and it includes an additional pysqlite3-binary dependency.

⚠️ If it helps with troubleshooting, this application was built with Python 3.11.

Usage 🏎️

  1. Set up OPENAI_API_KEY (required) and ANYSCALE_API_KEY (only if you use Anyscale): copy the .streamlit/secrets.example.toml file to .streamlit/secrets.toml and fill in the values.
  2. To start the application, run:
    streamlit run app.py
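  3. If the app fails to start locally, comment out or remove lines 5-7 in app.py. They import pysqlite3, which requirements-local-deployment.txt does not install: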
    __import__('pysqlite3')
    import sys
    sys.modules['sqlite3'] = sys.modules.pop('pysqlite3')

Note

This repo is currently linked to the Streamlit demo. These lines replace the standard-library sqlite3 module with pysqlite3 to accommodate the runtime of the free Streamlit deployment environment. See here.
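If you would rather not edit app.py by hand, one option is to make the swap conditional so the same lines work whether or not pysqlite3-binary is installed. This is a minimal sketch, not what the repository currently does; it assumes the swap still runs before chromadb (or anything else that uses sqlite3) is imported.

```python
import sys

try:
    # In the free Streamlit deployment, requirements.txt installs pysqlite3-binary;
    # register it under the name "sqlite3" so later imports pick it up.
    __import__("pysqlite3")
    sys.modules["sqlite3"] = sys.modules.pop("pysqlite3")
except ImportError:
    # Locally, requirements-local-deployment.txt does not install pysqlite3-binary,
    # so keep using the standard-library sqlite3 module.
    pass

import sqlite3  # resolves to pysqlite3 if the swap happened, else the stdlib module

print("SQLite version:", sqlite3.sqlite_version)
```

Catching ImportError (which also covers ModuleNotFoundError) keeps one code path for both environments instead of maintaining two variants of app.py.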

Contributing 👋

Contributions to RAGxplorer are welcome. Please read our contributing guidelines (WIP) for details.

License 👀

This project is licensed under the MIT license - see the LICENSE file for details.

Acknowledgments 💙

  • DeepLearning.AI and Chroma for the inspiration and code labs in their Advanced Retrieval course.
  • The Streamlit community for the support and resources.