gabrielchua/RAGxplorer

Feedback and Suggestions to Improve this Project

gabrielchua opened this issue · 7 comments

First and foremost, I want to express my heartfelt thanks to all of you for showing interest in this project. It's incredibly humbling and exciting to see others taking notice of something I built.

As this is my first time writing code that's being used by others, I am keenly aware that there's a lot I can learn and many ways in which the project can be improved. That's where I need your help!

I'm looking for suggestions on how best to carry this project forward and organize the code more effectively. If you have any ideas, best practices, or tips, please don't hesitate to share. Your insights will be invaluable in making this a better and more user-friendly project.

I also ask for your patience and understanding regarding the current state of the code. I'm aware that it may not be up to professional standards yet, and I'm fully committed to learning and improving. Any constructive feedback or advice in this regard would be greatly appreciated.

Please feel free to post your suggestions, feedback, or any questions you might have as responses to this issue. I'm looking forward to reading your input and engaging in discussions that can lead to the betterment of this project.

Just my take on possible tasks:

  1. Clean up app.py -> refactor the Streamlit components out into separate modules. One module per section of the UI?
  2. Distribution - can this be a PyPI package?
  3. Integrating LangChain to support other vector databases, LLMs and embedding models?
  4. Allowing users to customize their prompts?

Just my take on possible tasks:

  1. Clean up app.py -> refactor the Streamlit components out into separate modules. One module per section of the UI?

I have not examined the code yet, but refactoring for modularity is almost always a good idea. Keep in mind that Streamlit state management can be tricky when using multiple files.
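
To illustrate the state-management point, here is a minimal sketch of one way to keep shared state manageable across modules: each UI-section module receives the state mapping as an argument instead of relying on module-level globals. The names `init_state`, `render_query_section`, and the keys in `DEFAULTS` are hypothetical and not taken from the RAGxplorer code. In a Streamlit app the mapping would presumably be `st.session_state`; a plain dict stands in for it here so the sketch runs without Streamlit.

```python
from typing import Any, MutableMapping

# Hypothetical defaults; the real app's keys may differ.
DEFAULTS = {"document_loaded": False, "query": ""}


def init_state(state: MutableMapping[str, Any]) -> None:
    """Set each default once, leaving values from earlier reruns untouched."""
    for key, value in DEFAULTS.items():
        state.setdefault(key, value)


def render_query_section(state: MutableMapping[str, Any], new_query: str) -> str:
    """Stand-in for one UI-section module: read and update the shared state."""
    init_state(state)
    if new_query:
        state["query"] = new_query
    return state["query"]


# In app.py this would be `state = st.session_state` (assumption); a dict stands in.
state: dict = {}
render_query_section(state, "What are the top revenue drivers?")
render_query_section(state, "")  # a rerun with no new input keeps the old query
print(state["query"])
```

Passing the state in explicitly also makes each section testable on its own, since a test can hand it an ordinary dict.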

  2. Distribution - can this be a PyPI package?

I think it generally makes sense to bundle the core functionality into a pip package, and then bundle that + the Streamlit app in a Docker image. I can help with that stuff if you'd like.

  3. Integrating LangChain to support other vector databases, LLMs and embedding models?

Yes, please! Maybe start with more embedding models.

  4. Allowing users to customize their prompts?

Always a good idea. :)

Great work!

I'm experimenting with a new API in the experiment branch.

git clone -b experiment https://github.com/gabrielchua/RAGxplorer.git
cd RAGxplorer
virtualenv venv # create a new virtual env
source venv/bin/activate # activate the virtual env
pip install -r requirements.txt

from ragxplorer.ragxplorer import Explorer

# Please ensure "OPENAI_API_KEY" is set as an env variable
client = Explorer(embedding_model="text-embedding-ada-002") 

# Or you can use all-MiniLM-L6-v2
# client = Explorer(embedding_model="all-MiniLM-L6-v2") 

client.load_document("presentation.pdf")
client.visualise_query("What are the top revenue drivers for Microsoft?")

The new structure should also make it fairly easy to support any Hugging Face embedding model.
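
As a rough sketch of what such support could look like, a constructor might route on the model name: known OpenAI names go to the OpenAI API, and everything else is treated as a Hugging Face model identifier. This is only an assumption about one possible design, not how the experiment branch actually works; `OPENAI_MODELS` and `pick_backend` are hypothetical names.

```python
# Hypothetical set of OpenAI embedding model names; not taken from the RAGxplorer code.
OPENAI_MODELS = {"text-embedding-ada-002"}


def pick_backend(embedding_model: str) -> str:
    """Return which embedding backend a model name would be routed to."""
    if embedding_model in OPENAI_MODELS:
        return "openai"
    # Anything else is assumed to be a Hugging Face model identifier.
    return "huggingface"


for name in ("text-embedding-ada-002", "all-MiniLM-L6-v2"):
    print(name, "->", pick_backend(name))
```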

Hi @gabrielchua

This project is really fun, and I wanted to try it in Google Colab now that the Streamlit app is down. But just importing it like this:

from ragxplorer import RAGxplorer

I get this error:

OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

Hi @mohmah9

Ah yes - do give me a few more days to get the new Streamlit version up again. I had to take it down since I was making major changes to turn RAGxplorer into a package.

An OpenAI key is needed for most of the RAGxplorer features, for example the OpenAI embedding models.

But good point that a missing key should not prevent importing the package. Let me fix that.
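
One common way to get that behaviour, sketched here under assumptions, is to create the OpenAI client lazily on first use rather than at import time. `LazyOpenAIClient` is a hypothetical name, and this is not necessarily the fix that ends up in RAGxplorer; the sketch also assumes the `openai` package (version 1 or later) is installed when `get()` is actually called.

```python
import os


class LazyOpenAIClient:
    """Hypothetical wrapper: constructing it never touches the API key."""

    def __init__(self) -> None:
        self._client = None

    def get(self):
        """Create the real client on first use and reuse it afterwards."""
        if self._client is None:
            if not os.environ.get("OPENAI_API_KEY"):
                raise RuntimeError(
                    "Set OPENAI_API_KEY before using OpenAI-backed features."
                )
            # Imported lazily so the key check above runs first.
            from openai import OpenAI

            self._client = OpenAI()
        return self._client


# Safe at import time, even when no key is set.
client = LazyOpenAIClient()
```

With this shape, the import succeeds without a key and the error only appears when an OpenAI-backed feature is actually used.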

If you already have an OpenAI key, you can set it in Google Colab.
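
For example, a Colab cell along these lines would set the key through the `OPENAI_API_KEY` environment variable named in the error message, before the import runs. This is a minimal sketch; `ensure_openai_key` is a hypothetical helper, and `getpass` is used so the key is not echoed into the notebook output.

```python
import os
from getpass import getpass


def ensure_openai_key(ask=getpass) -> None:
    """Prompt for the key only if it is not already in the environment."""
    if not os.environ.get("OPENAI_API_KEY"):
        os.environ["OPENAI_API_KEY"] = ask("OpenAI API key: ")


# In a Colab cell, call this before the import that raised the error:
# ensure_openai_key()
# from ragxplorer import RAGxplorer
```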