This Streamlit application lets users chat with an AI assistant that answers questions using content from multiple websites. The app uses LangChain and OpenAI's GPT models to process website content and generate responses to user queries.
- Process multiple websites simultaneously
- Chat interface for querying processed website content
- Sample links for quick testing
- Comprehensive error handling and user feedback
- Clone this repository:
      git clone https://github.com/yourusername/multi-website-chatbot.git
      cd multi-website-chatbot
- Install the required packages:
      pip install streamlit langchain langchain-openai chromadb
- Set up your OpenAI API key:
  - Create a .env file in the project root
  - Add your OpenAI API key:
        OPENAI_API_KEY=your_api_key_here
- Run the Streamlit app:
      streamlit run app.py
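
This README does not say how app.py reads the .env file, so the following is only a minimal sketch of one way the key could be picked up and checked at startup. `load_env_file` and `require_api_key` are hypothetical names, and the parser is a standard-library stand-in for a dedicated package such as python-dotenv, not the app's actual code.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Hypothetical helper: copy KEY=VALUE pairs from a .env file into os.environ.

    Returns the dict of pairs that were parsed. Variables that are already
    set in the environment are left untouched.
    """
    parsed = {}
    env_path = Path(path)
    if not env_path.is_file():
        return parsed
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # tolerate optional quotes
        parsed[key] = value
        os.environ.setdefault(key, value)
    return parsed


def require_api_key():
    """Fail early with a readable message if the key is missing or a placeholder."""
    key = os.environ.get("OPENAI_API_KEY", "")
    if not key or key == "your_api_key_here":
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env")
    return key
```

Calling `load_env_file()` followed by `require_api_key()` before any model is created would surface a missing key as a clear error instead of a failed API request later on.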
- Open your web browser and go to the URL provided by Streamlit (usually http://localhost:8501)
- In the sidebar:
  - Enter website URLs (one per line) in the text area, or
  - Click "Try with sample links" to use pre-defined sample URLs
- Click "Process Websites" to load and process the websites
- Once processing is complete, you can start chatting with the AI about the content of the processed websites
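
Because URLs are entered one per line, the text-area input has to be turned into a clean list before processing. The sketch below shows one way that might be done. `parse_url_input` is a hypothetical helper, and the rules it applies (http/https only, exact duplicates dropped) are assumptions rather than documented app behavior.

```python
from urllib.parse import urlparse


def parse_url_input(raw_text):
    """Hypothetical helper: turn text-area input into (valid_urls, rejected_lines)."""
    valid, rejected = [], []
    for line in raw_text.splitlines():
        candidate = line.strip()
        if not candidate:
            continue  # ignore blank lines between URLs
        parts = urlparse(candidate)
        if parts.scheme in ("http", "https") and parts.netloc:
            if candidate not in valid:  # drop exact duplicates, keep order
                valid.append(candidate)
        else:
            rejected.append(candidate)
    return valid, rejected
```

Returning the rejected lines separately would let the sidebar tell the user which entries were skipped instead of failing silently.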
- Website Processing:
  - The app uses LangChain's WebBaseLoader to fetch content from the provided URLs
  - The content is split into smaller chunks using RecursiveCharacterTextSplitter
  - These chunks are then embedded and stored in a Chroma vector store
- Query Processing:
  - When a user asks a question, the app uses a retriever to find relevant chunks from the vector store
  - These chunks provide context for the language model to generate a response
- Response Generation:
  - The app uses OpenAI's GPT-3.5-turbo model to generate responses
  - The model is given the context from the retrieved chunks and the chat history
  - It then generates a response based on this information
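
The three stages above can be illustrated with a dependency-free toy. This is not the app's code. Fixed-size overlapping chunks stand in for RecursiveCharacterTextSplitter, word-count vectors for OpenAI embeddings, an in-memory list for Chroma, and the final prompt string for the model call. All names here are hypothetical.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=200, overlap=40):
    """Fixed-size character chunks with overlap (a simplification of recursive splitting)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        start += chunk_size - overlap
    return chunks


def embed(text):
    """Toy embedding: lowercase word counts. A real app would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector store such as Chroma."""

    def __init__(self, chunks):
        self.entries = [(chunk, embed(chunk)) for chunk in chunks]

    def retrieve(self, query, k=2):
        """Return the k chunks most similar to the query."""
        query_vec = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(query_vec, e[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks, chat_history):
    """Combine retrieved context and prior turns into one prompt string."""
    history = "\n".join(f"{role}: {text}" for role, text in chat_history)
    context = "\n---\n".join(context_chunks)
    return f"Context:\n{context}\n\nHistory:\n{history}\n\nQuestion: {question}"


# Usage with made-up page text in place of fetched website content.
pages = [
    "Streamlit turns Python scripts into web apps.",
    "Chroma stores embeddings for similarity search.",
]
store = ToyVectorStore([chunk for page in pages for chunk in split_text(page)])
question = "Where are embeddings stored?"
print(build_prompt(question, store.retrieve(question, k=1), [("user", "Hi")]))
```

The point of the sketch is the data flow: only the retrieved chunks, not whole pages, reach the model, which is why retrieval quality bounds answer quality.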
The following flow chart illustrates the main steps of the Multi-Website Chatbot process:
graph TD
A[Start] --> B[Enter Website URLs]
B --> C{Process Websites?}
C -->|Yes| D[Fetch Content with WebBaseLoader]
C -->|No| B
D --> E[Split Content into Chunks]
E --> F[Create Vector Store]
F --> G[Wait for User Query]
G --> H[Retrieve Relevant Chunks]
H --> I[Generate Response with LLM]
I --> J[Display Response to User]
J --> K{Continue Chatting?}
K -->|Yes| G
K -->|No| L[End]
This flow chart provides a high-level overview of how the Multi-Website Chatbot processes information and interacts with the user.
- get_vectorstore_from_urls(urls): Processes multiple websites and creates a vector store
- get_context_retriever_chain(vector_store): Creates a retriever chain for finding relevant context
- get_conversational_rag_chain(retriever_chain): Sets up the conversational retrieval-augmented generation chain
- get_response(user_input): Generates a response to the user's input
- The app processes only the page at each provided URL; it does not follow internal links
- Very large websites may take significant time to process
- The quality of responses depends on the relevance and quality of the processed website content
- Implement caching to speed up repeated processing of the same websites
- Add support for processing PDFs and other document types
- Implement user authentication for personalized experiences
- Enhance the UI with more interactive elements and visualizations of the processed data
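
As a rough illustration of the caching idea in the list above, an expensive builder can be wrapped so that each distinct set of URLs is processed only once per session. `make_cached_builder` is a hypothetical helper. It also assumes that URL order and duplicates do not affect the result. In a Streamlit app, the built-in `st.cache_resource` decorator would likely be the more natural tool.

```python
def make_cached_builder(build_fn):
    """Wrap an expensive builder so each distinct URL set is processed once."""
    cache = {}

    def cached(urls):
        # Assumption: order and duplicates should not trigger a rebuild,
        # so the cache key is the sorted set of URLs.
        key = tuple(sorted(set(urls)))
        if key not in cache:
            cache[key] = build_fn(list(key))
        return cache[key]

    return cached
```

Wrapping a function such as get_vectorstore_from_urls this way would avoid re-fetching and re-embedding when the user clicks "Process Websites" again with the same links.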
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License.