  1. Clone the repository.
  2. Navigate to your repository directory: 'cd your-repository'.
  3. Create and activate a virtual environment: 'pipenv shell'.
  4. Install the required packages: 'pipenv install'.
  5. Set up environment variables: create a .env file in the root directory of your project and add your Pinecone API key and OpenAI API key.
  6. Fetch data from the MongoDB website: 'mkdir mongodb-docs', then 'wget -r -P mongodb-docs -E'.
  7. Pre-process the data by running the script. If it succeeds, you should see the following messages: 'Going to add xxx to Pinecone' and 'Loading to vectorstore done'.
  8. Start the app: streamlit run
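
As a quick sanity check for step 5, the sketch below reports which keys are still missing from the environment. The variable names PINECONE_API_KEY and OPENAI_API_KEY are assumptions, not names confirmed by this project; adjust them to match your .env file. If you use python-dotenv, calling load_dotenv() before the check would load the .env file into os.environ.

```python
import os

# Assumed variable names; change them if your .env file uses different ones.
REQUIRED_KEYS = ("PINECONE_API_KEY", "OPENAI_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required keys that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Running this before steps 7 and 8 makes a missing or empty key visible early instead of surfacing later as an authentication error.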

Explanation of pre-processing data

Pass data to the vector database (Pinecone) using the pre-processing script.

The command wget -r -P mongodb-docs -E downloads documents from MongoDB's documentation website into the mongodb-docs directory. The pre-processing script then processes the downloaded documents, embeds them using OpenAI's embedding model, and stores them in a Pinecone Vector Store for efficient retrieval.


The pre-processing script:

  • Loads documents from MongoDB documentation.
  • Splits documents into smaller chunks for efficient processing.
  • Updates document metadata with the correct source URLs.
  • Adds processed documents to a Pinecone Vector Store.
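
The splitting and metadata steps above can be sketched without any framework. This is a minimal illustration, not the project's actual script: the function names, the chunk size of 600 characters, the overlap of 50, and the https scheme are all assumptions. It relies on the fact that recursive wget saves each page under the prefix directory followed by the host name, so the source URL can be recovered from the local path. Note that -E may append .html to saved files, so a recovered URL can differ from the original by that suffix. The final step, adding the chunks to Pinecone, is left out.

```python
from pathlib import PurePosixPath


def split_text(text, chunk_size=600, overlap=50):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters to preserve some
    context across chunk boundaries.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def local_path_to_url(path, root="mongodb-docs"):
    """Map a file saved by recursive wget back to its source URL.

    Assumes the layout root/<host>/<path> and an https scheme.
    """
    parts = PurePosixPath(path).parts
    if root not in parts:
        raise ValueError(f"{path!r} is not inside {root!r}")
    rest = parts[parts.index(root) + 1:]
    return "https://" + "/".join(rest)


def prepare_chunks(documents, chunk_size=600, overlap=50):
    """Yield chunk dicts carrying the text and the corrected source URL."""
    for doc in documents:
        url = local_path_to_url(doc["source"])
        for chunk in split_text(doc["text"], chunk_size, overlap):
            yield {"text": chunk, "metadata": {"source": url}}
```

Keeping the corrected URL in each chunk's metadata is what lets the app cite the original documentation page alongside a generated answer.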


Explanation of RAG (Retrieval-Augmented Generation) Script

This Python script implements a Retrieval-Augmented Generation (RAG) pipeline using LangChain, OpenAI, and Pinecone. Given a query, the script retrieves relevant documents, takes the chat history into account, and generates a response using OpenAI's language models.


The script:

  • Embeds documents using OpenAI's embedding model.
  • Retrieves documents from Pinecone Vector Store.
  • Rephrases queries and performs retrieval-based question answering.
  • Combines retrieved documents to generate a response.
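
To make the retrieve-and-combine steps concrete, here is a toy, in-memory sketch of what a vector store does conceptually. It is not how Pinecone is called from the script: the function names are hypothetical, the short hand-written vectors stand in for real OpenAI embeddings, and cosine similarity is assumed as the metric (the actual metric depends on how the Pinecone index is configured).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vector, indexed_chunks, k=2):
    """Return the k chunks whose vectors are most similar to the query."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vector, item["vector"]),
        reverse=True,
    )
    return ranked[:k]


def stuff_context(chunks):
    """Concatenate the retrieved chunks into one context string for the prompt."""
    return "\n\n".join(chunk["text"] for chunk in chunks)


# Stand-in index: in the real pipeline the vectors come from the embedding model.
index = [
    {"text": "Indexes speed up queries.", "vector": [1.0, 0.0]},
    {"text": "Replica sets provide redundancy.", "vector": [0.0, 1.0]},
    {"text": "Compound indexes cover several fields.", "vector": [0.9, 0.1]},
]
context = stuff_context(top_k([1.0, 0.0], index, k=2))
```

The resulting context string is what gets placed into the prompt together with the user's question, so the language model answers from the retrieved text rather than from memory alone.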


Requirements:

  • Python 3.x
  • python-dotenv
  • langchain
  • langchain-openai
  • langchain-pinecone
  • Pinecone account and API key
  • OpenAI API key



This function:

  1. Initializes OpenAI embeddings and Pinecone Vector Store.
  2. Sets up a chat model with OpenAI's language model.
  3. Pulls prompts for rephrasing queries and retrieval-based question answering.
  4. Creates a history-aware retriever and a retrieval chain.
  5. Invokes the retrieval chain with the input query and chat history.
  6. Returns the generated result.
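
The control flow of steps 4 to 6 can be sketched independently of LangChain. In this minimal sketch, run_rag and its three callables (rephrase, retrieve, generate) are hypothetical stand-ins for the rephrasing prompt plus chat model, the Pinecone retriever, and the question-answering chain; the keys of the returned dictionary are likewise illustrative. The point it shows is the history-aware part: a follow-up question is rewritten into a standalone query before retrieval, while a first question is used as-is.

```python
def run_rag(query, chat_history, rephrase, retrieve, generate):
    """Answer `query` using retrieved context and prior chat turns.

    rephrase(query, chat_history) -> standalone search query
    retrieve(search_query)        -> list of relevant documents
    generate(query, documents, chat_history) -> answer text
    All three are injected so the flow can be exercised without API keys.
    """
    # With no history the query is already standalone; otherwise rewrite it
    # so that references such as "it" or "that" are resolved before retrieval.
    standalone = rephrase(query, chat_history) if chat_history else query
    documents = retrieve(standalone)
    answer = generate(query, documents, chat_history)
    return {"input": query, "context": documents, "answer": answer}
```

A caller would append each question and answer to chat_history after every turn and pass the updated history into the next call.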