/Opensearch-Langchain-OpenAI-RAG-Pattern-Python

A Retrieval Augmented Generation pattern using Aiven for OpenSearch®, LangChain, and Python

Primary language: Jupyter Notebook. License: Apache License 2.0 (Apache-2.0).

Implementing a Retrieval Augmented Generation (RAG) pattern with OpenAI, OpenSearch, and LangChain

Quickstart this workshop by using our pre-configured codespace!

Open in GitHub Codespaces

This notebook demonstrates how to use OpenSearch, OpenAI, and LangChain together.

Why use OpenSearch as a backend vector database

OpenSearch is a widely adopted open source search and analytics engine. It lets you store, query, and transform documents in a variety of shapes, and it provides fast, scalable functionality for both exact and fuzzy text search. Using OpenSearch as a vector database lets you mix and match semantic and text search queries on top of a performant and scalable engine.
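To make "mix and match" concrete, here is a minimal sketch of a request body that combines a full-text `match` clause with a k-NN vector clause inside one `bool` query. The field names `text` and `vector_field` are assumptions for illustration, not names taken from this repo, and the query only works against an index whose vector field is mapped as `knn_vector`.

```python
def hybrid_query(text: str, vector: list[float], k: int = 5) -> dict:
    """Build a search body that scores documents on both text and vector similarity.

    "text" and "vector_field" are hypothetical field names; replace them
    with the fields defined in your own index mapping.
    """
    return {
        "size": k,
        "query": {
            "bool": {
                # "should" clauses each contribute to the score, so a document
                # can rank well on keywords, on vector similarity, or on both.
                "should": [
                    {"match": {"text": text}},
                    {"knn": {"vector_field": {"vector": vector, "k": k}}},
                ]
            }
        },
    }


body = hybrid_query("open source search engine", [0.12, -0.03, 0.88], k=3)
print(body)
```

The resulting dictionary can be passed as the body of a search request by whichever OpenSearch client you use.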

Getting Started

This repo uses a Jupyter notebook to walk through the process of creating an OpenSearch datastore in Aiven and searching against it using LangChain.
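Before diving in, it may help to see the shape of the RAG flow in miniature. The sketch below is not the notebook's code: it swaps the OpenAI embedding model for a toy bag-of-words vector and swaps OpenSearch for an in-memory list, purely to show the retrieve-then-augment steps. All function names are made up for this example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term count."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Augment the question with retrieved context before calling an LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


docs = [
    "OpenSearch stores documents and vectors",
    "LangChain connects language models to data sources",
    "Bananas are rich in potassium",
]
question = "Which engine stores vectors"
prompt = build_prompt(question, retrieve(question, docs, k=1))
print(prompt)
```

In the real workflow, the vector store performs the retrieval step and the prompt is sent to a language model for the generation step.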

Prerequisites

To complete this lab, make sure you have completed the following steps.

Set up your Aiven Account

You will need an Aiven account. You can sign up for an account.


Create an OpenSearch service

You can create an OpenSearch service in the Aiven console by selecting the OpenSearch service. You can choose the cloud provider and region you want to deploy the service in.

You can also create the service using the Aiven CLI.
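As a rough sketch of the CLI route, the snippet below assembles an `avn service create` command for an OpenSearch service. The service name, cloud, and plan are placeholder values chosen for illustration; check the plans and regions available to your account before running the command, and log in first with `avn user login`.

```python
import shlex


def avn_create_command(name: str, cloud: str, plan: str) -> list[str]:
    """Assemble the argument list for creating an OpenSearch service with avn."""
    return [
        "avn", "service", "create", name,
        "--service-type", "opensearch",
        "--cloud", cloud,
        "--plan", plan,
    ]


# Placeholder values: substitute your own service name, cloud region, and plan.
cmd = avn_create_command("demo-opensearch", "google-europe-west1", "hobbyist")
print(shlex.join(cmd))
```

You can paste the printed command into a terminal, or hand the list to `subprocess.run(cmd, check=True)` to run it from Python.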


Add OpenAI Credits and Create an OpenAI API key

Our semantic search will be powered in part by the OpenAI API. To use the API, you will need to create an API key and purchase credits.

  • Visit https://platform.openai.com and sign in or create an account
  • On the left sidebar, select Settings, followed by Billing
  • Select Add to credit balance (You will need to add a payment method)


Next, you will need to create an API key that will be used to authenticate your requests to the OpenAI API.

  • In the sidebar, select API keys and then Create new secret key


  • Give your key a name and select All for permissions. Select Create secret key
  • Copy the key and store it in a safe place.

Warning

You will need this key to authenticate your requests to the OpenAI API.
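One common way to keep the key out of the notebook itself is to read it from an environment variable. The sketch below assumes the variable is named `OPENAI_API_KEY`, which is the name the OpenAI Python client looks for by default; whether the notebook in this repo reads it the same way is an assumption. The helper name `load_openai_key` is made up for this example.

```python
import os


def load_openai_key(env=None) -> str:
    """Return the OpenAI API key from the environment, or fail with a clear message.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before running the notebook")
    return key


# Demonstration with a placeholder value; call load_openai_key() with no
# arguments to read the real environment.
key = load_openai_key({"OPENAI_API_KEY": "sk-placeholder"})
```

Failing early with a clear message is usually easier to debug than an authentication error raised partway through a notebook run.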


[Option 1] Create a new codespace in GitHub

You can use the code button in the top right of the repo or the badge at the top of the readme to create a new codespace in GitHub. This will create a new environment with all the required dependencies to run the notebook.

You can select the image below to create a new codespace.

Open in GitHub Codespaces

[Option 2] Set up the project locally

  • fork the repo to your own GitHub account

  • clone the repo to your local machine

  • ensure Python is installed. You can download it from python.org or use a package manager like Homebrew on macOS, the Microsoft Store on Windows, or your package manager of choice on Linux. Check the installed version with:

    python --version

  • create a virtual environment based on your operating system

  • install the dependencies

    You can install the Python dependencies using pip.

    python -m pip install -r requirements.txt
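The virtual environment step in the list above does not name a tool. One option that works on every operating system is Python's built-in `venv` module, sketched below. The directory name `.venv` in the usage note is a common convention, not something this repo prescribes, and `create_env` is a helper name made up for this example.

```python
import venv
from pathlib import Path


def create_env(path: str, with_pip: bool = True) -> Path:
    """Create a virtual environment at `path` and return its pyvenv.cfg file.

    with_pip=True bootstraps pip into the new environment so that
    `python -m pip install -r requirements.txt` works once it is activated.
    """
    target = Path(path)
    venv.create(target, with_pip=with_pip)
    return target / "pyvenv.cfg"
```

Calling `create_env(".venv")` is equivalent to running `python -m venv .venv`. Activate the environment with `source .venv/bin/activate` on macOS or Linux, or `.venv\Scripts\activate` on Windows, before installing the dependencies.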

Follow the notebook

Let's begin by downloading our data and ingesting it into OpenSearch.

Go to 1-ingesting-data.ipynb in the workshops folder to begin the workshop.
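For orientation, ingestion into OpenSearch ultimately comes down to bulk indexing requests. The sketch below builds the newline-delimited JSON body that the `_bulk` endpoint expects: one action line followed by one document line per record. It illustrates the wire format only. The notebook may rely on LangChain helpers instead, and the index name `workshop-docs` is a hypothetical placeholder.

```python
import json


def bulk_payload(index: str, documents: list[dict]) -> str:
    """Serialize documents into the NDJSON body used by the _bulk API."""
    lines = []
    for doc in documents:
        # Action line: tells OpenSearch to index the next line into `index`.
        lines.append(json.dumps({"index": {"_index": index}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


payload = bulk_payload("workshop-docs", [{"text": "hello"}, {"text": "world"}])
print(payload)
```

Batching documents this way avoids one HTTP round trip per document, which matters once the dataset grows beyond a few hundred records.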