Retrieval Augmented Generation (RAG) Intro Project 🤖🔍📝

Welcome to the Retrieval Augmented Generation (RAG) project! 🎉 This project introduces RAG and demonstrates its practical applications using Python code in a Jupyter Notebook environment. The notebooks are built with LlamaIndex. We believe learning and experimenting with RAG should be both educational and fun! 😄

Project Structure 📂

The project is organized into five folders:

  • files: This folder contains important files for your reference:

    • readme.md: You're currently reading this file! It provides an overview of the project.
    • Intro of Retrieval Augmented Generation (RAG) and application demos_Henry.pdf: This file explains the background of RAG and describes the hands-on experiments in this project. It's a must-read to get started!
  • python_env: In this folder, you'll find the NLP.yml file. Use this YAML file to create a dedicated Python environment with the dependencies the notebooks need.

  • code: The code folder contains three Jupyter Notebook files, each representing a different experiment:

    • 1_Basic_RAG_Pipeline.ipynb: This notebook demonstrates the basic RAG pipeline. It's a great starting point for understanding the fundamentals of RAG.
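    • 2_Sentence_window_retrieval.ipynb: This notebook explores sentence window retrieval. Matching is done on individual sentences, and each matched sentence is returned together with a window of the sentences around it, so the model sees more context.
    • 3_Auto-merging_Retrieval.ipynb: In this notebook, you'll learn about auto-merging retrieval, which retrieves small chunks and merges them into their larger parent chunk when enough of them match, giving the generation step more coherent context. Exciting stuff!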
  • data: The data folder is where you can store your own documents of interest for retrieval. For now, we have included an example file named Henry.txt. Feel free to replace it with your own documents to experiment with RAG.

  • common: Inside this folder, you'll find the openAI.env file. Don't forget to add your OpenAI API key to this file so the notebooks can call the OpenAI models.
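
To give a feel for what notebooks 2 and 3 explore before you open them, here is a minimal toy sketch of both retrieval ideas. It is not the code from the notebooks and does not use LlamaIndex or OpenAI: a simple word-overlap score stands in for embeddings, and every function name, the sample text, and the `threshold` value are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a crude stand-in for an embedding model."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, text):
    """Toy relevance score: number of query words that also occur in text."""
    q, t = Counter(tokenize(query)), Counter(tokenize(text))
    return sum(min(q[w], t[w]) for w in q)


def split_sentences(document):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]


def sentence_window_retrieve(document, query, window=1):
    """Match on single sentences, then return the best match plus its neighbours."""
    sentences = split_sentences(document)
    if not sentences:
        return ""
    best = max(range(len(sentences)), key=lambda i: score(query, sentences[i]))
    start, end = max(0, best - window), min(len(sentences), best + window + 1)
    return " ".join(sentences[start:end])


def auto_merging_retrieve(parents, query, top_k=2, threshold=0.5):
    """Match on small leaf chunks; if more than `threshold` of a parent's
    leaves were retrieved, return the whole parent instead of the leaves.

    `parents` is a list of parent chunks, each given as a list of leaf strings.
    """
    scored = [
        (score(query, leaf), pid, leaf)
        for pid, leaves in enumerate(parents)
        for leaf in leaves
    ]
    # Keep only leaves that matched at all, best first (ties keep input order).
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    ranked = ranked[:top_k]

    results = []
    for pid in sorted({hit_pid for _, hit_pid, _ in ranked}):
        hits = [leaf for _, hit_pid, leaf in ranked if hit_pid == pid]
        if len(hits) / len(parents[pid]) > threshold:
            results.append(" ".join(parents[pid]))  # merge up to the parent
        else:
            results.extend(hits)  # too few hits: keep the individual leaves
    return results


# Made-up sample data, only for illustration.
document = (
    "RAG combines retrieval with generation. A retriever finds relevant chunks. "
    "The generator writes the answer from those chunks. Evaluation is a separate topic."
)
context = sentence_window_retrieve(document, "How does a retriever pick relevant chunks?")

parents = [
    ["Cats sleep most of the day.", "Cats hunt at night."],
    ["Dogs enjoy long walks.", "Dogs bark at strangers.", "Dogs need training.", "Puppies chew shoes."],
]
merged = auto_merging_retrieve(parents, "cats")
```

In this sketch, `context` holds the matched sentence with one neighbour on each side, and `merged` holds the whole first parent chunk because both of its leaves matched. The notebooks do the same kind of thing with real embeddings and an LLM on top.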

Getting Started 🚀

To begin your RAG journey, follow these steps:

  1. Clone or download this project repository to your local machine.

  2. Use the NLP.yml file in the python_env folder to create a dedicated Python environment (for example, with conda env create -f NLP.yml, assuming it is a conda environment file). This ensures all dependencies are properly installed.

  3. In the common folder, open the openAI.env file and enter your OpenAI API key. This step is crucial for accessing OpenAI models.

  4. Familiarize yourself with the project by reading the Intro of Retrieval Augmented Generation (RAG) and application demos_Henry.pdf file in the files folder. It covers the background of RAG and walks through the hands-on experiments.

  5. Explore the three Jupyter Notebook files (1_Basic_RAG_Pipeline.ipynb, 2_Sentence_window_retrieval.ipynb, and 3_Auto-merging_Retrieval.ipynb) in the code folder. Run the notebooks to see RAG in action!

  6. Experiment with RAG by modifying the provided examples or using your own documents in the data folder. Feel free to get creative and have fun with it! 🎊
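
Before running the notebooks, it can help to check that steps 3 and 6 are in place. The sketch below is one way to do that with only the standard library. It assumes that openAI.env uses plain KEY=VALUE lines, that the key is stored under the name OPENAI_API_KEY, and that the paths are relative to the repository root; none of this is confirmed by the project, and the helper names are hypothetical.

```python
import os
from pathlib import Path


def load_env_file(path):
    """Read KEY=VALUE lines from a .env-style file into os.environ.

    Blank lines, '#' comments and lines without '=' are skipped.
    Returns the parsed values as a dict.
    """
    loaded = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        loaded[key.strip()] = value.strip().strip('"').strip("'")
    os.environ.update(loaded)
    return loaded


def list_documents(folder):
    """Return the sorted file names found directly inside `folder`."""
    return sorted(p.name for p in Path(folder).iterdir() if p.is_file())


def check_setup(env_path="common/openAI.env", data_dir="data"):
    """Report whether an API key was found and which documents are available."""
    values = load_env_file(env_path)
    has_key = bool(values.get("OPENAI_API_KEY"))  # assumed variable name
    return has_key, list_documents(data_dir)
```

Calling `check_setup()` from the repository root returns a pair: whether a non-empty key was found, and the list of files in the data folder (Henry.txt by default, plus anything you added).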

Conclusion 🎓

Congratulations! You are now equipped with the necessary information and tools to learn, apply, and have fun with Retrieval Augmented Generation (RAG). We hope this project sparks your curiosity and encourages you to explore the exciting world of RAG using the provided Jupyter Notebooks. Happy generating! 🤖💡

If you have any questions or need further assistance, please don't hesitate to reach out. Enjoy your RAG journey! 😊