This workshop takes unprepared data and makes it usable with a Retrieval-Augmented Generation (RAG) pattern for a chatbot.
In this workshop, we'll be using Aiven for OpenSearch and LangChain to:
- Chunk transcription data and generate embeddings
- Configure our OpenSearch index for k-Nearest Neighbors (k-NN) and perform a similarity search
- Connect our search responses to a Large Language Model (LLM) to generate informed answers using LangChain
- Compare the performance of multiple LLMs
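
As a rough illustration of the first two steps, here is a minimal sketch in plain Python. It is not the code from the workshop notebooks, which use LangChain. The helper names (`chunk_text`, `knn_index_body`, `knn_query_body`), the `text` and `embedding` field names, and the chunk sizes are all assumptions made for this example. The embedding dimension depends on whichever embedding model you choose.

```python
from typing import Dict, List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into word-based chunks that overlap by `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


def knn_index_body(dimension: int, field: str = "embedding") -> Dict:
    """Build an index definition with k-NN enabled and one vector field.

    The field names here are hypothetical; `dimension` must match the
    output size of the embedding model.
    """
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                field: {"type": "knn_vector", "dimension": dimension},
            }
        },
    }


def knn_query_body(vector: List[float], k: int = 3, field: str = "embedding") -> Dict:
    """Build a k-NN search body that returns the k closest documents."""
    return {"size": k, "query": {"knn": {field: {"vector": vector, "k": k}}}}
```

The dictionaries returned by `knn_index_body` and `knn_query_body` are meant to be passed as request bodies when creating an index and when searching it. Overlapping chunks are one common way to avoid cutting a sentence's context in half at a chunk boundary.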
Our instructions and notebooks are in the workshop folder.
This workshop is licensed under the Apache License, version 2.0. The full license text is available in the LICENSE file.
Please note that the project explicitly does not require a CLA (Contributor License Agreement) from its contributors.
Conduit Podcast Transcripts by Jay Miller and Kathy Campbell, with original downloads from Whisper work done by Pilix, are licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
Bug reports and patches are very welcome. Please post them as GitHub issues and pull requests at https://github.com/Aiven-Labs/preparing-data-for-opensearch-and-rag
To report any possible vulnerabilities or other serious issues, please see our security policy.
Report Code of Conduct issues according to our policy.