Multilingual Speech Recognition Model for RAG

Introduction

This project aims to build a multilingual speech recognition model for Retrieval-Augmented Generation (RAG) without the need for training. The model leverages OpenAI's Whisper-medium for its robust multilingual speech recognition capabilities and integrates it with Langchain's RetrialQA and the Mistral-7B language model for enhanced performance on tasks like translation and summarization.

Project Overview

The project consists of the following key steps:

Preprocessing:
- Video to Audio Conversion: Utilize the moviepy library to convert video files into audio format.
- Audio Chunking: Segment the audio into smaller chunks to accommodate Whisper's limitations on audio size.
Multilingual Speech Recognition with Whisper:
- Model Selection: Employ OpenAI Whisper medium model for its multilingual speech recognition capabilities.
- Combining Transcriptions: Combine all transcriptions after segmenting the audio into chunks.
RAG Model Integration:
- FAISS Vector Store: Establish a vector store using FAISS for efficient information retrieval from the database.
Leveraging the Final Model:
- Langchain's RetrialQA: Combine retrieval with the Mistral-7B(llm) model for enhanced performance.
Using the Model for Tasks:
- Task Execution: Provide appropriate prompts for tasks like translation and summarization of text.
Evaluating the Model:
- Summarization Evaluation: Utilize rouge metrics for evaluating the summarization tasks.
- Translation Evaluation: Implement sacrebleu metrics for evaluating translation tasks.

Dependencies

Python 3.x
moviepy
OpenAI Whisper
FAISS
Langchain
Mistral-7B

Conclusion

This project demonstrates a comprehensive approach to building a multilingual speech recognition model for RAG without the need for training. By leveraging OpenAI's Whisper-medium, FAISS, Langchain, and the Mistral-7B language model, the model can effectively handle tasks like translation and summarization. The project can be further extended and optimized based on specific requirements and use cases.

Basavachari/Multilingual-Automated-Speech-Recognization

Multilingual Speech Recognition Model for RAG

Introduction

Project Overview

Dependencies

Conclusion