The LLM Chatbot for Interacting with Spanish Podcasts is an AI-powered chatbot designed to enhance your podcast listening experience. It is specifically tailored for Spanish-speaking podcast enthusiasts who want to engage with and learn from their favorite Spanish podcasts in a whole new way.
This chatbot utilizes advanced natural language processing techniques to provide real-time text-based interactions with Spanish podcasts. Whether you want to generate transcripts, summarize episodes, or seek information about specific topics within a podcast, this chatbot is here to assist you.
Video Name | Channel | Topic | Status |
---|---|---|---|
Worldcast #45 - Roberto Vaquero | Worldcast | Politics | Available |
Marc Vidal - entrevista a Jose Luis Cava | Marc Vidal | Economy and Politics | Available soon |
Wall Street Wolverine - entrevista a Samuel Vázquez | Wall Street Wolverine | Politics and Society | Available soon |
The Wild Project - entrevista a Operador Nuclear | The Wild Project | Society | Available soon |
- Transcription: Get accurate text transcriptions of podcast episodes.
- Summarization: Receive concise summaries of podcast episodes.
- Search: Find specific information, keywords, or topics within podcasts.
- Language Support: The chatbot currently interacts only in English and translates its outputs into Spanish (Spanish-based LLMs will be available soon).
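To make the Search feature concrete, here is a minimal sketch of keyword search over a transcript. It is an illustration only, not the project's actual retrieval method: the `(start_seconds, text)` segment format, the `search_segments` helper, and the sample sentences are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical transcript format: (start time in seconds, segment text).
Segment = Tuple[float, str]


def search_segments(segments: List[Segment], query: str) -> List[Segment]:
    """Return segments whose text contains every query word as a
    case-insensitive substring."""
    words = query.lower().split()
    return [
        (start, text)
        for start, text in segments
        if all(word in text.lower() for word in words)
    ]


# Made-up sample data, only for demonstration.
segments = [
    (0.0, "Bienvenidos a un nuevo episodio del podcast."),
    (42.5, "Hoy hablamos de economía y de inflación."),
    (97.0, "La inflación afecta al ahorro de las familias."),
]

for start, text in search_segments(segments, "inflación"):
    print(f"{start:7.1f}s  {text}")
```

Plain substring matching like this misses synonyms and paraphrases; an embedding-based search would handle those, at the cost of needing a model.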
Before you begin, ensure you have the following requirements in place:
- Python: You need Python 3.7 or later installed on your system.
- Virtual Environment (Optional): It's recommended to create a virtual environment to manage dependencies.
- GPU: not mandatory, but recommended (GPU used: Nvidia T4)
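The requirements above can be checked with a short script. This is a sketch under stated assumptions: the helper names are hypothetical, and looking for `nvidia-smi` on the `PATH` is only a heuristic for an installed NVIDIA driver, not proof that a GPU is usable.

```python
import shutil
import sys

MIN_PYTHON = (3, 7)  # minimum version stated in the requirements list


def python_ok(version=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version[:2]) >= minimum


def nvidia_driver_on_path() -> bool:
    """Heuristic: NVIDIA driver installs usually put nvidia-smi on PATH."""
    return shutil.which("nvidia-smi") is not None


print("Python version OK:", python_ok())
print("nvidia-smi found (GPU is optional):", nvidia_driver_on_path())
```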
- Clone the repository:

      git clone https://github.com/AlbertoUAH/Castena.git
      cd Castena

- Install the required Python packages:

      pip install -r requirements.txt

- Run the app (via Streamlit), redirecting stderr to a log file:

      streamlit run app.py --logger.level=warning 2> app_log.log
- Launch the chatbot using the instructions in the "Installation" section.
- The chatbot will prompt you to enter the podcast name or URL you want to interact with.
- Choose one of the available interaction options, such as transcription, summarization, or search.
- Follow the chatbot's prompts to provide additional details or requests.
- Enjoy your enhanced podcast experience!
Evaluating our approach requires labelled datasets, which we create with the Label Studio library.
LLM performance is then measured with LangChain's QAEvalChain, cosine_similarity, and a sentence similarity model.
Video Name | Channel | Topic | QAEvalChain (% Correct Answers) | Mean Cosine Similarity | Median Cosine Similarity | Mean Sentence Similarity | Median Sentence Similarity |
---|---|---|---|---|---|---|---|
Worldcast #45 - Roberto Vaquero | Worldcast | Politics & Opinion | 20 of 30 answers correct (~67%) | 0.7943 | 0.8198 | 0.7225 | 0.7406 |
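As a rough illustration of how the aggregate columns in the table can be computed, the sketch below derives a percentage of correct answers and the mean and median cosine similarity. It is not the project's evaluation code: the function names are hypothetical, the two-dimensional vectors are toy stand-ins for real embeddings, and only the 20-of-30 grading split is taken from the table.

```python
import math
from statistics import mean, median
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def percent_correct(grades: Sequence[bool]) -> float:
    """Share of answers graded as correct, as a percentage."""
    return 100.0 * sum(grades) / len(grades)


# Toy (predicted, reference) embedding pairs; real ones would come from a model.
pairs = [
    ([1.0, 0.0], [1.0, 0.0]),  # same direction
    ([1.0, 0.0], [0.0, 1.0]),  # orthogonal
    ([1.0, 1.0], [1.0, 0.0]),  # 45 degrees apart
]
scores = [cosine_similarity(pred, ref) for pred, ref in pairs]
print(f"mean={mean(scores):.4f} median={median(scores):.4f}")

# 20 correct answers out of 30, as in the table row.
grades = [True] * 20 + [False] * 10
print(f"correct: {percent_correct(grades):.0f}%")
```

Reporting the median alongside the mean helps because a few very low similarity scores can pull the mean down without reflecting the typical answer.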
We welcome contributions to improve the functionality and capabilities of this chatbot. If you'd like to contribute, please follow these guidelines:
- Fork the repository.
- Create a new branch for your feature or bug fix: `git checkout -b feature/new-feature`.
- Make your changes and test thoroughly.
- Commit your changes with a clear and concise message: `git commit -m "Add new feature"`.
- Push your branch to your forked repository: `git push origin feature/new-feature`.
- Create a pull request against the main repository's `main` branch.
- Provide a detailed description of your changes and why they are valuable.
This project is licensed under the MIT License - see the LICENSE file for details.
- Colab notebooks (free tier)
- TogetherAI (LLM support)