This code takes the XML RSS feed of my favorite podcast, "DarkNet Diaries", translates the data into a pandas DataFrame, stores it locally as a ".pkl" file, and stores it in Google Cloud Platform's "Firebase" NoSQL database. Then, Python loops through the Firebase stream, locally transcribing the podcasts and storing each transcription in Firebase. The ".pkl" files are used for data mining and local analysis.
Table of Contents
- My favorite podcast has more than 100 episodes and counting, all roughly an hour each. After binging it in a month, I found myself wanting to search for episodes to revisit or to solidify interesting facts. This project enables cheap storage in the cloud, transcript searchability, and statistical research and NLP projects in the future.
- I take an RSS XML feed, loop through the podcast MP3 links, transcribe each episode, and store the results locally as .pkl files and in the cloud in Firebase. A sketch of the feed-to-DataFrame step is shown below.
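For illustration, here is a minimal sketch of that first step, assuming the `feedparser` and `pandas` packages; the feed URL and column names are illustrative, not the repo's exact code.

```python
# Minimal sketch (not the repo's exact code): parse the RSS feed into a
# DataFrame and pickle it locally. Assumes feedparser and pandas are installed.
import feedparser
import pandas as pd

FEED_URL = "https://feeds.megaphone.fm/darknetdiaries"  # illustrative URL

feed = feedparser.parse(FEED_URL)

# Pull the title, publish date, and MP3 enclosure link from each entry.
rows = [
    {
        "title": entry.title,
        "published": entry.get("published"),
        "mp3_url": entry.enclosures[0].href if entry.enclosures else None,
    }
    for entry in feed.entries
]

df = pd.DataFrame(rows)
df.to_pickle("darknet_diaries.pkl")  # local .pkl copy for later analysis
```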
- Install all required packages.
pip install -r requirements.txt
- Open a Google Cloud Platform account and a Firebase account.
- Download an admin SDK JSON file to access Firebase, then replace the firebase-adminsdk.json file in your repo with it. Adjust the "cred" variable in the loadToFirebase_gitVersion.py file to match the name of your credentials file (see the sketch below).
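For reference, this is the shape of the credential setup with the `firebase_admin` package; the filename matches the repo's default, everything else is a standard Firestore client bootstrap.

```python
# Initialize the Firebase Admin SDK from the downloaded service-account file.
import firebase_admin
from firebase_admin import credentials, firestore

cred = credentials.Certificate("firebase-adminsdk.json")  # your credentials file
firebase_admin.initialize_app(cred)

db = firestore.client()  # Firestore client used for reads/writes below
```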
- Check access to the RSS feed.
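A quick way to verify the feed is reachable before committing to the long run; the URL here is illustrative, so use the feed URL configured in the repo.

```python
# Sanity check: confirm the RSS feed responds before starting the long job.
import requests

FEED_URL = "https://feeds.megaphone.fm/darknetdiaries"  # illustrative URL

resp = requests.get(FEED_URL, timeout=30)
resp.raise_for_status()  # raises if the feed is unreachable
print(f"Feed OK: status {resp.status_code}, {len(resp.content)} bytes")
```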
- Run loadToFirebase_gitVersion.py in Python. This step took my computer well over 24 hours for the 100+ hours of audio in the DarkNet Diaries podcast.
python loadToFirebase_gitVersion.py
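The repo's exact transcription code isn't reproduced here; as an illustration of the loop's shape, the sketch below streams episode documents from Firestore and transcribes each MP3 locally. The choice of OpenAI's Whisper, the "episodes" collection name, and the field names are all assumptions for the example, not the script's actual internals.

```python
# Illustrative sketch only: the real script may use a different speech-to-text
# library. Assumes openai-whisper, requests, firebase_admin, and a Firestore
# collection named "episodes" whose documents carry an "mp3_url" field.
import requests
import whisper
import firebase_admin
from firebase_admin import credentials, firestore

cred = credentials.Certificate("firebase-adminsdk.json")
firebase_admin.initialize_app(cred)
db = firestore.client()

model = whisper.load_model("base")  # small local model; larger = slower, better

for doc in db.collection("episodes").stream():
    episode = doc.to_dict()
    # Download the episode audio to a local file.
    audio = requests.get(episode["mp3_url"], timeout=120)
    with open("episode.mp3", "wb") as f:
        f.write(audio.content)
    # Transcribe locally and write the text back to the same document.
    result = model.transcribe("episode.mp3")
    doc.reference.update({"transcript": result["text"]})
```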
- Check Firebase to make sure the data went through.
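One quick way to verify from Python, again assuming the illustrative "episodes" collection from the sketches above:

```python
# Sanity check: count documents and peek at one transcript field.
from firebase_admin import firestore

db = firestore.client()  # assumes firebase_admin is already initialized
docs = list(db.collection("episodes").stream())  # "episodes" is illustrative
print(f"{len(docs)} episode documents in Firestore")
print(docs[0].to_dict().get("transcript", "")[:200])
```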
- Use Jupyter Notebook and pandas to play with the pickle data!
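For example, a keyword search over the transcripts; the file and column names below match the earlier sketches and assume transcripts have been merged into the DataFrame.

```python
# Minimal sketch: load the pickled DataFrame and keyword-search transcripts.
# File and column names are illustrative, matching the sketches above.
import pandas as pd

df = pd.read_pickle("darknet_diaries.pkl")

# Find every episode whose transcript mentions a search term.
hits = df[df["transcript"].str.contains("ransomware", case=False, na=False)]
print(hits[["title", "published"]])
```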
Jared Fiacco - jaredfiacco2@gmail.com
Another GCP Project of Mine: Publish Computer Statistics to Pub/Sub, Use Cloud Functions to Store in BigQuery