Pretending To Be People Transcripts

AI-generated transcripts for the Pretending to Be People podcast. Stay greasy, Wolf 🐺. Made for fans by fans to help with accessibility and the wiki.

Transcript files have been sorted into folders in this repo by the format of the output transcription: json, srt, tsv, vtt, and txt.
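As a rough illustration of how the formats relate, here is a minimal sketch that turns timed segments (as might be read from one of the json files) into srt text. The segment layout (a list of dicts with `start`, `end`, and `text` keys) and the helper names `srt_timestamp` and `segments_to_srt` are assumptions made for this example, not something taken from this repo's code.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Render segments as SRT cues.

    Each segment is assumed (hypothetically) to be a dict with
    'start' and 'end' in seconds and a 'text' string.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

If the json files in this repo use different key names, only the dictionary lookups in `segments_to_srt` would need to change.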

Most Recent Episode Transcription: S2E29: Wet, Smooth Holes

Note! This project does not contain the transcripts for the Patreon-only episodes. Go support them and generate those for yourselves, you filthy animals.

Intellectual Property Notice

All the code in this project was integrated by sn3akiwhizper using examples from other leaders in applying AI to audio data. Besides that, all content is the intellectual property of the Pretending to Be People crew. We make no claims of ownership over the amazing stories that they tell. This project is solely for increasing the accessibility and reach of the podcast so they may continue bringing entertainment to the holes of our ears. And now the obligatory disclaimer: Delta Green is the intellectual property of Arc Dream Publishing, and the PTBP folks have received permission from Arc Dream for their podcast. This project has not contacted or been contacted by Arc Dream or PTBP; if they have a problem, they can find me on Twitter or Discord.

File Structure

  • docs: documentation pages describing some of the work happening in this project
  • episode-transcription: the code and output from episode transcription efforts
  • ptbp-fandom: code and data for generating Fandom wiki pages

Documentation

Future Work

  • complete bulk catch-up of transcription and diarization
  • combine transcript/diarization to produce speaker-tagged transcripts (started but broken)
  • parse speaker-tagged transcripts and validate each speaker's name (might require training models to recognize each person's voice)
  • upload scripts to automatically perform the transcription, diarization, and combination
    • some available, working on others
  • AI-generated summaries of podcast episodes
  • Transcript book (markdown -> epub, pdf, mobi) complete with AI illustrations
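
The "combine transcript/diarization" item above could be approached by assigning each transcript segment to the diarization turn it overlaps most in time. The sketch below is only one possible approach, not the code in this repo: the `Segment` and `Turn` classes, the `tag_speakers` function, and the `UNKNOWN` fallback label are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcribed chunk of speech (times in seconds). Hypothetical layout."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A diarization turn: who spoke and when. Hypothetical layout."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def tag_speakers(segments, turns, unknown="UNKNOWN"):
    """Pair each segment with the speaker whose turn overlaps it the most.

    Segments that overlap no turn at all get the `unknown` label.
    """
    tagged = []
    for seg in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn in turns:
            ov = overlap(seg.start, seg.end, turn.start, turn.end)
            if ov > best_overlap:
                best_speaker, best_overlap = turn.speaker, ov
        tagged.append((best_speaker, seg.text))
    return tagged
```

This only produces generic speaker labels; mapping those labels to actual cast names is the separate validation step listed above.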