adrianco/meGPT

Process YouTube playlist for ingestion

adrianco opened this issue · 6 comments

YouTube has transcripts, but they aren't very good, and it's not possible to download them from YouTube's API unless you uploaded the video yourself. ChatGPT was used to build some code to do this, but the pytube library comes with a command-line tool that downloads a whole playlist to a directory. Whisper can then be used to generate a transcript from the downloaded audio. Ideally, the author's voice would be recognized and labeled in the transcript for cases where the video is an interview or has multiple speakers.
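Here is a minimal sketch of that pipeline. It assumes the `pytube` and `openai-whisper` packages are installed, and that ffmpeg is available because Whisper needs it. All function names are hypothetical. Whisper does not identify speakers on its own. The `label_segments` helper therefore assumes that speaker turns `(start, end, speaker)` come from a separate diarization tool, such as pyannote.audio. It then assigns each Whisper segment to the speaker whose turn overlaps it most. Labels like `"AUTHOR"` are placeholders.

```python
from __future__ import annotations

from pathlib import Path


def download_playlist_audio(playlist_url: str, out_dir: str) -> list[Path]:
    """Download the audio-only stream of every video in a playlist (via pytube)."""
    from pytube import Playlist  # lazy import so the pure helpers below work offline

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    paths = []
    for video in Playlist(playlist_url).videos:
        stream = video.streams.get_audio_only()
        if stream is None:  # skip videos with no audio-only stream
            continue
        paths.append(Path(stream.download(output_path=out_dir)))
    return paths


def label_segments(segments: list[dict], turns: list[tuple]) -> list[dict]:
    """Tag each Whisper segment with the speaker whose turn overlaps it most.

    `turns` is assumed to be (start_sec, end_sec, speaker) tuples produced by a
    separate diarization step; segments with no overlap get "UNKNOWN".
    """
    labeled = []
    for seg in segments:
        best, best_overlap = "UNKNOWN", 0.0
        for start, end, speaker in turns:
            overlap = min(seg["end"], end) - max(seg["start"], start)
            if overlap > best_overlap:
                best, best_overlap = speaker, overlap
        labeled.append({**seg, "speaker": best})
    return labeled


def format_transcript(segments: list[dict]) -> str:
    """Render segments as '[mm:ss] SPEAKER: text' lines (speaker optional)."""
    lines = []
    for seg in segments:
        minutes, seconds = divmod(int(seg["start"]), 60)
        prefix = f"[{minutes:02d}:{seconds:02d}]"
        if seg.get("speaker"):
            prefix += f" {seg['speaker']}:"
        lines.append(f"{prefix} {seg['text'].strip()}")
    return "\n".join(lines)


def transcribe_playlist(playlist_url: str, out_dir: str = "audio",
                        model_name: str = "base") -> None:
    """Download a playlist and write one Whisper transcript (.txt) per video."""
    import whisper  # openai-whisper package

    model = whisper.load_model(model_name)  # load once, reuse for every file
    for path in download_playlist_audio(playlist_url, out_dir):
        segments = model.transcribe(str(path))["segments"]
        path.with_suffix(".txt").write_text(format_transcript(segments))
```

Calling `transcribe_playlist("<playlist URL>")` writes a `.txt` transcript next to each downloaded audio file. To get speaker labels, pass the segments through `label_segments` with the diarization turns for that file before formatting. Matching by maximum overlap is a simple heuristic. A segment that straddles a speaker change is assigned to only one speaker.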

Thanks for the input; https://www.descript.com looks really powerful. I didn't know about it.

I had a very long ChatGPT session in which I eventually discovered that you can only access the provided transcript through an authenticated API call, and only for your own videos. I've abandoned this approach, but here it is for reference: https://chatgpt.com/share/21e3b3af-bd97-409c-9938-f3f57298383f