This project includes a Python script that automates fetching transcripts from YouTube videos. The script takes a YouTube URL as input, retrieves the video's transcript, punctuates it through an external API, and saves the result as a text file. It uses the youtube_transcript_api package for transcript retrieval and the Bark Punctuator API for punctuation.
The extracted transcript is then processed with the Natural Language Toolkit (NLTK) to make it more readable and easier to edit. The formatted, punctuated transcript is finally saved as a text file, ready for further use such as analysis or translation.
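The first steps described above (turning a URL into a video ID and flattening caption segments into one string) can be sketched as follows. This is a minimal illustration, not the script's actual code: the helper names `extract_video_id` and `segments_to_text` are hypothetical, and the sketch assumes the transcript arrives as a list of dicts with a `text` key, which is the shape older releases of youtube_transcript_api return from `YouTubeTranscriptApi.get_transcript(video_id)`. Newer releases may expose a different interface, so check the version you have installed.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url: str) -> str:
    """Return the video ID from a watch URL or a youtu.be short link.

    Hypothetical helper for illustration; it only handles these two URL shapes.
    """
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        # Short links carry the ID as the path: https://youtu.be/<id>
        return parsed.path.lstrip("/")
    # Standard links carry it in the query string: .../watch?v=<id>
    ids = parse_qs(parsed.query).get("v")
    if not ids:
        raise ValueError(f"No video ID found in {url!r}")
    return ids[0]


def segments_to_text(segments: list) -> str:
    """Join caption segments into one whitespace-normalized string.

    Assumes each segment is a dict with a "text" key.
    """
    joined = " ".join(seg["text"] for seg in segments)
    # split() with no arguments collapses newlines and repeated spaces.
    return " ".join(joined.split())
```

The resulting string is what would then be sent for punctuation and passed on to the formatting step.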
Use the pip package manager to install the following packages:
requests
youtube_transcript_api
nltk
NLTK's tokenizer data must also be downloaded before the tokenizers can be used. To do this, run the following in Python:
import nltk
nltk.download('punkt')
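Once the tokenizer data is available, the formatting step could look something like the sketch below. It assumes that "more readable" means one sentence per line, which may differ from what the script actually does, and the function name `format_transcript` is hypothetical. By default it uses `nltk.sent_tokenize`; the optional `split` parameter is there only so the function can be tried without NLTK installed. Recent NLTK releases may ask for the `punkt_tab` resource instead of `punkt`, so follow the error message if the download above is not enough.

```python
from typing import Callable, List, Optional


def format_transcript(
    text: str,
    split: Optional[Callable[[str], List[str]]] = None,
) -> str:
    """Return the transcript with one sentence per line.

    `split` is any function that breaks text into sentences.
    If it is omitted, NLTK's sentence tokenizer is used.
    """
    if split is None:
        # Imported lazily so the module loads even when NLTK is absent.
        import nltk
        split = nltk.sent_tokenize
    # Drop empty pieces and trim stray whitespace around each sentence.
    sentences = [s.strip() for s in split(text) if s.strip()]
    return "\n".join(sentences)
```

With NLTK installed, calling `format_transcript(punctuated_text)` on an already punctuated string returns text that can be written straight to the output file.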