A Python script for extracting and punctuating transcripts from YouTube videos.

YouTube-Transcript-Extractor

This project provides a Python script that fetches the transcript of a YouTube video. The script takes a YouTube URL as input, extracts the video transcript, punctuates it using an external API, and saves the result as a text file. It uses the youtube_transcript_api package for transcript retrieval and the Bark Punctuator API for punctuation.
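The retrieval step might look roughly like the sketch below. It is not the project's actual code: the helper names extract_video_id and fetch_transcript_text are hypothetical, and the call to YouTubeTranscriptApi.get_transcript assumes an older (pre-1.0) release of youtube_transcript_api; newer releases expose a different interface.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url: str) -> str:
    """Return the video ID from a common YouTube URL form (hypothetical helper)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        return parsed.path.lstrip("/").split("/")[0]
    if host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            ids = parse_qs(parsed.query).get("v")
            if ids:
                return ids[0]
        # Also accept /embed/<id> and /shorts/<id>.
        parts = [p for p in parsed.path.split("/") if p]
        if len(parts) >= 2 and parts[0] in ("embed", "shorts"):
            return parts[1]
    raise ValueError(f"Could not find a video ID in: {url!r}")


def fetch_transcript_text(video_id: str) -> str:
    """Fetch a transcript and join its segments into one string (sketch)."""
    # Imported lazily so the URL helper works without the package installed.
    from youtube_transcript_api import YouTubeTranscriptApi

    # Assumption: a pre-1.0 release, where get_transcript returns a list of
    # dicts that each contain a "text" key.
    segments = YouTubeTranscriptApi.get_transcript(video_id)
    return " ".join(segment["text"] for segment in segments)
```

The resulting string is unpunctuated; it would then be passed to the punctuation step.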

The transcript is also processed with the Natural Language Toolkit (NLTK) to make it more readable and editable. The resulting formatted and punctuated transcript is saved as a text file, ready for further uses such as analysis or translation.
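The README does not say exactly how NLTK is used for formatting. One plausible approach, sketched below under that assumption, is to split the punctuated text into sentences and write one wrapped sentence per line. The function names format_transcript and save_transcript, the width parameter, and the one-sentence-per-line layout are illustrative choices, not the project's confirmed behavior.

```python
import textwrap
from pathlib import Path


def format_transcript(text, split_sentences=None, width=80):
    """Put each sentence on its own wrapped line (illustrative helper)."""
    if split_sentences is None:
        # Lazy import; NLTK's sentence tokenizer needs its tokenizer data
        # downloaded beforehand (see Installation).
        from nltk.tokenize import sent_tokenize
        split_sentences = sent_tokenize
    sentences = [s.strip() for s in split_sentences(text) if s.strip()]
    return "\n".join(textwrap.fill(s, width=width) for s in sentences)


def save_transcript(text, path):
    """Write the formatted transcript to a UTF-8 text file."""
    Path(path).write_text(text + "\n", encoding="utf-8")
```

Accepting the sentence splitter as a parameter keeps the formatting logic usable, and testable, without NLTK installed; by default it falls back to nltk.tokenize.sent_tokenize.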

Installation

Use the package manager pip to install the following packages:

requests
youtube_transcript_api
nltk

NLTK's tokenizer data must also be downloaded before the tokenizers can be used. This can be done by running the following Python commands:

import nltk
nltk.download('punkt')