This is a Node.js API for retrieving the transcript/subtitles of a given YouTube video. It also works with automatically generated subtitles, supports translating subtitles, and does not require a headless browser, unlike Selenium-based solutions!
npm install youtube-transcript-node
The easiest way to get a transcript for a given video is to execute:
import { YouTubeTranscriptApi } from 'youtube-transcript-node';
const api = new YouTubeTranscriptApi();
const transcript = await api.fetch(videoId);
Note: By default, this will try to access the English transcript of the video. If your video has a different language, or you are interested in fetching a transcript in a different language, please read the section below.
Note: Pass in the video ID, NOT the video URL. For a video with the URL https://www.youtube.com/watch?v=12345, the ID is 12345.
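If you only have a URL at hand, the ID can be pulled out before calling the API. The following is a minimal sketch using the standard URL class; `extractVideoId` is a hypothetical helper and not part of youtube-transcript-node, and it only covers the `watch?v=` and `youtu.be/` URL shapes.

```javascript
// Hypothetical helper, NOT part of youtube-transcript-node.
// Extracts the video ID from two common YouTube URL shapes.
function extractVideoId(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    // Not a parseable URL: assume the caller already passed a bare video ID.
    return input;
  }

  const host = url.hostname.replace(/^www\./, '');

  // Short links look like https://youtu.be/<id>
  if (host === 'youtu.be') {
    return url.pathname.slice(1) || null;
  }

  // Standard watch links carry the ID in the "v" query parameter.
  if (host === 'youtube.com' || host === 'm.youtube.com') {
    return url.searchParams.get('v');
  }

  // Unknown host: no ID can be determined.
  return null;
}

console.log(extractVideoId('https://www.youtube.com/watch?v=12345'));
```

The result can then be passed to `api.fetch` in place of a hard-coded ID. A `null` return means the input was a URL this sketch does not recognize, so it is worth checking before making the request.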
The fetch call returns a FetchedTranscript object with a snippets array containing objects like:
{
  text: "Hey there",
  start: 0.0,
  duration: 1.54
}
You can pass a list of preferred languages. They are tried in order, so later entries act as fallbacks if an earlier one is not available.
const transcript = await api.fetch(videoId, ['de', 'en']);
To get a list of all available transcripts for a video:
const transcriptList = await api.list(videoId);
You can then look up specific kinds of transcripts in the list:
const transcriptList = await api.list(videoId);
// Get manually created transcripts
const manualTranscript = transcriptList.findManuallyCreatedTranscript(['de', 'en']);
// Get automatically generated transcripts
const generatedTranscript = transcriptList.findGeneratedTranscript(['de', 'en']);
// Get any transcript (manual first, then generated)
const anyTranscript = transcriptList.findTranscript(['de', 'en']);
Transcripts can also be translated into another language:
const transcriptList = await api.list(videoId);
const transcript = transcriptList.findTranscript(['en']);
const translatedTranscript = transcript.translate('de');
const fetchedTranslated = await translatedTranscript.fetch();
By default, HTML tags are stripped from the transcript. To preserve formatting:
// The third argument (true) keeps the HTML formatting tags
const transcript = await api.fetch(videoId, ['en'], true);
Formatters convert the transcript into different output formats:
import { JSONFormatter, TextFormatter, WebVTTFormatter } from 'youtube-transcript-node/formatters';
const formatter = new TextFormatter();
const formattedText = formatter.formatTranscript(transcript);
Available formatters:
JSONFormatter - Formats as JSON
TextFormatter - Formats as plain text
PrettyPrintFormatter - Formats as pretty-printed JSON
WebVTTFormatter - Formats as WebVTT subtitles
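To give a feel for what a formatter does with the snippets, here is a rough, hand-rolled sketch that turns an array of snippet objects into WebVTT text. It is not the library's WebVTTFormatter, and its output may differ in detail. The function names `toTimestamp` and `snippetsToWebVTT` are made up for illustration, and the sketch assumes `start` and `duration` are given in seconds.

```javascript
// Illustrative only: NOT the library's WebVTTFormatter.
// Assumes snippets shaped like { text, start, duration } with times in seconds.

// Convert a number of seconds to a WebVTT timestamp (HH:MM:SS.mmm).
function toTimestamp(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width = 2) => String(n).padStart(width, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)}.${pad(ms, 3)}`;
}

// Build a WebVTT document: a "WEBVTT" header followed by one cue per snippet.
function snippetsToWebVTT(snippets) {
  const cues = snippets.map(({ text, start, duration }) =>
    `${toTimestamp(start)} --> ${toTimestamp(start + duration)}\n${text}`
  );
  return ['WEBVTT', ...cues].join('\n\n') + '\n';
}

const example = [{ text: 'Hey there', start: 0.0, duration: 1.54 }];
console.log(snippetsToWebVTT(example));
```

In practice the built-in formatters are the better choice. A custom function like this is mainly useful when you need an output format the library does not ship.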
Failures are reported as specific error classes that you can import and check for:
import {
  TranscriptsDisabled,
  NoTranscriptFound,
  VideoUnavailable,
  InvalidVideoId
} from 'youtube-transcript-node';
try {
  const transcript = await api.fetch(videoId);
} catch (error) {
  if (error instanceof TranscriptsDisabled) {
    // Subtitles are disabled for this video
  } else if (error instanceof NoTranscriptFound) {
    // No transcript found in requested languages
  } else if (error instanceof VideoUnavailable) {
    // Video is unavailable
  } else if (error instanceof InvalidVideoId) {
    // Invalid video ID provided
  }
}
License: MIT