subtitle_scraper
The YouTube API only allows limited access, so I created this: a module that gives you the captions/subtitles of a YouTube video, given its ID.
Install
npm install subtitle_scraper
Example
The module is very simple and hopefully self-explanatory based on this example (using an Applied Science video; check it out if you have time):
var subtitle_scraper = require('subtitle_scraper');
// Pass in the YouTube video ID; you can find it in the video URL (usually "v=...")
subtitle_scraper("l7qUo330J0M", function(err, arr, raw, url){
// err contains any error passed straight through from the "request" module
// arr contains the JSON equivalent of the XML subtitle response
// url is the URL that was queried to get the subtitles
console.log(arr.transcript.text);
// arr.transcript.text will probably be the only useful part of "arr"
});
YouTube returns data like this:
<transcript>
<text start="0" dur="4.61">
today on Applied Science I'd like to talk about my adventures in making deco
</text>
<text start="4.61" dur="4.13">
tape if you haven't heard deco tape is sort of an alternative to the currently
</text>
<text start="8.74" dur="4.09">
available adhesive tapes it actually works by a different mechanism and it's
</text>
<text start="12.83" dur="3.76">
not really a commercial product yet but it's got a lot of press Insert of
</text>
<text start="16.59" dur="4.57">
popular science articles and so it has a few attractive qualities that make it
</text>
</transcript>
The module uses xml2js to parse the XML. If you think there is a better option, feel free to make modifications. The result from xml2js looks like this:
{"transcript":
{"text":[
{"_":"today on Applied Science I'd like to\ntalk about my adventures in making deco","$":{"start":"0","dur":"4.61"}},
{"_":"tape if you haven't heard deco tape is\nsort of an alternative to the currently","$":{"start":"4.61","dur":"4.13"}},
{"_":"available adhesive tapes it actually\nworks by a different mechanism and it's","$":{"start":"8.74","dur":"4.09"}},
{"_":"not really a commercial product yet but\nit's got a lot of press Insert of","$":{"start":"12.83","dur":"3.76"}},
{"_":"popular science articles and so it has a\nfew attractive qualities that make it","$":{"start":"16.59","dur":"4.57"}}]
}
}
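Since the timing values arrive as strings and the text can contain line breaks, it may be handy to flatten the result before using it. The following is only a sketch: toCaptions is a hypothetical helper that is not part of subtitle_scraper, and it assumes the result has exactly the shape shown above.

```javascript
// toCaptions is a hypothetical helper, not part of subtitle_scraper.
// It assumes the xml2js result shape shown above:
// { transcript: { text: [ { _: "...", $: { start: "...", dur: "..." } } ] } }
function toCaptions(arr) {
  var texts = (arr && arr.transcript && arr.transcript.text) || [];
  return texts.map(function (entry) {
    return {
      start: parseFloat(entry.$.start), // seconds, converted from a string
      dur: parseFloat(entry.$.dur),     // seconds, converted from a string
      text: (entry._ || '').replace(/\n/g, ' ') // join wrapped lines
    };
  });
}

// Sample data copied from the first two entries of the result above.
var sample = {
  transcript: {
    text: [
      { _: "today on Applied Science I'd like to\ntalk about my adventures in making deco",
        $: { start: "0", dur: "4.61" } },
      { _: "tape if you haven't heard deco tape is\nsort of an alternative to the currently",
        $: { start: "4.61", dur: "4.13" } }
    ]
  }
};

var captions = toCaptions(sample);
console.log(captions);
```

In real use you would call toCaptions(arr) inside the subtitle_scraper callback, after checking err.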
Bulk Request
This does not save any time, but it lets you fetch subtitles for a list of video IDs. The function works the same as before, except that it takes an array instead of a single string, and an options object is required.
Here is an example:
var subtitle_scraper = require('subtitle_scraper');
// You can also pass a list of IDs to get the subtitles for each video
// A delay can be provided, and probably should be
subtitle_scraper(["9XQfYKYO380", "KAm7qAKAXwI", "cwN983PnJoA"], { delay: 2000 }, function(err, arr){
// arr contains an array of results of the same type as a simple request returns, in the same order as the IDs
console.log(arr);
});
"options" currently supports only one setting, "delay", which adds a pause between requests.
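To illustrate what a delay between requests means, here is a rough sketch of fetching items one at a time with a pause in between. This is not the module's actual implementation: fetchInSequence and fakeFetch are hypothetical names, fakeFetch stands in for a single-video lookup, and the delay is assumed to be in milliseconds because it is passed to setTimeout.

```javascript
// Hypothetical sketch, not taken from subtitle_scraper's source.
// fetchOne(id, callback(err, result)) stands in for a single-video lookup.
function fetchInSequence(ids, options, fetchOne, done) {
  var delay = (options && options.delay) || 0; // assumed to be milliseconds
  var results = [];

  function next(index) {
    if (index >= ids.length) {
      return done(null, results);
    }
    fetchOne(ids[index], function (err, result) {
      if (err) {
        return done(err, results);
      }
      results.push(result); // results stay in the same order as ids
      if (index + 1 >= ids.length) {
        return done(null, results);
      }
      // Wait before starting the next request.
      setTimeout(function () { next(index + 1); }, delay);
    });
  }

  next(0);
}

// A fake lookup so the sketch runs without touching the network.
function fakeFetch(id, callback) {
  callback(null, 'subtitles for ' + id);
}

fetchInSequence(['a', 'b', 'c'], { delay: 10 }, fakeFetch, function (err, results) {
  console.log(results);
});
```

Running requests one after another with a pause is slower than firing them all at once, but it is gentler on the server being queried.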
Note:
This module could stop working at any time. If YouTube renamed the "ttsurl" variable to something like "banana", the module would surely be useless.