This project contains Node.js scripts that generate captions/subtitles for videos. It started as a winning submission for Hack&Roll 2018, but it is far from production ready. The only working file is captioner.js, which can only generate captions for videos under one minute. This limit is imposed by the Google Speech API's synchronous speech recognition.
You will need to create a billing account on Google Cloud Platform in order to use the Speech API, but access to the API is free up to a point. Check https://cloud.google.com/speech/pricing for details.
Go to https://cloud.google.com/speech/docs/quickstart and click on the "Set up a project" button. A JSON file containing the service account private key will be downloaded.
Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the JSON file (https://cloud.google.com/docs/authentication/getting-started#setting_the_environment_variable).
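For example, in a Unix-like shell the variable can be set before running the scripts (the key file path below is a placeholder, not a path from this project):

```shell
# Point the Google client libraries at the downloaded service account key.
# Replace the placeholder path with wherever your JSON key was saved.
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
```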
Install ffmpeg, which is a dependency of node-fluent-ffmpeg. Make sure the install directory is on your PATH.
`npm install`
`node captioner.js fox.mp4`
Currently, the only completely working file is `captioner.js`. You can run it with:

`node captioner.js input.mp4 output.mp4`

- `input.mp4` is the file name of the source video you want captioned (it should be in the project directory).
- `output.mp4` is optionally the file name of the output video.
- If `output.mp4` is omitted, the output file name defaults to `input-captioned.mp4`, where `input` is the source video file name.
Keep in mind that the Speech API needs very clear speech audio. During the hackathon, I estimated about 70% recognition accuracy in a poor recording environment.
Here I will briefly explain what the code does.
- It uses ffmpeg to extract the audio from the video file, convert it to a single channel (a limitation of the Speech API), and save it as `sample.flac`.
- It then generates a `sample.srt` file (SRT is a subtitle file format) by sending the FLAC file to the Speech API for word detection. The API returns a list of words along with the timestamp of each word. Words that are within 3 seconds of each other are chunked into phrases and saved to the SRT file.
- It again uses ffmpeg to combine the source video and the subtitles into a single file. Note that the output video will have a different codec from the source video. I have yet to completely understand how ffmpeg works, but hey, it's from a hackathon :p
- Finally, the code deletes the `sample.flac` file but keeps the `sample.srt` file. This way you can manually edit the SRT file to correct wrongly identified words and use the ffmpeg command line to combine the SRT file with a video file, e.g. `ffmpeg -i infile.mp4 -i infile.srt -c copy -c:s mov_text outfile.mp4` (source).
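The word-chunking step above can be sketched in plain JavaScript. This is a hypothetical illustration, not the actual captioner.js code; the `words` array below only mimics the shape of the Speech API's word-level results (a word plus start/end times in seconds):

```javascript
// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function formatTime(sec) {
  const h = String(Math.floor(sec / 3600)).padStart(2, '0');
  const m = String(Math.floor((sec % 3600) / 60)).padStart(2, '0');
  const s = String(Math.floor(sec % 60)).padStart(2, '0');
  const ms = String(Math.round((sec % 1) * 1000)).padStart(3, '0');
  return `${h}:${m}:${s},${ms}`;
}

// Merge consecutive words into one phrase while the gap between
// them stays within `gap` seconds (3 seconds, as described above).
function wordsToSrt(words, gap = 3) {
  const phrases = [];
  let current = null;
  for (const w of words) {
    if (current && w.start - current.end <= gap) {
      current.text += ` ${w.word}`;
      current.end = w.end;
    } else {
      current = { text: w.word, start: w.start, end: w.end };
      phrases.push(current);
    }
  }
  // Emit numbered SRT blocks: index, time range, text.
  return phrases
    .map((p, i) => `${i + 1}\n${formatTime(p.start)} --> ${formatTime(p.end)}\n${p.text}\n`)
    .join('\n');
}

const words = [
  { word: 'The', start: 0.0, end: 0.3 },
  { word: 'quick', start: 0.4, end: 0.8 },
  { word: 'fox', start: 5.0, end: 5.4 }, // > 3 s gap → starts a new phrase
];
console.log(wordsToSrt(words));
```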
- Google Speech API - Speech Recognition
- node-fluent-ffmpeg - Fluent interface for FFMPEG
- FFMPEG - Media manipulation
I will be honest: this project is littered with bugs and bad practices, but hey, it's from a hackathon :p So if you can understand my code and want to contribute, feel free to do so!
- David Choo - Quick 24 hour hack
See also the list of non-existent contributors who participated in this project.
This project is licensed under the MIT License - see the LICENSE.md file for details.
- The amazing Hack&Roll 2018 for giving me an opportunity to learn something new over the weekend.