/hls-transcription-sample

Sample demonstrating how to transcribe/live-subtitle HTTP Live Streaming media.

Primary LanguageC#MIT LicenseMIT

HTTP Live Stream (HLS) Transcriber Sample

This sample demonstrates how to transcribe/live-subtitle HTTP Live Streaming (HLS) media. The project is developed for .NET Framework 4.7.1, but should be compatible with versions >= 4.6. The audio extraction and format detection is implemented using NAudio library. The transciption is done with Bing Speech API.

Getting started

  1. Open the HLSSample.sln solution file with Visual Studio
  2. Select the start-up project (HSLConsoleTest.NETFramework or HSLWPFTest.NETFramework): Right-click the project name in Solution Explorer and select Set as StartUp Project
  3. Insert the playlist URL (.m3u8 file) as the value of PlaylistUrl string constant:
  4. Insert the Bing Speech API subscription key to enable transcription (can be omitted but so will then be the transcription)
  5. Run and enjoy

Implementation

The solution consists of three projects:

  1. HSTools.NETFramework,
  2. HSLConsoleTest.NETFramework and
  3. HSLWPFTest.NETFramework.

HSLTools.NETFramework is a class library project containing the main functionality whilst the two other (demo apps) serve as examples on how to use the aforementioned library.

HSLTools

The HSLTools class library implements the following features:

  • Loading and parsing HTTP Live Stream playlists (.m3u8 files)
  • Downloading .ts files into memory
  • Extracting audio from .ts files utilizing NAudio library
  • Saving binary files on local disk
  • Transcribing audio (bytes) with Bing Speech API

The main class of the library is HLSProcessor. See the two demo application projects (HLSConsoleTest and HLSWPFTest) to learn how to use the class library.

HLSConsoleTest and HLSWPFTest projects

HLSConsoleTest processes the given playlist and displays the audio transcription in the console window. HLSWPFTest plays and displays the video files in the playlist with subtitles, which are produced by transcribing the audio in the media.

What's missing?

This is not a production-ready piece of code, but rather a proof-of-concept. Stuff missing/to consider:

  • Universal Windows applications are not supported - please support/contribute to the awesome NAudio project in order to enable UWP compatibility.
  • The media chunks are processed as they come (via the playlist). Thus, if/when the audio in the chunk is terminated in the middle of a word, the transcription is incomplete
    • "TODO" item here: Refactor/break the chunks in pieces based on the silent bits in the audio
  • The MediaElement in WPF applications does not support HTTP Live Stream out-of-the-box - the quick and dirty approach taken here is to save the .ts files onto local disk and to feed the MediaElement the local URIs.

Credits

This project was one of the outcomes of a short hackfest and was developed by the following fantastic team of people (in alphabetical order):