audiowmark - Audio Watermarking

Description

audiowmark is an Open Source (GPL) solution for audio watermarking.

A sound file is read by the software, and a 128-bit message is stored in a watermark in the output sound file. For human listeners, the files typically sound the same.

However, the 128-bit message can be retrieved from the output sound file. Our tests show, that even if the file is converted to mp3 or ogg (with bitrate 128 kbit/s or higher), the watermark usually can be retrieved without problems. The process of retrieving the message does not need the original audio file (blind decoding).

Internally, audiowmark is using the patchwork algorithm to hide the data in the spectrum of the audio file. The signal is split into 1024 sample frames. For each frame, some pseoudo-randomly selected amplitudes of the frequency bands of a 1024-value FFTs are increased or decreased slightly, which can be detected later. The algorithm used here is inspired by

Martin Steinebach: Digitale Wasserzeichen für Audiodaten.
Darmstadt University of Technology 2004, ISBN 3-8322-2507-2

Open Source License

audiowmark is open source software available under the GPLv3 or later license.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Adding a Watermark

To add a watermark to the soundfile in.wav with a 128-bit message (which is specified as hex-string):

  $ audiowmark add in.wav out.wav 0123456789abcdef0011223344556677
  Input:        in.wav
  Output:       out.wav
  Message:      0123456789abcdef0011223344556677
  Strength:     10

  Time:         3:59
  Sample Rate:  48000
  Channels:     2
  Data Blocks:  4

The most important options for adding a watermark are:

--key <filename>: Use watermarking key from file <filename> (see Watermark Key).
--strength <s>: Set the watermarking strength (see Watermark Strength).

Retrieving a Watermark

To get the 128-bit message from the watermarked file, use:

  $ audiowmark get out.wav
  pattern  0:05 0123456789abcdef0011223344556677 1.324 0.059 A
  pattern  0:57 0123456789abcdef0011223344556677 1.413 0.112 B
  pattern  0:57 0123456789abcdef0011223344556677 1.368 0.086 AB
  pattern  1:49 0123456789abcdef0011223344556677 1.302 0.098 A
  pattern  2:40 0123456789abcdef0011223344556677 1.361 0.093 B
  pattern  2:40 0123456789abcdef0011223344556677 1.331 0.096 AB
  pattern   all 0123456789abcdef0011223344556677 1.350 0.054

The output of audiowmark get is designed to be machine readable. Each line that starts with pattern contains one decoded message. The fields are seperated by one or more space characters. The first field is a timestamp indicating the position of the data block. The second field is the decoded message. For most purposes this is all you need to know.

The software was designed under the assumption that you - the user - will be able to decide whether a message is correct or not. To do this, on watermarking song files, you could list each message you embedded in a database. During retrieval, you should look up each pattern audiowmark get outputs in the database. If the message is not found, then you should assume that a decoding error occurred. In our example each pattern was decoded correctly, because the watermark was not damaged at all, but if you for instance use lossy compression (with a low bitrate), it may happen that only some of the decoded patterns are correct. Or none, if the watermark was damaged too much.

The third field is the sync score (higher is better). The synchronization algorithm tries to find valid data blocks in the audio file, that become candidates for decoding.

The fourth field is the decoding error (lower is better). During message decoding, we use convolutional codes for error correction, to make the watermarking more robust.

The fifth field is the block type. There are two types of data blocks, A blocks and B blocks. A single data block can be decoded alone, as it contains a complete message. However, if during watermark detection an A block followed by a B block was found, these two can be decoded together (then this field will be AB), resulting in even higher error correction capacity than one block alone would have.

To improve the error correction capacity even further, the all pattern combines all data blocks that are available. The combined decoded message will often be the most reliable result (meaning that even if all other patterns were incorrect, this could still be right).

The most important options for getting a watermark are:

--key <filename>: Use watermarking key from file <filename> (see Watermark Key).
--strength <s>: Set the watermarking strength (see Watermark Strength).

Watermark Key

Since the software is Open Source, a watermarking key should be used to ensure that the message bits cannot be retrieved by somebody else (which would also allow removing the watermark without loss of quality). The watermark key controls all pseudo-random parameters of the algorithm. This means that it determines which frequency bands are increased or decreased to store a 0 bit or a 1 bit. Without the key, it is impossible to decode the message bits from the audio file alone.

Our watermarking key is a 128-bit AES key. A key can be generated using

audiowmark gen-key test.key

and can be used for the add/get commands as follows:

audiowmark add --key test.key in.wav out.wav 0123456789abcdef0011223344556677
audiowmark get --key test.key out.wav

Watermark Strength

The watermark strength parameter affects how much the watermarking algorithm modifies the input signal. A stronger watermark is more audible, but also more robust against modifications. The default strength is 10. A watermark with that strength is recoverable after mp3/ogg encoding with 128kbit/s or higher. In our informal listening tests, this setting also has a very good subjective quality.

A higher strength (for instance 15) would be helpful for instance if robustness against multiple conversions or conversions to low bit rates (i.e. 64kbit/s) is desired.

A lower strength (for instance 6) makes the watermark less audible, but also less robust. Strengths below 5 are not recommended. To set the strength, the same value has to be passed during both, generation and retrieving the watermark. Fractional strengths (like 7.5) are possible.

audiowmark add --strength 15 in.wav out.wav 0123456789abcdef0011223344556677
audiowmark get --strength 15 out.wav

Short Payload (experimental)

By default, the watermark will store a 128-bit message. In this mode, we recommend using a 128bit hash (or HMAC) as payload. No error checking is performed, the user needs to test patterns that the watermarker decodes to ensure that they really are one of the expected patterns, not a decoding error.

As an alternative, an experimental short payload option is available, for very short payloads (12, 16 or 20 bits). It is enabled using the --short <bits> command line option, for instance for 16 bits:

audiowmark add --short 16 in.wav out.wav abcd
audiowmark get --short 16 out.wav

Internally, a larger set of bits is sent to ensure that decoded short patterns are really valid, so in this mode, error checking is performed after decoding, and only valid patterns are reported.

Besides error checking, the advantage of a short payload is that fewer bits need to be sent, so decoding will more likely to be successful on shorter clips.

Video Files

For video files, videowmark can be used to add a watermark to the audio track of video files. To add a watermark, use

  $ videowmark add in.avi out.avi 0123456789abcdef0011223344556677
  Audio Codec:  -c:a mp3 -ab 128000
  Input:        in.avi
  Output:       out.avi
  Message:      0123456789abcdef0011223344556677
  Strength:     10

  Time:         3:53
  Sample Rate:  44100
  Channels:     2
  Data Blocks:  4

To detect a watermark, use

  $ videowmark get out.avi
  pattern  0:05 0123456789abcdef0011223344556677 1.294 0.142 A
  pattern  0:57 0123456789abcdef0011223344556677 1.191 0.144 B
  pattern  0:57 0123456789abcdef0011223344556677 1.242 0.145 AB
  pattern  1:49 0123456789abcdef0011223344556677 1.215 0.120 A
  pattern  2:40 0123456789abcdef0011223344556677 1.079 0.128 B
  pattern  2:40 0123456789abcdef0011223344556677 1.147 0.126 AB
  pattern   all 0123456789abcdef0011223344556677 1.195 0.104

The key and strength can be set using the command line options

--key <filename>: Use watermarking key from file <filename> (see Watermark Key).
--strength <s>: Set the watermarking strength (see Watermark Strength).

Videos can be watermarked on-the-fly using HTTP Live Streaming.

Output as Stream

Usually, an input file is read, watermarked and an output file is written. This means that it takes some time before the watermarked file can be used.

An alternative is to output the watermarked file as stream to stdout. One use case is sending the watermarked file to a user via network while the watermarker is still working on the rest of the file. Here is an example how to watermark a wav file to stdout:

audiowmark add in.wav - 0123456789abcdef0011223344556677 | play -

In this case the file in.wav is read, watermarked, and the output is sent to stdout. The "play -" can start playing the watermarked stream while the rest of the file is being watermarked.

If - is used as output, the output is a valid .wav file, so the programs running after audiowmark will be able to determine sample rate, number of channels, bit depth, encoding and so on from the wav header.

Note that all input formats supported by audiowmark can be used in this way, for instance flac/mp3:

audiowmark add in.flac - 0123456789abcdef0011223344556677 | play -
audiowmark add in.mp3 - 0123456789abcdef0011223344556677 | play -

Input from Stream

Similar to the output, the audiowmark input can be a stream. In this case, the input must be a valid .wav file. The watermarker will be able to start watermarking the input stream before all data is available. An example would be:

cat in.wav | audiowmark add - out.wav 0123456789abcdef0011223344556677

It is possible to do both, input from stream and output as stream.

cat in.wav | audiowmark add - - 0123456789abcdef0011223344556677 | play -

Streaming input is also supported for watermark detection.

cat in.wav | audiowmark get -

Raw Streams

So far, all streams described here are essentially wav streams, which means that the wav header allows audiowmark to determine sample rate, number of channels, bit depth, encoding and so forth from the stream itself, and the a wav header is written for the program after audiowmark, so that this can figure out the parameters of the stream.

There are two cases where this is problematic. The first case is if the full length of the stream is not known at the time processing starts. Then a wav header cannot be used, as the wav file contains the length of the stream. The second case is that the program before or after audiowmark doesn’t support wav headers.

For these two cases, raw streams are available. The idea is to set all information that is needed like sample rate, number of channels,… manually. Then, headerless data can be processed from stdin and/or sent to stdout.

--input-format raw
--output-format raw
--format raw: These can be used to set the input format or output format to raw. The last version sets both, input and output format to raw.
--raw-rate <rate>: This should be used to set the sample rate. The input sample rate and the output sample rate will always be the same (no resampling is done by the watermarker). There is no default for the sampling rate, so this parameter must always be specified for raw streams.
--raw-input-bits <bits>
--raw-output-bits <bits>
--raw-bits <bits>: The options can be used to set the input number of bits, the output number of bits or both. The number of bits can either be 16 or 24. The default number of bits is 16.
--raw-input-endian <endian>
--raw-output-endian <endian>
--raw-endian <endian>: These options can be used to set the input/output endianness or both. The <endian> parameter can either be little or big. The default endianness is little.
--raw-input-encoding <encoding>
--raw-output-encoding <encoding>
--raw-encoding <encoding>: These options can be used to set the input/output encoding or both. The <encoding> parameter can either be signed or unsigned. The default encoding is signed.
--raw-channels <channels>: This can be used to set the number of channels. Note that the number of input channels and the number of output channels must always be the same. The watermarker has been designed and tested for stereo files, so the number of channels should really be 2. This is also the default.

HTTP Live Streaming

Introduction for HLS

HTTP Live Streaming (HLS) is a protocol to deliver audio or video streams via HTTP. One example for using HLS in practice would be: a user watches a video in a web browser with a player like hls.js. The user is free to play/pause/seek the video as he wants. audiowmark can watermark the audio content while it is being transmitted to the user.

HLS splits the contents of each stream into small segments. For the watermarker this means that if the user seeks to a position far ahead in the stream, the server needs to start sending segments from where the new play position is, but everything in between can be ignored.

Another important property of HLS is that it allows separate segments for the video and audio stream of a video. Since we watermark only the audio track of a video, the video segments can be sent as they are (and different users can get the same video segments). What is watermarked are the audio segments only, so here instead of sending the original audio segments to the user, the audio segments are watermarked individually for each user, and then transmitted.

Everything necessary to watermark HLS audio segments is available within audiowmark. The server side support which is necessary to send the right watermarked segment to the right user is not included.

HLS Requirements

HLS support requires some headers/libraries from ffmpeg:

libavcodec
libavformat
libavutil
libswresample

To enable these as dependencies and build audiowmark with HLS support, use the --with-ffmpeg configure option:

$ ./configure --with-ffmpeg

In addition to the libraries, audiowmark also uses the two command line programs from ffmpeg, so they need to be installed:

ffmpeg
ffprobe

Preparing HLS segments

The first step for preparing content for streaming with HLS would be splitting a video into segments. For this documentation, we use a very simple example using ffmpeg. No matter what the original codec was, at this point we force transcoding to AAC with our target bit rate, because during delivery the stream will be in AAC format.

$ ffmpeg -i video.mp4 -f hls -master_pl_name replay.m3u8 -c:a aac -ab 192k \
  -var_stream_map "a:0,agroup:aud v:0,agroup:aud" \
  -hls_playlist_type vod -hls_list_size 0 -hls_time 10 vs%v/out.m3u8

This splits the video.mp4 file into an audio stream of segments in the vs0 directory and a video stream of segments in the vs1 directory. Each segment is approximately 10 seconds long, and a master playlist is written to replay.m3u8.

Now we can add the relevant audio context to each audio ts segment. This is necessary so that when the segment is watermarked in order to be transmitted to the user, audiowmark will have enough context available before and after the segment to create a watermark which sounds correct over segment boundaries.

$ audiowmark hls-prepare vs0 vs0prep out.m3u8 video.mp4
AAC Bitrate:  195641 (detected)
Segments:     18
Time:         2:53

This steps reads the audio playlist vs0/out.m3u8 and writes all segments contained in this audio playlist to a new directory vs0prep which contains the audio segments prepared for watermarking.

The last argument in this command line is video.mp4 again. All audio that is watermarked is taken from this audio master. It could also be supplied in wav format. This makes a difference if you use lossy compression as target format (for instance AAC), but your original video has an audio stream with higher quality (i.e. lossless).

Watermarking HLS segments

So with all preparations made, what would the server have to do to send a watermarked version of the 6th audio segment vs0prep/out5.ts?

$ audiowmark hls-add vs0prep/out5.ts send5.ts 0123456789abcdef0011223344556677
Message:      0123456789abcdef0011223344556677
Strength:     10

Time:         0:15
Sample Rate:  44100
Channels:     2
Data Blocks:  0
AAC Bitrate:  195641

So instead of sending out5.ts (which has no watermark) to the user, we would send send5.ts, which is watermarked.

In a real-world use case, it is likely that the server would supply the input segment on stdin and send the output segment as written to stdout, like this

$ [...] | audiowmark hls-add - - 0123456789abcdef0011223344556677 | [...]
[...]

The usual parameters are supported in audiowmark hls-add, like

--key <filename>: Use watermarking key from file <filename> (see Watermark Key).
--strength <s>: Set the watermarking strength (see Watermark Strength).

The AAC bitrate for the output segment can be set using:

--bit-rate <bit_rate>: Set the AAC bit-rate for the generated watermarked segment.

The rules for the AAC bit-rate of the newly encoded watermarked segment are:

if the --bit-rate option is used during hls-add, this bit-rate will be used
otherwise, if the --bit-rate option is used during hls-prepare, this bit-rate will be used
otherwise, the bit-rate of the input material is detected during hls-prepare

Dependencies

If you compile from source, audiowmark needs the following libraries:

libfftw3
libsndfile
libgcrypt
libzita-resampler
libmpg123

If you want to build with HTTP Live Streaming support, see also HLS Requirements.

Building fftw

audiowmark needs the single prevision variant of fftw3.

If you are building fftw3 from source, use the --enable-float configure parameter to build it, e.g.::

cd ${FFTW3_SOURCE}
./configure --enable-float --enable-sse && \
make && \
sudo make install

or, when building from git

cd ${FFTW3_GIT}
./bootstrap.sh --enable-shared --enable-sse --enable-float && \
make && \
sudo make install

Docker Build

You should be able to execute audiowmark via Docker. Example that outputs the usage message:

docker build -t audiowmark .
docker run -v <local-data-directory>:/data -it audiowmark -h

DayBreakZhang/audiowmark