Work In Progress

This repo is a primer in reading audio (via ffmpeg) into NumPy/PyTorch arrays without copying data or process launching. Interfacing with FFmpeg is done in pure C code in decode_audio.c. Python wrapper is implemented in decode_audio.py using a standard library module ctypes. C code returns a plain C structure Audio. This structure is then interpeted and wrapped by NumPy or PyTorch without copy.

At the bottom is an example of alternative solution using process launching. The first solution is preferable if you must load huge amounts of audio in various formats (for reading *.wav files, there exists a standard Python wave module and scipy.io.wavfile.read).

It is also a simple primer on FFmpeg audio decoding loop and basic ctypes usage for interfacing C code and NumPy/PyTorch (without creating a full-blown PyTorch C++ extension).

Usage

# install dependencies: ffmpeg executables and shared libraries on ubuntu
apt-get install -y ffmpeg libavcodec-dev libavformat-dev libavfilter-dev

# create sample audio test.wav
ffmpeg -f lavfi -i "sine=frequency=1000:duration=5" -c:a pcm_s16le -ar 8000 test.wav

# convert audio to raw format
ffmpeg -i test.wav -f s16le -acodec pcm_s16le golden.raw

# play a raw file
ffplay -f s16le -ac 1 -ar 8000 golden.raw

# compile executable for testing
make decode_audio_ffmpeg

# convert audio to raw format and compare to golden
./decode_audio_ffmpeg test.wav bin.raw
diff golden.raw bin.raw

# compile a shared library for interfacing with NumPy and PyTorch
make decode_audio_ffmpeg.so

# convert audio to raw format (NumPy) and compare to golden
python3 decode_audio.py -i test.wav -o numpy.raw
diff golden.raw numpy.raw

# convert audio to raw format (PyTorch) and compare to golden
python3 decode_audio.py -i test.wav -o torch.raw
diff golden.raw torch.raw

# convert audio to raw format (PyTorch / DLPack) and compare to golden
python3 decode_audio.py -i test.wav -o dlpack.raw
diff golden.raw dlpack.raw

# read audio using subprocess
# python3 decode_audio_subprocess.py test.wav

import sys
import subprocess
import struct

format_ffmpeg, format_struct = [('s16le', 'h'), ('f32le', 'f'), ('u8', 'B'), ('s8', 'b')][0]
sample_rate = 8_000 # resample
num_channels = 1 # force mono

audio = memoryview(subprocess.check_output(['ffmpeg', '-nostdin', '-hide_banner', '-nostats', '-loglevel', 'quiet', '-i', sys.argv[1], '-f', format_ffmpeg, '-ar', str(sample_rate), '-ac', str(num_channels), '-']))
audio = audio.cast(format_struct, shape = [len(audio) // num_channels // struct.calcsize(format_struct), num_channels])

print('shape', audio.shape, 'itemsize', audio.itemsize, 'format', audio.format)
# shape (40000, 1) itemsize 2 format h

TODO

SOX backend ( https://github.com/pytorch/audio/blob/master/torchaudio/torch_sox.cpp)
ffmpeg audio filter graph
decode from a buffer
non-allocating version that keeps allocations in Python for simpler memory management
probe function

vadimkantorov/readaudio

Work In Progress

Usage

TODO