SoundStream for PyTorch

Unofficial PyTorch implementation of SoundStream, with training code and a 16 kHz pretrained checkpoint.

The 16 kHz pretrained model was trained on LibriSpeech train-clean-100 on a single NVIDIA T4 for about 150 epochs (around 50 hours in total). The model is not causal.

import torchaudio
import torch

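# Fetch the pretrained 16 kHz model from the GitHub repo via torch.hub.
model = torch.hub.load("kaiidams/soundstream-pytorch", "soundstream_16khz")
x, sr = torchaudio.load('input.wav')
# The checkpoint was trained on 16 kHz audio, so resample the input first.
x, sr = torchaudio.functional.resample(x, sr, 16000), 16000
with torch.no_grad():  # inference only, no gradients needed
    y = model.encode(x)
    # y = y[:, :, :4]  # if you want to reduce code size.
    z = model.decode(y)
torchaudio.save('output.wav', z, sr)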
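
The commented-out slice `y[:, :, :4]` trades quality for a smaller code. As a rough sketch of how much it saves, assuming the last axis of `y` indexes the residual quantizer stages (as the slice suggests), the bitrate scales linearly with the number of stages kept. The hop length, codebook size, and quantizer count below are hypothetical values chosen for illustration, not values read from this checkpoint, and `bitrate_bps` is a helper made up for this example.

```python
import math


def bitrate_bps(frames_per_second: float, num_quantizers: int, codebook_size: int) -> float:
    """Bits per second when each frame stores `num_quantizers` codebook indices."""
    bits_per_index = math.log2(codebook_size)
    return frames_per_second * num_quantizers * bits_per_index


SAMPLE_RATE = 16000     # from the usage above: input is resampled to 16 kHz
HOP_LENGTH = 320        # ASSUMPTION: encoder stride in samples
CODEBOOK_SIZE = 1024    # ASSUMPTION: entries per codebook
FULL_QUANTIZERS = 8     # ASSUMPTION: quantizer stages in the full code

fps = SAMPLE_RATE / HOP_LENGTH
full = bitrate_bps(fps, FULL_QUANTIZERS, CODEBOOK_SIZE)
reduced = bitrate_bps(fps, 4, CODEBOOK_SIZE)  # keeping only the first 4 stages
print(f"full: {full:.0f} bps, first 4 quantizers: {reduced:.0f} bps")
```

Under these assumptions, keeping 4 of 8 stages halves the bitrate. Substitute the actual values from the model configuration to get figures for the released checkpoint.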

Sample audio

Audio references are sampled from LibriSpeech test-clean.

Reference	SoundStream
audio link	audio link
audio link	audio link
audio link	audio link