Primary language: Python. License: MIT.

SoundStream for PyTorch

Unofficial PyTorch implementation of SoundStream, with training code and a 16 kHz pretrained checkpoint.

The 16 kHz pretrained model was trained on LibriSpeech train-clean-100 on an NVIDIA T4 for about 150 epochs (around 50 hours in total). The model is not causal.

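import torch
import torchaudio

# Load the 16 kHz pretrained model from the repository via torch.hub.
model = torch.hub.load("kaiidams/soundstream-pytorch", "soundstream_16khz")

# Load the input and resample it to 16 kHz, the rate of the pretrained checkpoint.
x, sr = torchaudio.load('input.wav')
x, sr = torchaudio.functional.resample(x, sr, 16000), 16000

with torch.no_grad():
    y = model.encode(x)
    # y = y[:, :, :4]  # optional: keep only the first 4 entries of the last axis to reduce code size
    z = model.decode(y)
torchaudio.save('output.wav', z, sr)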
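
The commented-out slice `y[:, :, :4]` trades quality for a smaller code. As a rough illustration of that trade-off, the sketch below estimates the bitrate of a residual-quantized code, assuming the last axis of `y` indexes quantizers. The hop length (320 samples per frame) and codebook size (1024 entries) are hypothetical values chosen for illustration, not taken from this repository, so check the checkpoint's actual configuration before relying on the numbers.

```python
import math


def bitrate_bps(sample_rate: int, hop_length: int,
                num_quantizers: int, codebook_size: int) -> float:
    """Bits per second needed to store the codes of one audio channel."""
    frames_per_second = sample_rate / hop_length
    bits_per_code = math.log2(codebook_size)
    return frames_per_second * num_quantizers * bits_per_code


# Hypothetical values for illustration only (not read from the checkpoint).
SAMPLE_RATE = 16000
HOP_LENGTH = 320
CODEBOOK_SIZE = 1024

full = bitrate_bps(SAMPLE_RATE, HOP_LENGTH, 8, CODEBOOK_SIZE)
reduced = bitrate_bps(SAMPLE_RATE, HOP_LENGTH, 4, CODEBOOK_SIZE)
print(f"8 quantizers: {full:.0f} bit/s, 4 quantizers: {reduced:.0f} bit/s")
```

Under these assumptions, halving the number of codes kept per frame halves the bitrate, since each frame stores one fixed-size index per quantizer.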

Sample audio

Audio references are sampled from LibriSpeech test-clean.

| Reference  | SoundStream |
|------------|-------------|
| audio link | audio link  |
| audio link | audio link  |
| audio link | audio link  |