/pytorch_sound

Sound Related Deep Learning Tasks boosting repository with pytorch

Primary LanguagePythonBSD 2-Clause "Simplified" LicenseBSD-2-Clause

Pytorch Sound

Hits Python 3.6


Introduction

Pytorch Sound is a modeling toolkit that allows engineers to train custom models for sound related tasks. It focuses on removing repetitive patterns that builds deep learning pipelines to boost speed of related experiments.

import torch.nn as nn
from pytorch_sound.models import register_model, register_model_architecture


@register_model('my_model')
class Model(nn.Module):
...


@register_model_architecture('my_model', 'my_model_base')
def my_model_base():
    return {'hidden_dim': 256}
from pytorch_sound.models import build_model


# build model
model_name = 'my_model_base'
model = build_model(model_name)
  • Several dataset sources (preprocess, meta, general sound dataset)

LibriTTS, Maestro, VCTK and VoiceBank are prepared at now.

Freely suggest me a dataset or PR is welcome!

  • Abstract Training Process
    • Build forward function (from data to loss, meta)
    • Provide various logging type
      • Tensorboard, Console
      • scalar, plot, image, audio
import torch
from pytorch_sound.trainer import Trainer, LogType


class MyTrainer(Trainer):

    def forward(self, input: torch.tensor, target: torch.tensor, is_logging: bool):
        # forward model
        out = self.model(input)

        # calc your own loss
        loss = calc_loss(out, target)

        # build meta for logging
        meta = {
            'loss': (loss.item(), LogType.SCALAR),
            'out': (out[0], LogType.PLOT)
        }
        return loss, meta

Usage

Install

  • ffmpeg v4
$ sudo add-apt-repository ppa:jonathonf/ffmpeg-4
$ sudo apt update
$ sudo apt install ffmpeg
$ ffmpeg -version
  • install package
$ pip install -e .

Preprocess / Handling Meta

  1. Download data files
  • In the LibriTTS case, checkout READMD
  1. Run commands (If you want to change sound settings, Change settings.py)
$ python pytorch_sound/scripts/preprocess.py [libri_tts / vctk / voice_bank] in_dir out_dir
  1. Checkout preprocessed data, meta files.
  • Maestro dataset is not required running preprocess code at now.

Examples

Environment

  • Python > 3.6
  • pytorch 1.0
  • ubuntu 16.04

Components

  1. Data and its meta file
  2. Data Preprocess
  3. General functions and modules in sound tasks
  4. Abstract training process

To be updated soon

  • Preprocess docs in README.md
  • Add test codes and CI
  • Document website.

LICENSE

  • This repository is under BSD-2 clause license. Check out the LICENSE file.