
PyNeuralFx

A toolkit for neural audio effect modeling. With this toolkit, users can conveniently run audio effect modeling experiments.

PyNeuralFx paper

[Powered by PyNeuralFx]

Installation

You can install PyNeuralFx via

$ pip install pyneuralfx

To use the training framework, first clone the repository:

$ git clone https://github.com/ytsrt66589/pyneuralfx.git

then change into the framework directory:

$ cd frame_work/ 

Motivation

As audio effect related tasks have grown in importance, several easy-to-use toolkits have been developed, for example dasp-package (repo_link) for differentiable signal processing in PyTorch and grafx (ref_link) for audio effect processing graphs in PyTorch. However, there is no easy-to-use toolkit for the neural audio effect modeling task, especially for black-box methods. PyNeuralFx aims to fill this gap, helping beginners start neural audio effect modeling research easily and offering experienced researchers new perspectives.

Tutorials

Tutorials are available in /tutorials.

Functionality

Neural Network Models

PyNeuralFx follows the naming convention [control]-[model]. For example, a model that uses concatenation as the conditioning method and a GRU as the backbone is called Concat-GRU. If the model is only for snapshot modeling, the control part is always snapshot.

PyNeuralFx supports:

snapshot modeling

  • snapshot-tcn
  • snapshot-gcn
  • snapshot-vanilla-rnn
  • snapshot-lstm
  • snapshot-gru
  • snapshot-tfilm-gcn (ref_link)
  • snapshot-tfilm-tcn (ref_link)

full modeling

  • concat-gru
  • film-gru
  • statichyper-gru
  • dynamichyper-gru
  • concat-lstm
  • film-lstm
  • statichyper-lstm
  • dynamichyper-lstm
  • film-vanilla-rnn
  • statichyper-vanilla-rnn
  • concat-gcn
  • film-gcn
  • hyper-gcn
  • concat-tcn
  • film-tcn
  • hyper-tcn
  • film-ssm (ref_link)
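The naming convention can be illustrated with a small sketch. The helpers `split_model_name` and `is_snapshot` below are hypothetical (they are not part of the PyNeuralFx API); they assume that everything before the first hyphen is the control part and the rest is the backbone, which is consistent with every name listed above (e.g. statichyper-vanilla-rnn, snapshot-tfilm-gcn).

```python
# Hypothetical helpers (not part of the PyNeuralFx API) illustrating the
# "[control]-[model]" naming convention.


def split_model_name(name: str) -> tuple[str, str]:
    """Split a name such as 'Concat-GRU' into ('concat', 'gru')."""
    # Only the first hyphen separates control from model, so multi-part
    # backbones such as "vanilla-rnn" or "tfilm-gcn" stay intact.
    control, sep, model = name.lower().partition("-")
    if not sep or not control or not model:
        raise ValueError(f"expected '<control>-<model>', got {name!r}")
    return control, model


def is_snapshot(name: str) -> bool:
    # Snapshot-only models always use "snapshot" as their control part.
    return split_model_name(name)[0] == "snapshot"


if __name__ == "__main__":
    for name in ["Concat-GRU", "snapshot-tfilm-gcn", "statichyper-vanilla-rnn", "film-ssm"]:
        control, model = split_model_name(name)
        print(f"{name}: control={control}, model={model}, snapshot={is_snapshot(name)}")
```

Splitting only on the first hyphen is the design choice that keeps hyphenated backbone names whole.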

Loss functions

In our view, loss functions serve different purposes: some target overall reconstruction, some eliminate specific problems (improving sound details), and some leverage perceptual properties (aligning with human perception). More research is needed to explore which losses suit different audio effects.

PyNeuralFx supports:

  • ESR loss
  • L1 loss
  • L2 loss
  • complex STFT loss
  • multi-resolution complex STFT loss
  • STFT loss
  • multi-resolution STFT loss
  • DC eliminating loss
  • short-time energy loss (ref_link)
  • adversarial loss (ref_link, ref_link)
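To make the reconstruction-oriented losses concrete, here is a minimal pure-Python sketch of the ESR, L1, L2, and DC losses. It follows the commonly used definitions (ESR as error energy normalized by target energy; DC loss as the squared difference of the means normalized by target power). It is not PyNeuralFx's implementation, which may, for example, add pre-emphasis filtering or operate on batched PyTorch tensors; the function names and the `eps` stabilizer are assumptions.

```python
from typing import Sequence


def _check(target: Sequence[float], pred: Sequence[float]) -> None:
    if len(target) != len(pred) or len(target) == 0:
        raise ValueError("target and pred must be non-empty and equally long")


def esr_loss(target: Sequence[float], pred: Sequence[float], eps: float = 1e-8) -> float:
    """Error-to-signal ratio: sum((y - y_hat)^2) / sum(y^2)."""
    _check(target, pred)
    error_energy = sum((t - p) ** 2 for t, p in zip(target, pred))
    signal_energy = sum(t ** 2 for t in target)
    return error_energy / (signal_energy + eps)


def l1_loss(target: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error."""
    _check(target, pred)
    return sum(abs(t - p) for t, p in zip(target, pred)) / len(target)


def l2_loss(target: Sequence[float], pred: Sequence[float]) -> float:
    """Mean squared error."""
    _check(target, pred)
    return sum((t - p) ** 2 for t, p in zip(target, pred)) / len(target)


def dc_loss(target: Sequence[float], pred: Sequence[float], eps: float = 1e-8) -> float:
    """Penalizes a DC offset: (mean(y) - mean(y_hat))^2 / mean(y^2)."""
    _check(target, pred)
    n = len(target)
    mean_diff = (sum(target) - sum(pred)) / n
    target_power = sum(t ** 2 for t in target) / n
    return mean_diff ** 2 / (target_power + eps)


if __name__ == "__main__":
    target = [1.0, -1.0, 1.0, -1.0]
    offset = [t + 0.5 for t in target]  # prediction with a constant DC offset
    flipped = [-t for t in target]      # polarity-inverted prediction (zero-mean error)
    print(esr_loss(target, offset), dc_loss(target, offset))
    print(esr_loss(target, flipped), dc_loss(target, flipped))
```

The two example predictions show why the losses complement each other: a constant offset raises both ESR and DC loss, while a polarity flip produces a large ESR but a DC loss of exactly zero, because its error has no mean component.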

Also, PyNeuralFx supports

Evaluation metrics

The loss functions above can also serve as evaluation metrics for estimating reconstruction error. PyNeuralFx additionally supports other metrics for a more comprehensive evaluation:

Notice: Because the original implementation of transient extraction is slow, we provide an alternative implementation. Users can experiment with both methods and compare the results.

Visualization

PyNeuralFx supports two types of visualization:

  • Wave file comparison
    • time-domain wave comparison
    • spectrum difference
  • Model's behavior visualization
    • harmonic response
    • distortion curve
    • sine sweep visualization (for observing the aliasing problem)
    • phase response
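The harmonic response idea can be sketched in a few lines: drive the system with a sine and measure the magnitude at integer multiples of its frequency. This is a hypothetical pure-Python illustration, not PyNeuralFx's visualization code. `process` stands in for any model that maps a list of samples to a list of samples, and `toy_distortion` (a scaled `tanh`) is an assumed stand-in for a trained model. Choosing `f0` so that the analysis window holds a whole number of periods places every harmonic exactly on a DFT bin, which avoids spectral leakage without a window function.

```python
import math
from typing import Callable, List

Signal = List[float]


def harmonic_levels(process: Callable[[Signal], Signal], f0: float = 100.0,
                    sr: int = 48000, n_periods: int = 10, n_harmonics: int = 5,
                    amp: float = 0.5) -> List[float]:
    """Amplitudes of harmonics 1..n_harmonics of process(sine at f0).

    Assumes n_periods * sr / f0 is an integer, so each harmonic falls
    exactly on a DFT bin of the analysis window.
    """
    n = round(n_periods * sr / f0)
    x = [amp * math.sin(2 * math.pi * f0 * i / sr) for i in range(n)]
    y = process(x)
    levels = []
    for k in range(1, n_harmonics + 1):
        w = 2 * math.pi * k * f0 / sr
        # Single-bin DFT at the k-th harmonic.
        re = sum(v * math.cos(w * i) for i, v in enumerate(y))
        im = sum(v * math.sin(w * i) for i, v in enumerate(y))
        # Scaling by 2/n turns the bin magnitude into a sine amplitude.
        levels.append(2 * math.hypot(re, im) / n)
    return levels


def toy_distortion(x: Signal) -> Signal:
    # Memoryless soft clipper used only as a stand-in for a trained model.
    return [math.tanh(3.0 * v) for v in x]


if __name__ == "__main__":
    for k, level in enumerate(harmonic_levels(toy_distortion), start=1):
        db = 20 * math.log10(max(level, 1e-12))
        print(f"harmonic {k}: {db:.1f} dB (re amplitude 1.0)")
```

Because `tanh` is an odd function, the toy distortion produces only odd harmonics; an asymmetric model would also show even harmonics, which is the kind of difference a harmonic response plot makes visible.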

Training Frame Work Usage Flow


(First run cd frame_work so that the working directory is frame_work.)

  1. Download dataset: Download a dataset used in prior academic papers or prepare your own, then put the data under the data folder. The currently supported datasets are listed in the Supported Dataset section below; for each of them, we provide a preprocessing script that converts the data to the expected template.
  2. Preprocess data: Convert your data, with your own code or manually, to the data template expected by the framework provided in PyNeuralFx. See dataset.md for details. If you use a dataset that PyNeuralFx supports, the preprocessing script is already provided in preprocess/{name_of_the_dataset}.py.
  3. Prepare configuration: Modify the configuration files in configs/. Every experiment is recorded by its configuration file to ensure reproducibility. See configuration.md for details on the configuration settings.
  4. Training: Train the model according to the configuration files. See train.md for details.
  5. Evaluation & Visualization: Evaluate your results with several metrics, or visualize comparisons and important system properties. See evalvis.md for details.

Tricks

  1. During training, you can use loss_analysis/compare_loss.py to check the validation loss curves. (Remember to modify the experiment root in compare_loss.py.)

Supported Dataset

These datasets are collected from previous works. If you use them in your paper or project, please cite the corresponding papers.

To-do

  • Support auraloss loss functions
  • Support DDSP-based neural audio effect modeling architectures
  • Support more datasets by default
  • Support computational cost analysis
  • Add a tutorial on the full-modeling scenario
  • Add a tutorial on effect-chain modeling
  • Support more pre-trained models for several audio effects

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

pyneuralfx was created by yytung. It is licensed under the terms of the MIT license.

Credits

pyneuralfx was created with cookiecutter and the py-pkgs-cookiecutter template.

This project is heavily inspired by the following repositories; many thanks to their authors for the amazing work. If you are interested in audio effect related work, please look at the following repositories and websites for more insights.

  • micro-tcn (link)
  • gcn-tfilm (link)
  • pyloudnorm (link)
  • ddsp-singing-vocoder (link)
  • Binaural Speech Synthesis (link)
  • GreyBoxDRC (link)
  • sms-tools (link)
  • DeepAFx-ST (link)
  • Audio DSPy (link)
  • Jatin Chowdhury on Medium (link)
  • Hyper LSTM (link)
  • GuitarML (link)
  • SFI source separation (link)