Sonic Diffusion

Create AI-generated audio with diffusion



Sonic Diffusion is an AI music generation application that uses Mel spectrograms with diffusion modeling to create unique and original compositions. The application is written in Python and is built on top of the PyTorch and NumPy libraries.

Note: This repository requires Weights & Biases for logging and experiment tracking. To use Weights & Biases, you need an API key from www.wandb.ai.
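If you have not used Weights & Biases before, you can authenticate once with the wandb CLI (wandb login) or from Python before starting a run. Here is a minimal sketch using the standard wandb package; the API key placeholder is hypothetical, and where you call this from is up to you:

# Authenticate with Weights & Biases before training.
# Replace the placeholder with the API key from www.wandb.ai,
# or simply run `wandb login` once in a terminal instead.
import os
import wandb

os.environ["WANDB_API_KEY"] = "your-api-key"
wandb.login()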

Sample Audio and Mel Spectrograms

Below is a link to sample audio files and their corresponding Mel spectrograms generated using Sonic Diffusion:

https://markstent.github.io/sonic-diffusion/

Mel spectrogram

Installation

To install Sonic Diffusion, first clone the repository:

git clone https://github.com/yourusername/sonic-diffusion.git

Then, navigate to the sonic-diffusion directory and install the required Python packages using pip:

cd sonic-diffusion
pip install -r requirements.txt
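Since Sonic Diffusion is built on PyTorch, it can be worth confirming that PyTorch installed correctly and, if you plan to train on a GPU, that CUDA is visible. A quick optional check (a CUDA-capable machine is assumed here, but not required):

# Optional sanity check: confirm PyTorch is installed and whether a GPU is visible.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())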

Usage

To train the model, you will need a folder of Mel spectrograms (see the utility scripts at the bottom of this document).
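The utility scripts handle this conversion; purely as an illustration of what producing a Mel spectrogram image from audio involves, here is a minimal sketch using librosa and Pillow. These libraries, the file names, and all parameter values are assumptions for the example, not the project's actual preprocessing pipeline:

# Hypothetical sketch: turn an audio file into a greyscale Mel spectrogram image.
# librosa/Pillow and the parameter values here are illustrative assumptions only.
import librosa
import numpy as np
from PIL import Image

y, sr = librosa.load("example.wav", sr=22050)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048, hop_length=512, n_mels=256)
mel_db = librosa.power_to_db(mel, ref=np.max)

# Scale to 0-255 and save as an image a diffusion model can train on.
scaled = (255 * (mel_db - mel_db.min()) / (mel_db.max() - mel_db.min())).astype(np.uint8)
Image.fromarray(scaled).save("example_mel.png")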

To train a model using Sonic Diffusion, use the following example script:

python scripts/train_unet.py \
  --dataset_name path/to/dataset \
  --output_dir path/to/save/model \
  --num_epochs 100 \
  --train_batch_size 16 \
  --eval_batch_size 16 \
  --gradient_accumulation_steps 8 \
  --save_images_epochs 5 \
  --save_model_epochs 1 \
  --scheduler ddim \
  --model_resume_name name_of_model_in_wandb \
  --run_id id_of_wandb_run

The script takes several command-line arguments to configure the training process. You must provide the path to the dataset and the output directory for the trained model, along with parameters such as the number of epochs, batch size, and gradient accumulation steps. To resume training from a previously saved model, also pass the model_resume_name and run_id arguments (the model name and run ID from Weights & Biases); omit them when starting a fresh run.

A number of other hyperparameters can also be set; for now, these are defined in the training script itself.

Utility Scripts

Sonic Diffusion provides the following utility scripts:

Audio Evaluation

You can evaluate your audio results using the Stent Weighted Audio Similarity Score (SWASS); the repository can be found here.

To Do

  • Documentation
  • Add other forms of logging, e.g. TensorBoard

Citations

Sonic Diffusion uses the following libraries and resources:

Acknowledgement

Thank you to Robert Dargavel Smith for all the help and inspiration for this project.

License

Sonic Diffusion is licensed under the GNU General Public License v3.0 (GPL-3.0). See the LICENSE file for more information.