(Dadabots fork) Don't use this repo yet: it currently runs off an audiocraft fork with edits to `lm` and `musicgen` so they accept a torch generator.
This repository extends the Audiocraft repository with a Trainer class that fine-tunes the model unconditionally: the input prompt is identical for all training data. The model converges, but previously learned sounds will vanish.
- No input prompts
- Generating long sequences (longer than 30 seconds)
- Training from scratch (check out the "full-training" branch, work in progress)
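To make "no input prompts" concrete: in audiocraft terms, unconditional training amounts to pairing every example with the same fixed (here empty) text description. A minimal sketch of that idea (not the trainer's actual code):

```python
from audiocraft.modules.conditioners import ConditioningAttributes

# One identical, empty text condition per example in the batch: the model
# never sees a varying prompt, so it learns an unconditional distribution.
batch_size = 4
attributes = [ConditioningAttributes(text={"description": ""})
              for _ in range(batch_size)]
```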
```
!pip install 'torch>=2.0'
!pip install -U audiocraft
!pip install wandb  # optional
```
- `python run.py train MODELNAME PTFILE(optional)` - normal training
- `python run.py tune MODELNAME PTFILE(optional)` - training with a reduced learning rate
- `python run.py ensemble MODELNAME PTFILE(optional)` - experimental training with a cosine-annealing LR scheduler
- `python run.py generate PTFILE WAVFILE(optional)` - generate from a trained model
- `python run.py explore PTFILE WAVFILE(optional)` - generate a range of examples via hyperparameter grid search
- `python run.py blend PTDIR WAVFILE(optional)` - blend multiple checkpoints trained in ensemble mode
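A common way to blend checkpoints is to average their weights. The sketch below is illustrative only (uniform averaging over raw state dicts is an assumption here, not necessarily what `blend` does internally):

```python
import glob
import torch

def average_checkpoints(pt_dir: str) -> dict:
    """Uniformly average the state dicts of all .pt files in a directory."""
    paths = sorted(glob.glob(f"{pt_dir}/*.pt"))
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")  # assumes raw state dicts
        if avg is None:
            avg = {k: v.float().clone() for k, v in state.items()}
        else:
            for k, v in state.items():
                avg[k] += v.float()
    return {k: v / len(paths) for k, v in avg.items()}
```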
For training you need two datasets, one for training and one for evaluation, with sound files in `.wav`, `.mp3`, `.flac`, `.ogg`, or `.m4a` format.
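A possible layout (file names are placeholders), matching the `dataset_path_train` and `dataset_path_eval` values used below:

```
train/
├── song_01.wav
├── song_02.mp3
└── ...
eval/
├── song_10.flac
└── ...
```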
Or copy this into a notebook:
```python
from train import main

dataset_cfg = dict(
    dataset_path_train="train",
    dataset_path_eval="eval",
    batch_size=4,
    num_examples_train=1000,
    num_examples_eval=200,
    segment_duration=30,
    sample_rate=32_000,
    shuffle=True,
    return_info=False,
)

cfg = dict(
    learning_rate=0.0001,
    epochs=80,
    model="small",
    # Map the string hash into a valid 32-bit seed range. Note that Python
    # string hashes are randomized per process unless PYTHONHASHSEED is set,
    # so this seed is not reproducible across runs.
    seed=hash("blabliblu") % (2**32 - 1),
    use_wandb=True,
)

main("Model Name", cfg, dataset_cfg, "Wandb Project Name")
```
I made a second branch called "full-training" for training from scratch, since the published MusicGen models are trained on only one channel. The model parameters are similar to the "small" MusicGen model.
```python
cfg = dict(
    learning_rate=0.000001,
    epochs=80,
    model=None,              # train from scratch instead of finetuning
    seed=hash("blabliblu") % (2**32 - 1),
    use_wandb=True,
    sample_rate=32000,
    channels=2,              # stereo, unlike the published 1-channel models
    cross_attention=False,
    accum_iter=48,           # gradient accumulation steps
    CosineAnnealing=False,
    lr_steps=200,
)
```
Here I added gradient accumulation and a cosine-annealing scheduler with warmup. This is a work in progress; the model is currently overfitting.
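For reference, here is a minimal sketch of how gradient accumulation and a warmup-plus-cosine-annealing schedule typically combine in a training loop. `model`, `dataloader`, `compute_loss`, and `total_steps` are placeholders, and mapping `lr_steps` to the warmup length is an assumption; this is not the repo's exact trainer:

```python
import math
import torch

accum_iter = 48       # optimizer steps once every 48 batches (cfg["accum_iter"])
warmup_steps = 200    # assumed to correspond to cfg["lr_steps"]
base_lr = 1e-6

optimizer = torch.optim.AdamW(model.parameters(), lr=base_lr)

def lr_lambda(step: int) -> float:
    # Linear warmup, then cosine decay toward zero.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for i, batch in enumerate(dataloader):
    loss = compute_loss(model, batch) / accum_iter  # scale so grads average out
    loss.backward()
    if (i + 1) % accum_iter == 0:                   # step only every accum_iter batches
        optimizer.step()
        optimizer.zero_grad()
        scheduler.step()
```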
The text prompt is replaced by previously generated audio, so the model can generate new samples that stay coherent with what came before. `display_audio` merges all generated samples.
Run the script:

```
python run.py generate MODEL_NAME
```

or copy this into a notebook:
```python
from generate_inf import generate_long_seq
from util import display_audio

# The positional arguments after `model` appear to be generation settings;
# 1.0, 250, and 0 match MusicGen's default temperature, top_k, and top_p.
out = generate_long_seq(model, 8, 1024, True, 1.0, 250, 0, None)
display_audio(out, path="audioSamples.wav")
```
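The same continuation idea can be sketched with audiocraft's public API. This illustrates the mechanism only (the chunk lengths and the 10-second overlap are arbitrary choices here, not what `generate_long_seq` actually uses):

```python
import torch
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=30, use_sampling=True, top_k=250)

chunk = model.generate_unconditional(num_samples=1)       # first 30 s, no prompt
pieces = [chunk]
for _ in range(3):                                        # extend three more times
    prompt = chunk[..., -int(10 * model.sample_rate):]    # last 10 s as audio prompt
    chunk = model.generate_continuation(prompt, model.sample_rate)
    pieces.append(chunk[..., prompt.shape[-1]:])          # keep only the new audio
long_audio = torch.cat(pieces, dim=-1)
```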
```bibtex
@article{copet2023simple,
  title={Simple and Controllable Music Generation},
  author={Jade Copet and Felix Kreuk and Itai Gat and Tal Remez and David Kant and Gabriel Synnaeve and Yossi Adi and Alexandre Défossez},
  journal={arXiv preprint arXiv:2306.05284},
  year={2023},
}
```
Thanks to Chavinlo, whose MusicGen trainer helped me develop my code, and to Dadabots for the inspiration.