Adjusting batch_size + dataset in base_test.yaml yields noise
Ericxgao opened this issue · 1 comment
Ericxgao commented
I'm trying to train a model on some music using the trainer repo, with the following YAML config:
```yaml
# @package _global_

# Test with length 65536, batch size 4, logger sampling_steps [3]

sampling_rate: 48000
length: 65536
channels: 2
log_every_n_steps: 2000

model:
  _target_: main.module_base.Model
  lr: 1e-4
  lr_beta1: 0.95
  lr_beta2: 0.999
  lr_eps: 1e-6
  lr_weight_decay: 1e-3
  ema_beta: 0.9999
  ema_power: 0.7

  model:
    _target_: audio_diffusion_pytorch.AudioDiffusionModel
    in_channels: ${channels}
    channels: 128
    patch_factor: 16
    patch_blocks: 1
    resnet_groups: 8
    kernel_multiplier_downsample: 2
    multipliers: [1, 2, 4, 4, 4, 4, 4]
    factors: [4, 4, 4, 2, 2, 2]
    num_blocks: [2, 2, 2, 2, 2, 2]
    attentions: [0, 0, 0, 1, 1, 1, 1]
    attention_heads: 8
    attention_features: 64
    attention_multiplier: 2
    use_nearest_upsample: False
    use_skip_scale: True
    use_magnitude_channels: True
    diffusion_sigma_distribution:
      _target_: audio_diffusion_pytorch.UniformDistribution

datamodule:
  _target_: main.module_base.Datamodule
  dataset:
    _target_: audio_data_pytorch.YoutubeDataset
    urls:
      - https://www.youtube.com/watch?v=FrMugs5eits
      - https://www.youtube.com/watch?v=orrwpGhLjJo
      - https://www.youtube.com/watch?v=OYmkDEdO5Ek
      - https://www.youtube.com/watch?v=PgDUaKjIGLQ
      - https://www.youtube.com/watch?v=zXncanvMfhg
      - https://www.youtube.com/watch?v=W0EYGtK-DwE
      - https://www.youtube.com/watch?v=ImgwN3u7Af0
      - https://www.youtube.com/watch?v=ohsLkUlCu3I
      - https://www.youtube.com/watch?v=vuV5DuVqDcw
      - https://www.youtube.com/watch?v=kxi_vU-yJLg
      - https://www.youtube.com/watch?v=-JPMd_NiY10
      - https://www.youtube.com/watch?v=pHzf2FkNCIQ
      - https://www.youtube.com/watch?v=mwpxQLeVKuo
      - https://www.youtube.com/watch?v=WYbc32bQozo
      - https://www.youtube.com/watch?v=LEGRJpOo7Ts
      - https://www.youtube.com/watch?v=IiURF2gxUnc
      - https://www.youtube.com/watch?v=43ZYv36QnVw
    root: ${data_dir}
    crop_length: 12 # crop length in seconds
    transforms:
      _target_: audio_data_pytorch.AllTransform
      source_rate: ${sampling_rate}
      target_rate: ${sampling_rate}
      random_crop_size: ${length}
      loudness: -20
  val_split: 0.01
  batch_size: 300
  num_workers: 8
  pin_memory: True

callbacks:
  rich_progress_bar:
    _target_: pytorch_lightning.callbacks.RichProgressBar

  model_checkpoint:
    _target_: pytorch_lightning.callbacks.ModelCheckpoint
    monitor: "valid_loss" # name of the logged metric which determines when model is improving
    save_top_k: 1 # save k best models (determined by above metric)
    save_last: True # additionally always save model from last epoch
    mode: "min" # can be "max" or "min"
    verbose: False
    dirpath: ${logs_dir}/ckpts/${now:%Y-%m-%d-%H-%M-%S}
    filename: '{epoch:02d}-{valid_loss:.3f}'

  model_summary:
    _target_: pytorch_lightning.callbacks.RichModelSummary
    max_depth: 2

  audio_samples_logger:
    _target_: main.module_base.SampleLogger
    num_items: 4
    channels: ${channels}
    sampling_rate: ${sampling_rate}
    length: ${length}
    sampling_steps: [3]
    use_ema_model: True
    diffusion_sampler:
      _target_: audio_diffusion_pytorch.VSampler
    diffusion_schedule:
      _target_: audio_diffusion_pytorch.LinearSchedule

loggers:
  wandb:
    _target_: pytorch_lightning.loggers.wandb.WandbLogger
    project: ${oc.env:WANDB_PROJECT}
    entity: ${oc.env:WANDB_ENTITY}
    # offline: False # set True to store all logs only locally
    job_type: "train"
    group: ""
    save_dir: ${logs_dir}

trainer:
  _target_: pytorch_lightning.Trainer
  gpus: 0 # Set `1` to train on GPU, `0` to train on CPU only, and `-1` to train on all GPUs, default `0`
  precision: 32 # Precision used for tensors, default `32`
  accelerator: null # `ddp` GPUs train individually and sync gradients, default `None`
  min_epochs: 0
  max_epochs: -1
  enable_model_summary: False
  log_every_n_steps: 1 # Logs metrics every N batches
  check_val_every_n_epoch: null
  val_check_interval: ${log_every_n_steps}
```
This is a modification of the base_test.yaml config file, so I don't think anything should be too far off. I trained for about 4400 epochs over 10 hours.
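
For reference, the trainer consumes this YAML through Hydra: each `_target_` is instantiated recursively, and `${...}` interpolations such as `${channels}` resolve against the top-level keys. Below is a minimal sketch of that resolution, assuming the config above is saved as `exp/my_config.yaml` (the path is illustrative, and this is not the trainer's actual entry point):

```python
# Minimal sketch: load the YAML above and let Hydra build the `_target_` objects.
from omegaconf import OmegaConf
from hydra.utils import instantiate

cfg = OmegaConf.load("exp/my_config.yaml")  # hypothetical location of the config above

# Recursively instantiates main.module_base.Model, which wraps the
# audio_diffusion_pytorch.AudioDiffusionModel defined in the nested `model.model` node.
model = instantiate(cfg.model)
```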
My inference script is as follows:
```python
# @title Download Model
import torch
from main.module_base import Model
from audio_diffusion_pytorch import AudioDiffusionModel, UniformDistribution

# Re-create the inner diffusion model with the same hyperparameters as the training config
adm = AudioDiffusionModel(
    in_channels=2,
    channels=128,
    patch_factor=16,
    patch_blocks=1,
    resnet_groups=8,
    kernel_multiplier_downsample=2,
    multipliers=[1, 2, 4, 4, 4, 4, 4],
    factors=[4, 4, 4, 2, 2, 2],
    num_blocks=[2, 2, 2, 2, 2, 2],
    attentions=[0, 0, 0, 1, 1, 1, 1],
    attention_heads=8,
    attention_features=64,
    attention_multiplier=2,
    use_nearest_upsample=False,
    use_skip_scale=True,
    use_magnitude_channels=True,
    diffusion_sigma_distribution=UniformDistribution(),  # pass an instance, not the class
)
adm = adm.to('cuda')

# Loading the checkpoint into the Lightning wrapper also populates the weights of `adm`,
# since it is passed in as the inner model
model = Model.load_from_checkpoint(
    checkpoint_path='/home/fsuser/audio-diffusion-pytorch-trainer/logs/ckpts/2022-10-20-08-43-18/last.ckpt',
    lr=1e-4,
    lr_beta1=0.95,
    lr_beta2=0.999,
    lr_eps=1e-6,
    lr_weight_decay=1e-3,
    ema_beta=0.9999,
    ema_power=0.7,
    model=adm,
)

from audio_diffusion_pytorch import KarrasSchedule, VSampler
import torchaudio
import math

sampling_rate = 48000

# @markdown Generation length in seconds (rounded up to the nearest power of two of sampling_rate * length samples)
length = 10  #@param {type: "slider", min: 1, max: 87, step: 1}
length_samples = math.ceil(math.log2(length * sampling_rate))  # exponent of the power of two

# @markdown Number of samples to generate
num_samples = 5  #@param {type: "slider", min: 1, max: 16, step: 1}

# @markdown Number of diffusion steps (higher tends to be better but takes longer to generate)
num_steps = 100  #@param {type: "slider", min: 1, max: 200, step: 1}

with torch.no_grad():
    samples = adm.sample(
        noise=torch.randn((num_samples, 2, 2 ** length_samples), device='cuda'),
        num_steps=num_steps,
        sigma_schedule=KarrasSchedule(
            sigma_min=1e-4,
            sigma_max=10.0,
            rho=7.0,
        ),
        sampler=VSampler(),
    )

# Log audio samples
for i, sample in enumerate(samples):
    cpu_sample = sample.cpu()
    torchaudio.save(f'./audio_sample_{i}.wav', cpu_sample, sampling_rate)
```
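
As a side note, the power-of-two rounding in `length_samples` means the requested 10 seconds becomes slightly more audio. Plain arithmetic, shown only to make the slider comment concrete:

```python
import math

sampling_rate = 48000
length = 10  # seconds requested by the slider
exponent = math.ceil(math.log2(length * sampling_rate))  # ceil(log2(480000)) = 19
num_frames = 2 ** exponent
print(num_frames, num_frames / sampling_rate)  # 524288 frames, ~10.92 seconds of audio
```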
All I get out is this strange buzz: https://soundcloud.com/itsoksami/audio-sample-0/s-QuWAjmeS7OK?si=9a7dbf264ad74915aa872c4043d09196&utm_source=clipboard&utm_medium=text&utm_campaign=social_sharing
Is there anything I'm doing blatantly wrong here?
Originally posted by @Ericxgao in archinetai/audio-diffusion-pytorch#29
Ericxgao commented
I was using the wrong sigma_schedule; LinearDiffusion() works.
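
For anyone hitting the same symptom: the training config above logs samples with audio_diffusion_pytorch.VSampler and audio_diffusion_pytorch.LinearSchedule in its SampleLogger, so an inference call that matches training would look roughly like this (a sketch under that assumption; only the schedule and sampler differ from the script above):

```python
from audio_diffusion_pytorch import LinearSchedule, VSampler
import torch

with torch.no_grad():
    samples = adm.sample(
        noise=torch.randn((num_samples, 2, 2 ** length_samples), device='cuda'),
        num_steps=num_steps,
        sigma_schedule=LinearSchedule(),  # matches diffusion_schedule in the training config
        sampler=VSampler(),               # matches diffusion_sampler in the training config
    )
```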