How many steps should we train for? 1 speaker, 2 hours of data, 691 wavs; training is pretty fast
Opened this issue · 0 comments
FurkanGozukara commented
Here is my config, but I have no idea how many steps to train for.
It is also writing checkpoints pretty fast.
Environment name is set as "DLAS" as per environment.yaml
anaconda3/miniconda3 detected in C:\Users\King\miniconda3
Starting conda environment "DLAS" from C:\Users\King\miniconda3
Latest git hash: 5ab4d9e
Disabled distributed training.
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
================================================================================
CUDA SETUP: Loading binary C:\Users\King\miniconda3\envs\DLAS\lib\site-packages\bitsandbytes\libbitsandbytes_cuda116.dll...
23-05-01 03:03:10.550 - INFO: name: test1
model: extensibletrainer
scale: 1
gpu_ids: [0]
start_step: -1
checkpointing_enabled: True
fp16: False
use_8bit: True
wandb: False
use_tb_logger: True
datasets:[
train:[
name: test1
n_workers: 8
batch_size: 138
mode: paired_voice_audio
path: F:/ozen-toolkit/output/test\train.txt
fetcher_mode: ['lj']
phase: train
max_wav_length: 255995
max_text_length: 200
sample_rate: 22050
load_conditioning: True
num_conditioning_candidates: 2
conditioning_length: 44000
use_bpe_tokenizer: True
load_aligned_codes: False
data_type: img
]
val:[
name: test1
n_workers: 1
batch_size: 139
mode: paired_voice_audio
path: F:/ozen-toolkit/output/test\valid.txt
fetcher_mode: ['lj']
phase: val
max_wav_length: 255995
max_text_length: 200
sample_rate: 22050
load_conditioning: True
num_conditioning_candidates: 2
conditioning_length: 44000
use_bpe_tokenizer: True
load_aligned_codes: False
data_type: img
]
]
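For context on the audio-length settings above, the sample counts convert to durations as follows (a minimal sketch; the 22050 Hz rate and the sample counts are taken straight from the config above):

```python
# Convert the config's sample counts to durations (values from the dataset config above).
sample_rate = 22050          # Hz
max_wav_length = 255995      # samples per training clip
conditioning_length = 44000  # samples of conditioning audio

max_clip_seconds = max_wav_length / sample_rate
conditioning_seconds = conditioning_length / sample_rate

print(f"max clip: {max_clip_seconds:.2f} s")         # ~11.61 s
print(f"conditioning: {conditioning_seconds:.2f} s")  # ~2.00 s
```

So each training clip is capped at roughly 11.6 seconds, and each conditioning candidate is about 2 seconds of audio.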
steps:[
gpt_train:[
training: gpt
loss_log_buffer: 500
optimizer: adamw
optimizer_params:[
lr: 1e-05
triton: False
weight_decay: 0.01
beta1: 0.9
beta2: 0.96
]
clip_grad_eps: 4
injectors:[
paired_to_mel:[
type: torch_mel_spectrogram
mel_norm_file: ../experiments/clips_mel_norms.pth
in: wav
out: paired_mel
]
paired_cond_to_mel:[
type: for_each
subtype: torch_mel_spectrogram
mel_norm_file: ../experiments/clips_mel_norms.pth
in: conditioning
out: paired_conditioning_mel
]
to_codes:[
type: discrete_token
in: paired_mel
out: paired_mel_codes
dvae_config: ../experiments/train_diffusion_vocoder_22k_level.yml
]
paired_fwd_text:[
type: generator
generator: gpt
in: ['paired_conditioning_mel', 'padded_text', 'text_lengths', 'paired_mel_codes', 'wav_lengths']
out: ['loss_text_ce', 'loss_mel_ce', 'logits']
]
]
losses:[
text_ce:[
type: direct
weight: 0.01
key: loss_text_ce
]
mel_ce:[
type: direct
weight: 1
key: loss_mel_ce
]
]
]
]
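The losses block weights the two cross-entropy terms (0.01 for text, 1.0 for mel). A quick sanity check of that weighting against the first logged iteration further down, where loss_gpt_total comes out at 3.6030:

```python
# Loss weights from the losses block above.
w_text, w_mel = 0.01, 1.0

# Per-term losses reported at iter 0 in the log below.
loss_text_ce = 3.8859
loss_mel_ce = 3.5641

# Weighted sum reproduces the logged loss_gpt_total.
loss_gpt_total = w_text * loss_text_ce + w_mel * loss_mel_ce
print(round(loss_gpt_total, 4))  # 3.603
```

The mel loss dominates by design; the text loss mostly acts as a regularizer at weight 0.01.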
networks:[
gpt:[
type: generator
which_model_G: unified_voice2
kwargs:[
layers: 30
model_dim: 1024
heads: 16
max_text_tokens: 402
max_mel_tokens: 604
max_conditioning_inputs: 2
mel_length_compression: 1024
number_text_tokens: 256
number_mel_codes: 8194
start_mel_token: 8192
stop_mel_token: 8193
start_text_token: 255
train_solo_embeddings: False
use_mel_codes_as_input: True
checkpointing: True
]
]
]
path:[
pretrain_model_gpt: ../experiments/autoregressive.pth
strict_load: True
root: F:\DL-Art-School
experiments_root: F:\DL-Art-School\experiments\test1
models: F:\DL-Art-School\experiments\test1\models
training_state: F:\DL-Art-School\experiments\test1\training_state
log: F:\DL-Art-School\experiments\test1
val_images: F:\DL-Art-School\experiments\test1\val_images
]
train:[
niter: 50000
warmup_iter: -1
mega_batch_factor: 4
val_freq: 500
default_lr_scheme: MultiStepLR
gen_lr_steps: [400, 800, 1120, 1440]
lr_gamma: 0.5
ema_enabled: False
manual_seed: 1337
]
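With MultiStepLR, gen_lr_steps [400, 800, 1120, 1440] and lr_gamma 0.5, the learning rate halves four times and then stays at base_lr / 16 for the rest of the run. A minimal pure-Python re-derivation of what PyTorch's MultiStepLR produces with these settings:

```python
def lr_at(step, base_lr=1e-5, milestones=(400, 800, 1120, 1440), gamma=0.5):
    """LR after `step` scheduler steps under MultiStepLR semantics."""
    n = sum(1 for m in milestones if step >= m)  # milestones already passed
    return base_lr * gamma ** n

print(lr_at(0))     # 1e-05
print(lr_at(1440))  # 6.25e-07 (= 1e-5 / 16, held until niter 50000)
```

Note that all the decay happens in the first ~1500 of the 50,000 configured iterations.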
eval:[
output_state: gen
injectors:[
gen_inj_eval:[
type: generator
generator: generator
in: hq
out: ['gen', 'codebook_commitment_loss']
]
]
]
logger:[
print_freq: 100
save_checkpoint_freq: 500
visuals: ['gen', 'mel']
visual_debug_rate: 500
is_mel_spectrogram: True
disable_state_saving: False
]
upgrades:[
number_of_checkpoints_to_save: 0
number_of_states_to_save: 0
]
is_train: True
dist: False
23-05-01 03:03:10.734 - INFO: Random seed: 1337
23-05-01 03:03:16.202 - INFO: Number of training data elements: 552, iters: 4
23-05-01 03:03:16.203 - INFO: Total epochs needed: 12500 for iters 50,000
23-05-01 03:03:16.206 - INFO: Number of val images in [test1]: 139
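The iteration and epoch counts the trainer prints follow directly from the dataset and batch sizes; a minimal re-derivation:

```python
import math

train_elements = 552  # reported by the trainer above
batch_size = 138      # from the train dataset config
niter = 50000         # from the train block

# Iterations per epoch, and epochs needed to reach niter.
iters_per_epoch = math.ceil(train_elements / batch_size)
epochs_needed = math.ceil(niter / iters_per_epoch)
print(iters_per_epoch, epochs_needed)  # 4 12500
```

With only 4 optimizer steps per epoch, each "epoch" finishes quickly, which is why the run appears fast and checkpoints seem frequent.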
C:\Users\King\miniconda3\envs\DLAS\lib\site-packages\transformers\configuration_utils.py:379: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
warnings.warn(
Loading from ../experiments/dvae.pth
23-05-01 03:03:24.183 - INFO: Network gpt structure: DataParallel, with parameters: 421,526,786
23-05-01 03:03:24.183 - INFO: UnifiedVoice(
(conditioning_encoder): ConditioningEncoder(
(init): Conv1d(80, 1024, kernel_size=(1,), stride=(1,))
(attn): Sequential(
(0): AttentionBlock(
(norm): GroupNorm32(32, 1024, eps=1e-05, affine=True)
(qkv): Conv1d(1024, 3072, kernel_size=(1,), stride=(1,))
(attention): QKVAttentionLegacy()
(x_proj): Identity()
(proj_out): Conv1d(1024, 1024, kernel_size=(1,), stride=(1,))
)
(1)-(5): 5 × AttentionBlock(...), identical in structure to (0) above
)
)
(text_embedding): Embedding(256, 1024)
(mel_embedding): Embedding(8194, 1024)
(gpt): GPT2Model(
(drop): Dropout(p=0.1, inplace=False)
(h): ModuleList(
(0): GPT2Block(
(ln_1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(attn): GPT2Attention(
(c_attn): Conv1D()
(c_proj): Conv1D()
(attn_dropout): Dropout(p=0.1, inplace=False)
(resid_dropout): Dropout(p=0.1, inplace=False)
)
(ln_2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(mlp): GPT2MLP(
(c_fc): Conv1D()
(c_proj): Conv1D()
(act): NewGELUActivation()
(dropout): Dropout(p=0.1, inplace=False)
)
)
(1)-(29): 29 × GPT2Block(...), identical in structure to (0) above
)
(ln_f): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(mel_pos_embedding): LearnedPositionEmbeddings(
(emb): Embedding(608, 1024)
)
(text_pos_embedding): LearnedPositionEmbeddings(
(emb): Embedding(404, 1024)
)
(final_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(text_head): Linear(in_features=1024, out_features=256, bias=True)
(mel_head): Linear(in_features=1024, out_features=8194, bias=True)
)
23-05-01 03:03:24.188 - INFO: Loading model for [../experiments/autoregressive.pth]
23-05-01 03:03:25.660 - INFO: Start training from epoch: 0, iter: -1
0%| | 0/4 [00:00<?, ?it/s]C:\Users\King\miniconda3\envs\DLAS\lib\site-packages\torch\optim\lr_scheduler.py:138: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
23-05-01 03:04:11.737 - INFO: [epoch: 0, iter: 0, lr:(1.000e-05,1.000e-05,)] step: 0.0000e+00 samples: 1.3800e+02 megasamples: 1.3800e-04 iteration_rate: 8.9322e-02 loss_text_ce: 3.8859e+00 loss_mel_ce: 3.5641e+00 loss_gpt_total: 3.6030e+00 grad_scaler_scale: 1.0000e+00 learning_rate_gpt_0: 1.0000e-05 learning_rate_gpt_1: 1.0000e-05 total_samples_loaded: 1.3800e+02 percent_skipped_samples: 0.0000e+00 percent_conditioning_is_self: 1.0000e+00 gpt_conditioning_encoder: 7.4988e+00 gpt_gpt: 4.9755e+00 gpt_heads: 3.7014e+00
23-05-01 03:04:11.737 - INFO: Saving models and training states.
100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [01:18<00:00, 19.70s/it]
100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [01:15<00:00, 18.81s/it]
100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [01:13<00:00, 18.49s/it]
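The progress bars above settle around 19 s per iteration, so the configured 50,000 iterations imply a long run at the full niter. A rough, hedged estimate (the 19 s/it figure is read off the tqdm bars and will vary with hardware and batch size):

```python
seconds_per_iter = 19.0  # approximate, from the tqdm bars above
niter = 50000            # from the train block

# Naive wall-clock estimate for the full configured run.
total_hours = niter * seconds_per_iter / 3600
print(f"{total_hours:.0f} h (~{total_hours / 24:.1f} days)")  # 264 h (~11.0 days)
```

In practice a single-speaker fine-tune is usually stopped far earlier than niter, based on validation loss rather than the full schedule.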