as-ideas/TransformerTTS

inference error

sciai-ai opened this issue · 1 comments

Hi

I am getting this error while doing inference for a model I trained

ValueError: Layer #1 (named "Encoder" in the current model) was found to correspond to layer Encoder in the save file. However the new layer Encoder expects 109 weights, but the saved weights have 99 elements.
`

Config

`encoder_model_dimension: 384
decoder_model_dimension: 384
dropout_rate: 0.1
decoder_num_heads: [2, 2, 2, 2, 2, 2] # the length of this defines the number of layers
encoder_num_heads: [2, 2, 2, 2, 2, 2] # the length of this defines the number of layers
encoder_max_position_encoding: 2000
decoder_max_position_encoding: 10000
encoder_dense_blocks: 0
decoder_dense_blocks: 0
duration_conv_filters: [256, 226]
pitch_conv_filters: [256, 226]
duration_kernel_size: 3
pitch_kernel_size: 3
predictors_dropout: 0.1
mel_channels: 80
phoneme_language: en-us
with_stress: true
model_breathing: true
transposed_attn_convs: true
encoder_attention_conv_filters: [1536, 384]
decoder_attention_conv_filters: [1536, 384]
encoder_attention_conv_kernel: 3
decoder_attention_conv_kernel: 3
encoder_feed_forward_dimension:
decoder_feed_forward_dimension:
debug: false
wav_directory: /home/wavs
metadata_path: /home/metadata.txt
log_directory: /homelogs/
train_data_directory: transformer_tts_data
data_name: nsc
audio_settings_name: MelGAN_default
text_settings_name: Stress_Breathing
aligner_settings_name: alinger_extralayer_layernorm
tts_settings_name: tts_swap_conv_dims
n_test: 100
mel_start_value: .5
mel_end_value: -.5
max_mel_len: 1_200
min_mel_len: 80
bucket_boundaries: [200, 300, 400, 500, 600, 700, 800, 900, 1000, 1200] # mel bucketing
bucket_batch_sizes: [64, 42, 32, 25, 21, 18, 16, 14, 12, 6, 1]
val_bucket_batch_size: [6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 1]
sampling_rate: 22050
n_fft: 1024
hop_length: 256
win_length: 1024
f_min: 0
f_max: 8000
normalizer: MelGAN
trim_silence_top_db: 60
trim_silence: false
trim_long_silences: true
vad_window_length: 30
vad_moving_average_width: 8
vad_max_silence_length: 12
vad_sample_rate: 16000
norm_wav: true
target_dBFS: -30
int16_max: 32767
learning_rate_schedule:

  • [0, 1.0e-4]
    max_steps: 100_000
    validation_frequency: 5_000
    prediction_frequency: 5_000
    weights_save_frequency: 5_000
    weights_save_starting_step: 5_000
    train_images_plotting_frequency: 1_000
    keep_n_weights: 5
    keep_checkpoint_every_n_hours: 12
    n_steps_avg_losses: [100, 500, 1_000, 5_000] # command line display of average loss values for the last n steps
    prediction_start_step: 4_000
    text_prediction:
  • test_sentences.txt
    git_hash: '3638055'
    automatic: true
    alphabet: " !'(),-.:;?abcdefhijklmnopqrstuvwxyzæçðøħŋœǀǁǂǃɐɑɒɓɔɕɖɗɘəɚɛɜɞɟɠɡɢɣɤɥɦɧɨɪɫɬɭɮɯɰɱɲɳɴɵɶɸɹɺɻɽɾʀʁʂʃʄʈʉʊʋʌʍʎʏʐʑʒʔʕʘʙʛʜʝʟʡʢˈˌːˑ˞βθχᵻⱱ"
    step: 100000
    `

@cfrancesco your advice will be appreciated, thx!