ncsoft/avocodo

Feature matching loss increases


Hello, I'm training an Avocodo model on my own dataset, which is a combination of multiple datasets.

I modified some of the generator's parameters to change the input and target sample rates: the model generates 32 kHz waveforms from 24 kHz mel-spectrograms. The hop size is 400.
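As a quick sanity check on these numbers, the product of the generator's upsampling factors should equal the hop size. The sketch below uses the `upsample_rates` and `segment_size` values from the hyperparameters at the end of this post. It assumes the hop size of 400 is counted in 32 kHz output samples and that the mel-spectrogram is extracted from 24 kHz audio; under those assumptions, the matching hop on the input side would be 300 samples. The helper name `total_upsampling` is made up for illustration and is not part of the Avocodo code.

```python
from math import prod

def total_upsampling(upsample_rates):
    """Multiply all per-stage upsampling factors (nested one-element lists, as in the config)."""
    return prod(rate for stage in upsample_rates for rate in stage)

upsample_rates = [[5], [5], [4], [4]]  # copied from the config in this post
hop_size = 400                         # stated in the post
segment_size = 32000                   # copied from the config in this post
input_sr, output_sr = 24000, 32000     # assumption: mel from 24 kHz audio, 32 kHz output

factor = total_upsampling(upsample_rates)
frames_per_segment = segment_size // hop_size

# Assumption: one mel frame must cover the same duration on both sides,
# so the input-side hop is the output hop scaled by the sample rate ratio.
input_hop = hop_size * input_sr // output_sr

print(factor, frames_per_segment, input_hop)
```

If the mel-spectrogram was instead extracted with a hop of 400 samples at 24 kHz, one frame would span more time than the 400 output samples generated for it, so it may be worth double-checking which convention the data pipeline uses.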

During training, the feature matching loss keeps increasing, even after the discriminator loss has stopped decreasing.
As an aside, strangely enough, the mel loss keeps decreasing, and the quality of the audio output is pretty good.

Is this normal when training a vocoder? Will the feature matching loss ever stop increasing?
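For context, in HiFi-GAN-style vocoders the feature matching loss is usually the mean absolute difference between the discriminator's intermediate feature maps for real and generated audio, summed over layers and discriminators. Below is a minimal sketch of that idea in plain Python. It is not the Avocodo repository's code: feature maps are represented as flat lists of floats, the function name is made up, and any constant weighting factor is left out.

```python
def feature_matching_loss(fmaps_real, fmaps_fake):
    """Sum, over discriminators and layers, of the mean absolute feature difference.

    fmaps_real / fmaps_fake: one list per discriminator, each holding one
    flat list of floats per layer (a stand-in for real feature tensors).
    """
    loss = 0.0
    for layers_real, layers_fake in zip(fmaps_real, fmaps_fake):
        for real, fake in zip(layers_real, layers_fake):
            loss += sum(abs(r - f) for r, f in zip(real, fake)) / len(real)
    return loss

# Toy example: one discriminator with two layers.
fmaps_real = [[[1.0, 2.0], [0.5, 0.5, 0.5]]]
fmaps_fake = [[[1.5, 1.0], [0.5, 0.0, 0.5]]]
loss = feature_matching_loss(fmaps_real, fmaps_fake)

# The same features, but with every activation twice as large.
double = lambda fmaps: [[[2 * x for x in layer] for layer in disc] for disc in fmaps]
scaled = feature_matching_loss(double(fmaps_real), double(fmaps_fake))
print(loss, scaled)
```

The second call shows that the loss scales with the magnitude of the discriminator's activations. Because those features change as the discriminator trains, the value is not measured against a fixed reference, which may be one reason it can rise while the mel loss and the audio quality still improve.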

[Image: avocodo training]

I'd love to hear about your experiences.

Thank you.

HYPER PARAMETERS
model:
  upsample_rates: '[[5], [5], [4], [4]]'
  upsample_kernel_sizes: '[[11], [11], [8], [8]]'
  upsample_initial_channel: 384
  resblock_kernel_sizes: '[3,7,11]'
  resblock_dilation_sizes: '[[1,3,5], [1,3,5], [1,3,5]]'
  projection_filters: '[0, 1, 1, 1]'
  projection_kernels: '[0, 5, 7, 11]'
  combd_h_u: '[[16, 64, 256, 1024, 1024, 1024], [16, 64, 256, 1024, 1024, 1024], [16,
    64, 256, 1024, 1024, 1024]]'
  combd_d_k: '[[7, 11, 11, 11, 11, 5], [11, 21, 21, 21, 21, 5], [15, 41, 41, 41, 41,
    5]]'
  combd_d_s: '[[1, 1, 4, 4, 4, 1], [1, 1, 4, 4, 4, 1], [1, 1, 4, 4, 4, 1]]'
  combd_d_d: '[[1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1]]'
  combd_d_g: '[[1, 4, 16, 64, 256, 1], [1, 4, 16, 64, 256, 1], [1, 4, 16, 64, 256,
    1]]'
  combd_d_p: '[[3, 5, 5, 5, 5, 2], [5, 10, 10, 10, 10, 2], [7, 20, 20, 20, 20, 2]]'
  combd_op_f: '[1, 1, 1]'
  combd_op_k: '[3, 3, 3]'
  combd_op_g: '[1, 1, 1]'
  sbd_filters: '[[64, 128, 256, 256, 256],[64, 128, 256, 256, 256],[64, 128, 256,
    256, 256],[32, 64, 128, 128, 128]]'
  sbd_strides: '[[1, 1, 3, 3, 1], [1, 1, 3, 3, 1], [1, 1, 3, 3, 1], [1, 1, 3, 3, 1]]'
  sbd_kernel_sizes: '[        [[7, 7, 7],[7, 7, 7],[7, 7, 7],[7, 7, 7],[7, 7, 7]],        [[5,
    5, 5],[5, 5, 5],[5, 5, 5],[5, 5, 5],[5, 5, 5]],        [[3, 3, 3],[3, 3, 3],[3,
    3, 3],[3, 3, 3],[3, 3, 3]],        [[5, 5, 5],[5, 5, 5],[5, 5, 5],[5, 5, 5],[5,
    5, 5]]    ]'
  sbd_dilations: '[        [[5, 7, 11], [5, 7, 11], [5, 7, 11], [5, 7, 11], [5, 7,
    11]],        [[3, 5, 7], [3, 5, 7], [3, 5, 7], [3, 5, 7], [3, 5, 7]],        [[1,
    2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]],        [[1, 2, 3], [1, 2,
    3], [1, 2, 3], [2, 3, 5], [2, 3, 5]]    ]'
  sbd_band_ranges: '[[0, 6], [0, 11], [0, 16], [0, 64]]'
  sbd_transpose: '[False, False, False, True]'
  model_pqmf_config: '{        ''sbd'': [16, 256, 0.03, 10.0],        ''fsbd'': [64,
    256, 0.1, 9.0]    }'
  segment_size: 32000
  pqmf_config: '{        ''lv1'': [4, 192, 0.25, 10.0],        ''lv2'': [16, 256,
    0.03, 10.0]    }'