ming024/FastSpeech2

Should we rely on TensorBoard's output for duration, pitch and energy?

aidosRepoint opened this issue · 0 comments

Hi!

In model.module, line 128, we have:

        if duration_target is not None:
            x, mel_len = self.length_regulator(x, duration_target, max_len)
            duration_rounded = duration_target
        else:
            duration_rounded = torch.clamp(
                (torch.round(torch.exp(log_duration_prediction) - 1) * d_control),
                min=0,
            )
            x, mel_len = self.length_regulator(x, duration_rounded, max_len)
            mel_mask = get_mask_from_lengths(mel_len, self.device)

This means that in training mode duration_target is always provided, so duration_rounded is just a copy of the target. Does that mean the durations returned by VarianceAdaptor's forward method are always the ground-truth values? And does it mean that the loss calculation is wrong?
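To make the question concrete, here is a minimal pure-Python sketch of the branch quoted above. It is not the repository's code: variance_adaptor_durations and log_duration_mse are hypothetical helpers, plain lists stand in for tensors, and the sketch assumes that the duration loss compares log_duration_prediction with log(duration_target + 1), which the quoted snippet does not show. Under that assumption the loss still depends on the predictor even though duration_rounded equals the target during training.

```python
import math


def variance_adaptor_durations(log_duration_prediction, duration_target=None, d_control=1.0):
    """Toy stand-in for the quoted branch (hypothetical helper, lists instead of tensors)."""
    if duration_target is not None:
        # Training: teacher forcing, so the ground-truth durations are passed through.
        duration_rounded = list(duration_target)
    else:
        # Inference: invert the log scale, round, scale by d_control, clamp at 0.
        duration_rounded = [
            max(round(math.exp(p) - 1) * d_control, 0)
            for p in log_duration_prediction
        ]
    return log_duration_prediction, duration_rounded


def log_duration_mse(log_duration_prediction, duration_target):
    """Assumed loss: MSE between the raw log prediction and log(target + 1)."""
    log_target = [math.log(d + 1) for d in duration_target]
    squared = [(p - t) ** 2 for p, t in zip(log_duration_prediction, log_target)]
    return sum(squared) / len(squared)


log_pred = [0.5, 1.2, 2.0]  # made-up predictor outputs
target = [1, 2, 5]          # made-up ground-truth durations

_, train_durations = variance_adaptor_durations(log_pred, target)
_, infer_durations = variance_adaptor_durations(log_pred)
loss = log_duration_mse(log_pred, target)

print(train_durations)  # identical to the target, regardless of log_pred
print(infer_durations)  # derived from log_pred only
print(loss > 0)         # the loss still sees the prediction error
```

If the real loss is computed this way, the concern would apply only to whatever is logged from duration_rounded during training, not to the loss itself. That still needs to be checked against the repository's loss code.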