Problem when training as Tacotron1 (use_gst=False)
dazenhom opened this issue · 4 comments
Thanks for your great work. However, I found that when I set the hyperparameter use_gst=False and run training, the model seems different from my understanding of Tacotron1. Here is the relevant part of tacotron.py:
```python
if reference_mel is not None:
    # Reference encoder
    refnet_outputs = reference_encoder(
        reference_mel,
        filters=hp.reference_filters,
        kernel_size=(3, 3),
        strides=(2, 2),
        encoder_cell=GRUCell(hp.reference_depth),
        is_training=is_training)  # [N, 128]
    self.refnet_outputs = refnet_outputs
    if hp.use_gst:
        # Style attention
        style_attention = MultiheadAttention(
            tf.expand_dims(refnet_outputs, axis=1),  # [N, 1, 128]
            tf.tanh(tf.tile(tf.expand_dims(gst_tokens, axis=0), [batch_size, 1, 1])),  # [N, hp.num_gst, 256/hp.num_heads]
            num_heads=hp.num_heads,
            num_units=hp.style_att_dim,
            attention_type=hp.style_att_type)
        style_embeddings = style_attention.multi_head_attention()  # [N, 1, 256]
    else:
        style_embeddings = tf.expand_dims(refnet_outputs, axis=1)  # [N, 1, 128]
else:
    print("Use random weight for GST.")
    random_weights = tf.random_uniform([hp.num_heads, hp.num_gst], maxval=1.0, dtype=tf.float32)
    random_weights = tf.nn.softmax(random_weights, name="random_weights")
    style_embeddings = tf.matmul(random_weights, tf.nn.tanh(gst_tokens))
    style_embeddings = tf.reshape(style_embeddings, [1, 1] + [hp.num_heads * gst_tokens.get_shape().as_list()[1]])
```
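To make the branches of this excerpt concrete, here is a pure-Python sketch of what they compute for a single utterance (no batch axis). It is a simplification, not the repo's code: `gst_attention` is single-head dot-product attention without the learned projections that `MultiheadAttention` uses (and `hp.style_att_type` may select a different scoring function), so the query is assumed to already have the token dimension. `random_style_embedding` mirrors the `reference_mel is None` branch: for each head, a softmax over uniform random weights, a weighted sum of `tanh(gst_tokens)`, and the heads concatenated as the `tf.reshape` does. The non-GST branch needs no sketch, since it uses `refnet_outputs` directly as the style embedding. All function names and sizes here are hypothetical.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gst_attention(query, tokens):
    # Simplified use_gst=True branch: one head, no learned projections.
    # query: a reference embedding with the same length as each token.
    keys = [[math.tanh(v) for v in tok] for tok in tokens]  # tanh(gst_tokens)
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys]
    weights = softmax(scores)  # one weight per style token
    return [sum(w * key[j] for w, key in zip(weights, keys)) for j in range(dim)]


def random_style_embedding(tokens, num_heads, rng):
    # reference_mel is None branch: random token weights for each head.
    keys = [[math.tanh(v) for v in tok] for tok in tokens]
    dim = len(keys[0])
    out = []
    for _ in range(num_heads):
        # One row of tf.random_uniform([num_heads, num_gst]) followed by softmax.
        w = softmax([rng.uniform(0.0, 1.0) for _ in tokens])
        # One row of the matmul with tanh(gst_tokens); rows are concatenated like the reshape.
        out.extend(sum(wk * key[j] for wk, key in zip(w, keys)) for j in range(dim))
    return out


rng = random.Random(0)
num_gst, num_heads, token_dim = 10, 4, 64  # example sizes: num_heads * token_dim = 256
tokens = [[rng.gauss(0.0, 0.3) for _ in range(token_dim)] for _ in range(num_gst)]
print(len(random_style_embedding(tokens, num_heads, rng)))  # 256
print(len(gst_attention([0.1] * token_dim, tokens)))  # 64
```

The 256-dimensional result matches the `[N, 1, 256]` shape comment of the GST branch, while the non-GST branch passes on a 128-dimensional embedding (`[N, 1, 128]`), so the two modes feed differently sized style vectors downstream.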
The original Tacotron1 shouldn't train with a reference encoder, right? However, your code passes the data into reference_encoder even in non-GST mode, which seems strange. Maybe we can swap the two if conditions to make it correct:
```
if hp.use_gst:
    ***
    if reference_mel is not None:
        ***
```
Thanks!
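The difference between the current branch order and the proposed swap can be tabulated with a small sketch. The function names and path labels are hypothetical, and because the suggestion elides the branch bodies (`***`), `proposed_path` is only one reading of it: it assumes that with `use_gst=False` no style embedding is computed at all, as in plain Tacotron1.

```python
def current_path(use_gst, has_reference):
    # Branch order of the quoted tacotron.py excerpt.
    if has_reference:
        # The reference encoder runs whenever a reference mel is given,
        # regardless of use_gst.
        if use_gst:
            return "reference encoder -> style token attention"
        return "reference encoder output used directly"
    return "random weights over style tokens"


def proposed_path(use_gst, has_reference):
    # Hypothetical reading of the suggested swap: test use_gst first.
    if use_gst:
        if has_reference:
            return "reference encoder -> style token attention"
        return "random weights over style tokens"
    # Assumption: with use_gst=False nothing style-related is computed.
    return "no style embedding"


for use_gst in (True, False):
    for has_ref in (True, False):
        print(f"use_gst={use_gst!s:5} reference={has_ref!s:5} "
              f"current: {current_path(use_gst, has_ref):45} "
              f"proposed: {proposed_path(use_gst, has_ref)}")
```

Only the `use_gst=False` rows differ: in the current order a reference mel still goes through the reference encoder, which is exactly the behaviour questioned above.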
@dazenhom Hi, thanks for your notes. In this repo, use_gst=False doesn't mean the Tacotron1 model. Google also has another paper that uses a reference encoder for style and multi-speaker synthesis. You can find it at https://arxiv.org/abs/1803.09047.
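In the reference-encoder setup of that paper, one reference embedding conditions the whole utterance; as I understand it, the embedding is broadcast along the text time axis and concatenated to each encoder output. The pure-Python sketch below shows that operation on lists. `broadcast_concat` is a hypothetical helper, and whether this repo combines `style_embeddings` with the encoder outputs in exactly this way is an assumption that the quoted excerpt does not confirm.

```python
def broadcast_concat(encoder_outputs, style_embedding):
    # Append the same style vector to every encoder time step.
    # encoder_outputs: T lists of length enc_dim; style_embedding: length style_dim.
    # Returns T lists of length enc_dim + style_dim.
    return [list(step) + list(style_embedding) for step in encoder_outputs]


T, enc_dim, style_dim = 5, 256, 128  # example sizes; 128 matches the refnet_outputs shape comment
encoder_outputs = [[0.0] * enc_dim for _ in range(T)]
style = [0.5] * style_dim
conditioned = broadcast_concat(encoder_outputs, style)
print(len(conditioned), len(conditioned[0]))  # 5 384
```

Because every time step sees the same appended vector, a reference clip can only influence the output through whatever the decoder learns to read from those extra dimensions.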
@syang1993 Thanks for your reply. I mistook your work for a Tacotron1 implementation. I will find another implementation of Tacotron1 to run my tests. Thanks anyway.
I have tried use_gst=False, but it seems to behave the same as Tacotron1? Although refnet_outputs changes, the generated audio hardly changes with different reference audio.
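One way to quantify "refnet_outputs will change" is to compare the reference embeddings produced for several clips: a mean pairwise cosine similarity close to 1 would mean the embeddings barely differ, which would be consistent with the audio hardly changing. This is a minimal pure-Python sketch of that check. The helper names are hypothetical and the vectors are placeholder numbers, not real model outputs; in practice they would come from evaluating the `self.refnet_outputs` tensor set in the excerpt for each reference clip.

```python
import math


def cosine_similarity(a, b):
    # Cosine of the angle between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(vectors):
    # Average cosine similarity over all distinct pairs of vectors.
    sims = [cosine_similarity(vectors[i], vectors[j])
            for i in range(len(vectors))
            for j in range(i + 1, len(vectors))]
    return sum(sims) / len(sims)


# Placeholder embeddings for three reference clips (not real outputs).
refs = [[1.0, 0.0, 0.2], [0.9, 0.1, 0.2], [1.0, 0.05, 0.25]]
print(round(mean_pairwise_similarity(refs), 3))
```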